“How many citations will I have to screen?”

I get asked this question a lot at the beginning stages of a review project. It’s a fair question: researchers want to know what their screening workload is going to look like, and screening abstracts is a tedious process.

There’s some great research happening right now on how to make screening less tedious – using text mining to automate or semi-automate the process, for example. These are promising approaches, but for now, most reviews need a human eye to oversee study selection for at least part of the process.

I used to get flustered when I was asked this question because I was afraid of giving the wrong answer and underestimating the amount of work a researcher would later have to do. On the other hand, if I overestimated the number of citations to screen, a researcher might want to change the search strategy to lower the number of citations or otherwise change the methodology.

I also didn’t have a very good answer. It’s hard to estimate the number of citations to be screened without downloading them from all sources and de-duplicating first. I’ve sometimes estimated the total number of included studies at the end of a project by screening a random sample (such as 100 citations) and scaling up by the ratio of the sample to the total number of citations to be screened. For example, if there is one relevant article in a random sample of 100, and 1,000 articles total to screen, I would estimate about 10 included studies at the end of the project. (For more on getting a truly random sample of citations, see my previous blog post on this topic.)
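This back-of-the-envelope estimate is easy to script. A minimal sketch in Python (the function name is mine, not from any library):

```python
# Estimate the number of included studies from a random sample.
# Assumes the sample was drawn uniformly at random from the full set.
def estimate_included(relevant_in_sample, sample_size, total_citations):
    """Scale the sample's relevance rate up to the whole citation set."""
    return relevant_in_sample / sample_size * total_citations

# 1 relevant article in a random sample of 100, 1000 citations in total:
print(estimate_included(1, 100, 1000))  # → 10.0
```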

Tired of giving bad answers to this question, I’ve crunched the numbers for a few of the review projects I’ve worked on in the last year. For each of the projects, I found the number of citations downloaded from the Ovid MEDLINE search only, then the number of citations left to screen at the title/abstract stage after citations were downloaded from all databases and duplicates removed.

The results are below:

[table: citations downloaded from the Ovid MEDLINE search vs. total citations screened, for each project]

The ratio of the MEDLINE search alone to total citations to screen ranged from 239% to 1333%. However, the last two columns represent projects with less of a biomedical focus (social sciences and computer science, respectively). The MEDLINE searches were still relevant in those cases, but we didn’t expect the majority of our studies to come from MEDLINE. Thus, if we exclude the major outlier – the project with a social sciences focus – the results are a little more consistent.

For each of the projects above, I searched 6-7 databases, except project 1, where I searched 14 (?!). However, the citation ratio for project 1 is not exceptionally different from that of the other projects. For now, I can’t see a discernible relationship between screening burden and the number of databases searched, but more data may be needed.

My overall take-away from this exercise is that, for the searches that I run, the screening burden of a systematic review tends to be about 2.5x to 5x that of the original MEDLINE search. In the future, this is the advice that I’ll be giving my researchers to help them better plan their resources and time. I can breathe a small sigh of relief, too, knowing that the information that I give my researchers is just a little more evidence-based than it was the day before.
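As a sketch, that rule of thumb could be scripted like so (the function and the example count of 800 MEDLINE citations are hypothetical):

```python
# Rough screening-burden forecast from a MEDLINE-only count, using the
# 2.5x-5x range observed across my projects. Numbers here are illustrative.
def screening_estimate(medline_count, low=2.5, high=5.0):
    return round(medline_count * low), round(medline_count * high)

low, high = screening_estimate(800)  # e.g. 800 citations from Ovid MEDLINE
print(f"Expect roughly {low}-{high} citations to screen.")
```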


Workbook for systematic review consultations

I’m often approached by masters and PhD students and researchers in my institution to advise on systematic review projects in the early stages. I’ve found that the skill levels to complete a systematic or scoping review are variable, and that many researchers need a primer to get up to speed about the process of conducting a review, what skills are required, and in particular, how to go about the planning process.

I support many projects in depth from start to finish, but for many projects at my institution, I only have the time to provide advice and consultations. Unfortunately, I quickly learned that throwing a lot of information at people in a short period of time was not useful, and I would sometimes see the same researchers at a later consultation who hadn’t gotten very far with their projects and needed a lot of the same information again.

[Photo by Glenn Carstens-Peters on Unsplash]

There are many, many resources online for conducting review projects, including some enviable LibGuides (I personally like the Queen’s University knowledge synthesis guide and the University of Toronto learning to search guide). However, I wanted a resource that I could use when physically sitting with someone in a meeting room, where we could plan out their review project together. And I was getting pretty tired of drawing the same Venn diagrams of how the “AND” and “OR” Boolean operators work on whatever scratch paper I had handy.
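For what it’s worth, that Venn diagram logic can also be shown with a couple of Python sets (the record IDs here are made up for illustration):

```python
# Boolean operators as Python sets: AND narrows a search (intersection),
# OR broadens it (union). Record IDs are made up for illustration.
records_yoga = {101, 102, 103, 104}       # records matching "yoga"
records_substance = {103, 104, 105, 106}  # records matching "substance abuse"

print(sorted(records_yoga & records_substance))  # AND → [103, 104]
print(sorted(records_yoga | records_substance))  # OR → [101, 102, 103, 104, 105, 106]
```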

I recently developed a guide that fits these purposes, and after a few iterations and some testing and feedback, I’ve put it online for others to use and edit as they wish under a CC BY-NC-SA 4.0 license. The goal of this guide is to provide a resource that:

  • Can be printed and used as a workbook to guide a systematic reviews consultation
  • Also contains enough information to be a stand-alone self-learning resource for after the consultation (e.g. the information on boolean operators)
  • Is not too long to be intimidating or overwhelming for someone just getting started

Without a doubt, there will be further refinements and additions to the guide over time, but for now, please feel free to download, use, and edit for your own purposes. Any feedback or comments are also gratefully accepted. 🙂

You can find the guide here at Open Science Framework.

[screenshot of the guide on OSF]

Building a Twitter bot!

I have long admired – and, I’ll admit – been a bit fearful of cool technology projects that make use of APIs. To be honest, I’m still not *entirely* sure how an API works. It feels a bit like magic. You need keys and secret keys and bits of code and all those things need to be in the right place at the right time and I might even have to use scary things like the command line!

So you can imagine, I’ve been looking at all the cool Twitter bots launched over the past few years with much wistfulness.

When I recently saw Andrew Booth’s tweet about his “Random Review Label Generator”, I knew it was time for me to get in on the action.

As it turns out, a lovely fellow has made the process of creating Twitter bots super easy by coding all the hard stuff and launching a user-friendly template with step-by-step instructions, freely available for anyone to use. Special thanks to Zach Whalen for creating and putting this online!

So: without further ado, I present to you a Twitter bot that randomly generates a new healthcare review project every hour. You’re welcome!

The beauty of this bot is that some of the project names are so ridiculous… and yet you wouldn’t be surprised to see many of them actually published. I am endlessly entertained by the combinations that it comes up with, and I hope you are too!
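For the curious, the random-combination trick behind a bot like this can be sketched in a few lines of Python; the word lists below are my own invention, not the bot’s actual vocabulary:

```python
import random

# A sketch of the random-combination idea behind a bot like this.
# These word lists are hypothetical, not the bot's actual vocabulary.
REVIEW_TYPES = ["rapid", "scoping", "realist", "umbrella", "mixed-methods"]
TOPICS = ["yoga", "telehealth", "mindfulness", "checklists"]
SETTINGS = ["in older adults", "in primary care", "in nursing students"]

def random_review_title():
    """Combine one random pick from each list into a project title."""
    return (f"A {random.choice(REVIEW_TYPES)} review of "
            f"{random.choice(TOPICS)} {random.choice(SETTINGS)}")

print(random_review_title())
```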

Grey Lit Searching for Dummies

Ah, grey literature! Confronted with a vast void of faceless, nameless literature, it’s easy to quickly become overwhelmed. Where do I start? What do I search? What am I even looking for?

As a medical librarian, I’m used to structured searches in curated databases, and going into the unknown can be a frightening thought. However, it is possible to add structure to a grey literature search!

First: What are you looking for?

Too often, the idea of “grey literature” is lumped into one monolithic term. In reality, grey literature is a broad umbrella term and encompasses a lot of different document types whose main commonality is that they are unpublished or published outside traditional publishers: basically, anything that’s not a traditional published research article.

Think about the research project at hand and what types of literature would best support it. For example, in a qualitative synthesis or realist review of a social sciences topic, a lot of robust evidence might come from book chapters with unpublished studies. In a mapping review in health services research, government white papers/reports about local health initiatives might be most relevant. What do you expect the evidence to look like, and where might you go about finding it? Document types to consider include:

  • Reports or white papers
  • Theses and dissertations
  • Book chapters
  • Clinical trials registers
  • Conference proceedings

Second: Make a plan!

Next, make a detailed plan for searching the literature. Your searching plan should contain information about what sources will be searched, how they will be searched, how the searches will be documented, and how/where the potentially relevant documents will be downloaded/stored.

Some strategies to consider including in your plan might be:

  • Traditional database searches that will include grey lit such as conference abstracts (e.g. PsycINFO, Embase, ProQuest Dissertations & Theses)
  • Specialised databases (e.g. “grey” databases such as OpenGrey or HMIC, or small subject-specific databases without sophisticated search mechanisms)
  • Search engines (e.g. Google, Google Scholar, DuckDuckGo)
  • Custom Google search engines (e.g. NGOs search, Just State Web Sites; Think Tank Search)
  • Clinical Trials registers
  • Hand searching of key subject websites (e.g. the main associations or government departments in that topic area)
  • Consultation with experts (who may have ideas about papers you have missed)

For each strategy, document all the details you will need to conduct the search:

  • Who is going to conduct the search?
  • What search terms or search strategies will be used?

For more sophisticated sites, a full Boolean strategy might be used; for a site with a simple search box, one term or a few terms at a time may be all that’s possible. Strategies should be similar across sources, but adapted to the searching capabilities of each resource.

Think also about the context: if your search topic is “yoga for substance abuse”, and you’re searching the NIDA International Drug Abuse Research Abstract Database, you won’t need to include substance abuse terminology in your searches, because everything in that subject database is already about substance abuse.

  • How will the searches be documented? Oftentimes, an Excel spreadsheet will suffice, with columns such as the person searching, the date, the search strategy, the number of items looked at, and the number of items selected as potentially relevant. Bear in mind that for some resources, the search strategy might be narrative, such as “clicked the research tab and browsed the list of publications”.
  • How many results will you look at? The first 50? The first 100? Until there are diminishing returns?
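If you’d rather build the log programmatically than by hand, here’s a minimal sketch in Python; the filename, column names, and example rows are hypothetical:

```python
import csv

# A minimal search log; the columns mirror the spreadsheet fields
# described above. The filename and rows are hypothetical examples.
rows = [
    ["searcher", "date", "source", "strategy", "viewed", "selected"],
    ["JT", "2018-06-01", "OpenGrey", "yoga AND (substance abuse)", 50, 4],
    ["JT", "2018-06-02", "agency website",
     "clicked the research tab and browsed publications", 30, 2],
]

with open("grey_lit_log.csv", "w", newline="") as f:
    csv.writer(f).writerows(rows)
```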

Third: Execute the plan!

Make sure to have a strategy in place for recording your searches and downloading your citations. Due to the transient nature of the web, grey literature searches generally aren’t replicable: when you search Google one week and conduct the same search a year later, you might get different results. However, searches for grey literature can and should be transparent and well-documented, such that someone else could conduct the same searches at a later point, even if they would get different results.

For more information, check out the following papers:

Briscoe S. Web searching for systematic reviews: a case study of reporting standards in the UK Health Technology Assessment programme. BMC research notes. 2015 Apr 16;8(1):153.

Godin K, Stapleton J, Kirkpatrick SI, Hanning RM, Leatherdale ST. Applying systematic review search methods to the grey literature: a case study examining guidelines for school-based breakfast programs in Canada. Systematic reviews. 2015 Oct 22;4(1):138.

Quick tip: use the Ovid multi-line launcher

In the “multi-line” vs “single line” searches debate, one point that is often thrown around is: multi-line searches are more cumbersome to edit and run. Even with Ovid’s new “edit” button, it still takes a few clicks and a few page refreshes to edit a strategy and see the results. When making lots of changes quickly to a strategy, this time can really add up.

One underappreciated and little-known tool is Ovid’s multi-line launcher. It’s beautiful! The multi-line launcher allows a user to copy/paste a multi-line strategy directly into the search box, press enter, and view the search results – with hits for each line – as normal.

screenshot of Ovid’s multi-line launcher tool

When making edits to a strategy I tend to do the following:

  1. paste the strategy into the multi-line launcher box
  2. ensure that the line numbers are still correct or changed if needed
  3. press enter to view results
  4. if strategy requires a change, type “..pg all” into the search box in the main Ovid MEDLINE interface to delete search history (see more about keyboard shortcuts in Ovid here)
  5. Make edits to the strategy in a word document
  6. Paste back into the multi-line launcher box

I’ve found this strategy works more quickly and with fewer site time-outs than using the native “edit” button.

Try it here: http://demo.ovid.com/demo/ovidsptools/launcher/launcher.html

How to convert a search between PubMed and Ovid

Have you ever tried to convert a search strategy from PubMed to Ovid or vice versa? It can be a real pain. The field codes in Ovid don’t always nicely match up with the tags in PubMed, and it can be difficult to wrap your head around the auto-explode in PubMed vs the manual explode in Ovid for indexing terms. Not to mention that there is some functionality that exists in Ovid but not in PubMed (such as proximity operators), and in PubMed that doesn’t exist in Ovid (such as the supplementary concepts tag). Yikes!

Why would you want to convert a search strategy between the two, you ask? Don’t they have the same content?

  1. There is some content that is in PubMed but not Ovid MEDLINE. The NLM factsheet “MEDLINE, PubMed, and PMC (PubMed Central): How are they different?” gives an overview of PubMed’s unique content.
  2. You might want to use features that are available in both databases! Maybe you’re working on a strategy in Ovid MEDLINE, but realise partway through you’d really like to use one of the PubMed subject filters, for example.
  3. Sometimes, you might find a search filter or hedge, but it is written in the syntax of a different interface. Translating a strategy isn’t always easy or intuitive, so automating the process can reduce errors and save time.
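To give a flavour of what automated translation involves, here is a toy sketch in Python of mapping a few common Ovid field codes to PubMed tags. This is my own illustration, not the actual code behind the tool, and it handles only a small fraction of the real mapping:

```python
import re

# Toy Ovid MEDLINE → PubMed syntax converter (illustrative only).
# Maps a few common field codes; real translation handles many more cases.
OVID_TO_PUBMED = {".tw.": "[tiab]", ".ti.": "[ti]", ".ab.": "[ab]", ".pt.": "[pt]"}

def ovid_to_pubmed(line):
    for ovid, pubmed in OVID_TO_PUBMED.items():
        line = line.replace(ovid, pubmed)
    # "exp Heading/" becomes "Heading[mh]" (PubMed auto-explodes MeSH terms)
    return re.sub(r"exp (.+?)/", r"\1[mh]", line)

print(ovid_to_pubmed("exp Yoga/ or yoga.tw."))  # → Yoga[mh] or yoga[tiab]
```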

Over the past few months, I’ve been working with a colleague to build a tool that automatically converts searches between the two interfaces, and we recently presented our work at the EAHIL/ICML conference in Dublin.

[photo: EAHIL/ICML conference in Dublin]

During the conference week, we had dozens of excellent conversations in person and on Twitter, and 138 unique website visitors! Thanks to everyone who provided feedback and suggestions for improvements. We are working hard to incorporate many of them over the coming months.

The tool is freely available at medlinetranspose.github.io. Please feel free to check it out and let us know how it works for you!

Finding a random set of citations in EndNote

Have you ever been asked to find a random set of citations from EndNote? This happens most often to me when researchers are testing out screening procedures, and want to ensure they are all interpreting the screening guidelines the same way. The researchers will all screen the same random set of 10-20 articles and compare results before screening the entire set.

So: what’s the best way to go about this? Sorting from a-z on any given field and selecting the top 10-20 articles isn’t likely to be truly random. For example, sorting by date will retrieve only very new or old articles. Sorting by record number is one possible way to do it, but also isn’t truly random as it will retrieve articles added to the database most or least recently.

Here’s how I take a truly random sample of citations from EndNote.

First, create an output style in EndNote

The output style will include only the citation record numbers. Don’t worry, you only have to do this once, and in the future it will all be set up for you!

  1. In EndNote, go to Edit –> Output Styles –> New Style
  2. In the resulting screen, click “templates” under the heading “bibliography”
  3. Then, put your cursor in the box below “generic”. Click “insert field” –> “Record Number”, then press enter so that your cursor goes to the next line in the text box.
  4. Go to “file” –> “save as” and save it as something descriptive like “record-number-only”.

Next, export your record numbers.

  1. Back in the main EndNote screen, click the dropdown box at the top of the screen, then “select another style”, and search for your previously created Output Style.
  2. Then click “choose”. Ensure that your output style name is displaying in the dropdown box!
  3. Select “all references” to make sure all your references (that you want to create a subset from) are displayed. Then click one of the references and press ctrl + a (or cmd + a on a mac) to select all references.
  4. Right-click and select “copy formatted”.

Create your random subset!

  1. Open Excel, and press ctrl + v (or cmd + v on a mac) to paste all your record numbers.
  2. In the cell to the right of your first record number, insert the formula =RAND(). This will create a random number between 0 and 1.
  3. Hover the cursor over the bottom-right corner of the cell until it makes a cross. Then click and drag all the way down to the last row that contains a record number.
  4. Insert a row at the top and click “sort & filter” –> “filter” on the menu bar.
  5. Then, sort the second column (with the random numbers) from smallest to largest (or largest to smallest).
  6. You now have a randomly sorted list! Select and copy the top x number of cells in the first column (however large you want your sample to be).
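If you’re comfortable with a few lines of code, Python’s standard library can replace the whole Excel step: `random.sample` draws a truly random subset without repeats. The record-number range below is hypothetical:

```python
import random

# Draw a truly random sample of EndNote record numbers, no Excel needed.
# The range of record numbers here is hypothetical.
record_numbers = list(range(1, 501))        # e.g. a library with records 1-500
sample = random.sample(record_numbers, 20)  # 20 distinct records, truly random
print(sample)
```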

Format your record numbers to put back into EndNote.

  1. Paste your subset of record numbers into word (paste as text, not a table!)
  2. Click “replace” on the main toolbar to bring up the find and replace box.
  3. Beside the box “find what”, write ^p (the caret symbol followed by “p”).
  4. Beside the box “replace with”, insert a semi-colon followed by one space.
  5. Then click “replace all”.
  6. You should now have a string of record numbers separated by semi-colons.
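The same formatting step can be done in one line of Python, if you prefer; the record numbers below are just an example:

```python
# The find-and-replace step in one line: join record numbers with "; "
# so they can be pasted into an EndNote smart group. Example numbers only.
record_numbers = [12, 87, 203, 341]
formatted = "; ".join(str(n) for n in record_numbers)
print(formatted)  # → 12; 87; 203; 341
```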

Put them back into EndNote!

  1. Go back to your EndNote Library.
  2. Right-click in the sidebar and select “create smart group”
  3. Give it a nice title, like “random set” 😃
  4. In the first dropdown box, select “record number”, then “word begins with”, then paste in your formatted record numbers separated by semi-colons.
  5. Click “create”.
  6. All done!

I hope you found this useful. It might sound complicated, but this process really only takes a few seconds once you have gone through it a few times.

Do you have a more efficient or a different way of doing it? What kinds of formatting and database problems do you come across in your position? Feel free to send me a message or tweet at me.

Til next time,