“How many citations will I have to screen?”

I get asked this question a lot at the beginning stages of a review project. It’s a fair question: researchers want to know what their screening workload is going to look like, and screening abstracts is a tedious process.

There’s some great research happening right now on how to make screening less tedious – using text mining to automate or semi-automate the process, for example. These are promising approaches, but for now, most reviews need a human eye to oversee study selection for at least part of the process.

I used to get flustered when I was asked this question because I was afraid of giving the wrong answer and underestimating the amount of work a researcher would later have to do. On the other hand, if I overestimated the number of citations to screen, a researcher might want to change the search strategy to lower the number of citations or otherwise change the methodology.

I also didn’t have a very good answer. It’s hard to estimate the number of citations to be screened without downloading them from all sources and de-duplicating first. I’ve sometimes estimated the total number of included studies at the end of the project by screening a random sample (such as 100 citations) and scaling the proportion of relevant citations in that sample up to the total number of citations to be screened (e.g. if there is one relevant article in a random sample of 100, and 1000 articles total to screen, I would estimate about 10 studies to be included at the end of the project). For more on getting a truly random sample of citations, see my previous blog post on this topic.
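For anyone who wants to script that sample-based estimate, here is a minimal sketch in Python. The function name and arguments are my own invention for illustration, and the output is only a rough point estimate; with a sample as small as 100, the real number could easily be quite different.

```python
def estimate_included(relevant_in_sample, sample_size, total_citations):
    """Scale the share of relevant citations in a random sample up to the full set."""
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    if not 0 <= relevant_in_sample <= sample_size:
        raise ValueError("relevant_in_sample must be between 0 and sample_size")
    # Multiply before dividing to keep the arithmetic exact for whole numbers
    return relevant_in_sample * total_citations / sample_size


# The example from the text: 1 relevant article in a random sample of 100,
# with 1000 citations to screen in total
print(estimate_included(1, 100, 1000))  # 10.0
```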

Tired of giving bad answers to this question, I’ve crunched the numbers for a few of the review projects I’ve worked on in the last year. For each of the projects, I found the number of citations downloaded from the Ovid MEDLINE search only, then the number of citations left to screen at the title/abstract stage after citations were downloaded from all databases and duplicates removed.

The results are below:

[Image: results table]

The ratio of total citations to screen to citations from the MEDLINE search alone ranged from 239% to 1333%. However, the last two columns represent projects that had less of a biomedical focus (social sciences and computer science, respectively). The MEDLINE searches were still relevant in these cases, but we didn’t expect the majority of our studies to come from MEDLINE. If we exclude the major outlier, the project with more of a social sciences focus, the results are a little more consistent.

For each of the projects above, I searched 6-7 databases, except project 1, where I searched 14(?!). However, the ratio of citations for project 1 is not exceptionally different from that of the other projects. For now, I can’t see a discernible relationship between screening burden and number of databases searched, but more data may be needed.

My overall take-away from this exercise is that, for the searches that I run, the screening burden of a systematic review tends to be about 2.5x to 5x that of the original MEDLINE search. In the future, this is the advice that I’ll be giving my researchers to help them better plan their resources and time. I can breathe a small sigh of relief, too, knowing that the information that I give my researchers is just a little more evidence-based than it was the day before.
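As a rough planning aid, the rule of thumb can be written as a small Python sketch. The function names and the example counts are made up for illustration (they are not from my projects), and the 2.5x to 5x multipliers only reflect the searches I run, so treat the output as a ballpark.

```python
def screening_range(medline_hits, low=2.5, high=5.0):
    """Return a (low, high) estimate of citations to screen after de-duplication."""
    return (round(medline_hits * low), round(medline_hits * high))


def ratio_percent(total_to_screen, medline_hits):
    """Express the total screening set as a percentage of the MEDLINE-only count."""
    return 100 * total_to_screen / medline_hits


# Hypothetical project: 400 citations from the Ovid MEDLINE search alone
print(screening_range(400))      # (1000, 2000)

# Hypothetical finished project: 1200 citations to screen from 400 MEDLINE hits
print(ratio_percent(1200, 400))  # 300.0
```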


The secret to bibliometric analysis: generating a list of PMIDs

By now, it’s probably no secret that I love crunching bibliometric data. I find that analysing my results, both during search strategy formation and after downloading final results, gives me a broader perspective and helps me see trends that I might otherwise miss.

However, analysing data can sometimes be time-consuming and clunky. Data never seems to be in the format that you want when you need it; the precise tool that you need at that moment hasn’t been invented yet or is otherwise proprietary; the right software for the job requires a programming language you haven’t yet learned, and so forth. Sometimes you want a quick and dirty answer to help develop a strategy and it doesn’t have to be tidy or perfect, but you need it now!

Here’s my quick and dirty trick for analysing your bibliometric [medline] data:

  1. Generate a list of PMIDs from your results (whether your strategy is finalised or not!)
  2. Pop into the data analysis program of your choosing…

The beauty of this trick is that you can copy-paste whatever you are working on at this very moment (provided you’re working with MEDLINE data, of course…) and get real-time feedback. No need to mess with clunky software interfaces or retype your strategy.

Generate a list of PMIDs

PubMed

If you’re using PubMed, this part is easy. Click the “Format: Summary” drop-down menu just below the search bar, then select “PMID”. Et voilà! The resulting page is a plain text list of PMIDs, taken from the results on the previous page.


Note that the resulting PMID list will show only the citations from the previous page, so you may want to scroll to the bottom of the results screen first and set the number of citations per page to the maximum (200 at the time of this writing).

Ovid MEDLINE

To extract PMIDs from Ovid:

  • select all citations (or a range if there’s a lot!)
  • click “export”
  • select “excel” under the drop-down menu “Export To:”
  • select “custom fields”
  • under “select fields” (beside the “custom fields” radio button), unselect everything except “unique identifier” (this is the field that contains the PMID in Ovid)
  • select “export citations”

An Excel file should download with a column of PMIDs, which can then be copied and pasted.

(Thanks to Michelle Fiander for the Excel tip!)
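If the copied column comes with a header row, blank cells or repeats, a few lines of Python can tidy it up before pasting it elsewhere. This is only a sketch: the function name is my own, the sample PMIDs are made up, and it assumes that a PMID is a standalone run of 1 to 8 digits, so check the output if your export contains other numbers.

```python
import re


def clean_pmids(pasted_text):
    """Pull standalone runs of 1-8 digits out of pasted text, keeping first-seen order."""
    seen = set()
    pmids = []
    for token in re.findall(r"\b\d{1,8}\b", pasted_text):
        if token not in seen:  # skip repeats
            seen.add(token)
            pmids.append(token)
    return pmids


# Made-up example of a column copied out of a spreadsheet, header included
pasted = """Unique Identifier
12345678
23456789

12345678
"""

# One PMID per line; adjust the separator to whatever your tool expects
print("\n".join(clean_pmids(pasted)))
```

The header row is dropped simply because it contains no digits, and the repeated PMID is kept only once.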

 

Analyse your data

Once you have your list of PMIDs, you can pop them into a variety of different tools to crunch the data in different ways. For example, try pasting your list into:

  • PubReMiner – for a word count analysis of authors, journals, MeSH, title/abstract words…
  • Medline Trends – for an analysis of citations over time
  • GoPubMed – for a variety of filters (maps! bar graphs! frequency charts!)
  • Yale MeSH Analyzer – for a side-by-side comparison of MeSH usage

And more! Someday I intend to write up a full list of MEDLINE data analysis tools freely available online, but that day is not today…

[Screenshot: It’s not necessary to input a full search strategy into most bibliometric analysis programmes… simply paste in your PMIDs!]

Why would a person bother to do this?

Building a search strategy is an iterative process and it requires using a lot of different tools. For example, you can use your own common sense and intuition, but other tried-and-true strategies include: backwards/forwards citation chaining, talking to experts in the field, or looking at highly cited papers/journals in the field.

Using quick data analysis strategies throughout the process of building a search strategy will help ensure that important concepts aren’t missed. They provide a more objective picture of what’s happening, what’s missing, and how you can better refine your strategy.

That’s it for this week!

PS This is my first proper blog and I must say… keeping a blog up to date is not as easy as I thought. Please do let me know if you find this content useful and I will try my utmost to keep ’em coming! You can use the site contact form or find me on Twitter at @v_woolf.