De-duplicating EndNote results against a previous search

Well hello my friends! It’s been a long time since I’ve posted. Forgive me, as much has happened in the last year, including: moving across an ocean, subsequently moving across a city, starting a busy freelance information specialist business, and many mundane crises, trips, and side-projects in between.

I was tasked recently with updating a systematic review search with a new and improved search strategy completely unlike the previous one. The new strategy added search terms, but also deleted several irrelevant ones. I’d done systematic review updates before, but I’d always simply used date filters in the databases to capture results from the date of the previous search.

But this time, the researchers wanted any articles that would be captured by the new search, in any date period, and wanted to ensure they weren’t screening any citations that had already been previously screened with the old search.

I’d heard other information specialists talk about using EndNote to de-duplicate against a previous search, but never tried it myself. It just seemed unduly complicated. Date filters seemed to be working perfectly fine.

It was around this same time that I found out that date filters were not perfectly fine.

One day, I went to date-limit an Embase search using the advice from a LibGuide at a high-ranking university… and was horrified to discover that a not-insignificant number of citations that ought to have been picked up had not been.

Cue a minor panic as I tried to figure out whether I had royally screwed up any of my previous projects.

Friends, I have learned the error of my ways, and will henceforth de-duplicate my systematic review update searches in EndNote when possible. Cross my heart, etc, etc.

As usual when picking up a new skill, I went to Twitter to see what all the experts were doing.

Here follows the method that I ended up using. I’ve documented it for my own purposes and hope that it can come in handy for others as well.

1. De-duplicate your total search results in EndNote, as normal.

You can use whatever process works best for you. I tend to use a modified version of Bramer et al. (2016), in which I progressively choose different field combinations in EndNote to test for duplicates and manually go through the results. The field combinations suggested in the article include (in this order):

  • Author | Year | Title | Secondary Title (Journal)
  • Author | Year | Title | Pages
  • Title | Volume | Pages
  • Author | Volume | Pages
  • Year | Volume | Issue | Pages
  • Title
  • Author | Year

But if you want to get fancy about it, the article supplies a more complicated process than this.
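If you prefer to sanity-check this step outside EndNote, here is a minimal sketch of the progressive field-combination idea in Python, assuming your citations have been exported and parsed into dicts. The field names and the two sample records are hypothetical, and anything the script flags would still need manual review, just as in EndNote:

```python
# A sketch of progressive field-combination duplicate detection, in the
# spirit of Bramer et al. (2016). Field names and sample records are
# hypothetical; flagged groups still need manual review.

FIELD_COMBOS = [
    ("author", "year", "title", "journal"),
    ("author", "year", "title", "pages"),
    ("title", "volume", "pages"),
    ("author", "volume", "pages"),
    ("year", "volume", "issue", "pages"),
    ("title",),
    ("author", "year"),
]

def normalise(value):
    # Lowercase and collapse whitespace so trivial differences don't hide matches
    return " ".join(str(value or "").lower().split())

def candidate_duplicates(records, combo):
    # Group records sharing identical (normalised) values on every field in the combo
    groups = {}
    for rec in records:
        key = tuple(normalise(rec.get(field)) for field in combo)
        if all(key):  # skip records missing any field in this combo
            groups.setdefault(key, []).append(rec)
    return [group for group in groups.values() if len(group) > 1]

records = [
    {"author": "Smith J", "year": "2015", "title": "A Trial of X",
     "journal": "BMJ", "volume": "350", "issue": "1", "pages": "h100"},
    {"author": "Smith, J.", "year": "2015", "title": "A trial of X",
     "journal": "BMJ", "volume": "350", "issue": "1", "pages": "h100"},
]

for combo in FIELD_COMBOS:
    for group in candidate_duplicates(records, combo):
        print(combo, "->", [r["title"] for r in group])  # review these pairs by hand
```

Note how the sample pair above is caught by the title-based combinations but not the author-based ones (“Smith J” vs. “Smith, J.”), which is exactly why the manual review pass matters.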

2. Label your citations by search date.

At this point, you’ll want to load up your citations from the previously conducted search into a separate EndNote Library. Then, use one of the custom fields to mark these citations.

  1. Select one of your citations from the “All References” group, then press Cmd+A (Mac) or Ctrl+A (Windows) to select everything in the library.
  2. Go to Tools > Change/Move/Copy Fields.
  3. Select “Custom 1” (or another field of your choosing), then “Replace whole field with”.
  4. Choose text that is meaningful to you for remembering which citations these are. Something as simple as “OLD” may suffice (this is what I did, based on a Twitter tip).
  5. Next, do the same for your “new” search results, using a different label (e.g. “NEW”).

Note that this is an important step if your search strategy has changed such that some results that were previously returned will not be returned in the new search. Otherwise, you will end up re-screening those articles!
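If you work from file exports rather than inside EndNote, the same labelling can be scripted. Below is a hedged sketch that stamps a label into the RIS “C1” (Custom 1) tag of every record; the file names are placeholders, and you should confirm that your EndNote import filter actually maps C1 to Custom 1 before relying on it:

```python
# Insert a Custom 1 label into every record of an RIS export.
# File names are hypothetical; verify that your RIS import filter
# maps the "C1" tag to EndNote's Custom 1 field.

def label_ris(in_path, out_path, label):
    with open(in_path, encoding="utf-8") as src, \
         open(out_path, "w", encoding="utf-8") as dst:
        for line in src:
            if line.startswith("ER  -"):        # end-of-record marker
                dst.write(f"C1  - {label}\n")   # write the label just before it
            dst.write(line)

label_ris("old_search.ris", "old_labelled.ris", "OLD")
label_ris("new_search.ris", "new_labelled.ris", "NEW")
```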

3. Combine your “old” and “new” EndNote Libraries together

To combine your two libraries, navigate to your “old” library and select all the citations by pressing Ctrl+A or Cmd+A.

Then click “References”, then “Copy References To”, and choose your “new” library. Easy peasy!

4. Remove duplicates

Use your EndNote Library which contains both your “old” and “new” records (both of which have previously had duplicates removed!), and remove your duplicates as you normally do, or following the process in Step 1.

But this time, there’s one big difference – every time you find a duplicate, instead of removing the duplicate record, you’ll remove BOTH records, since they represent a previously screened record that you won’t need to screen again.
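Expressed as a script over the two de-duplicated sets, this step is essentially a set difference: keep only the new records that have no match among the old ones. A minimal sketch, matching on title alone for brevity (in practice you would reuse the field combinations from step 1); note that discarding matched records this way also disposes of any left-over old records, which is what step 5 below handles inside EndNote:

```python
# Keep only new records with no counterpart in the previously screened set.
# Sample records are hypothetical; matching on title alone is a simplification.

old_records = [{"title": "A trial of X"}, {"title": "A study of Y"}]
new_records = [{"title": "A Trial of X"}, {"title": "A new paper on Z"}]

def needs_screening(old, new):
    old_keys = {rec["title"].lower() for rec in old}
    # A match means the citation was already screened, so BOTH copies go:
    # the old one never enters the result, and the new one is filtered out.
    return [rec for rec in new if rec["title"].lower() not in old_keys]

print(needs_screening(old_records, new_records))
# -> [{'title': 'A new paper on Z'}]
```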

5. Remove any remaining previously screened citations

This step won’t be necessary if no search terms have been removed since the original search was conducted.

In my case, the search had changed drastically since its creation by someone else, and I needed to remove any citations that were picked up by the original search, but not by the new one. Removing these records is easy if you have followed Step 2, above!

Simply create a smart group by right-clicking on “My Groups” in EndNote. Then, set the parameters to find your old citations (e.g. “Custom 1”, “is”, “OLD”). Finally, navigate to your smart group and delete all the citations in this group. These have already been screened and weren’t retrieved by the new search.

And that’s basically it! I was able to tackle this new skill that originally seemed kind of hard, and you can too!

For more information, the following papers may also be useful:

Bramer WM, Bain P. Updating search strategies for systematic reviews using EndNote. Journal of the Medical Library Association: JMLA. 2017 Jul;105(3):285.

Bramer WM, Giustini D, de Jonge GB, Holland L, Bekhuis T. De-duplication of database search results for systematic reviews in EndNote. Journal of the Medical Library Association: JMLA. 2016 Jul;104(3):240.


“How many citations will I have to screen?”

I get asked this question a lot at the beginning stages of a review project. It’s a fair question: researchers want to know what their screening workload is going to look like, and screening abstracts is a tedious process.

There’s some great research happening right now on how to make screening less tedious – using text mining to automate or semi-automate the process, for example. These are promising approaches, but for now, most reviews need a human eye to oversee study selection for at least part of the process.

I used to get flustered when I was asked this question because I was afraid of giving the wrong answer and underestimating the amount of work a researcher would later have to do. On the other hand, if I overestimated the number of citations to screen, a researcher might want to change the search strategy to lower the number of citations or otherwise change the methodology.

I also didn’t have a very good answer. It’s hard to estimate the number of citations to be screened without downloading them from all sources and de-duplicating first. I’ve sometimes estimated the total number of included studies at the end of the project by screening a random sample (such as 100 citations) and extrapolating from the ratio of relevant citations in the sample. For example, if there is one relevant article in a random sample of 100, and 1,000 articles total to screen, I would estimate about 10 included studies at the end of the project. (For more on getting a truly random sample of citations, see my previous blog post on this topic.)
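That back-of-envelope estimate is just a proportion scaled up to the full set; as a quick sketch:

```python
# Estimate the final number of included studies from a screened random sample.
def estimate_included(relevant_in_sample, sample_size, total_to_screen):
    return total_to_screen * relevant_in_sample / sample_size

# The example above: 1 relevant article in a random sample of 100,
# with 1,000 articles to screen in total.
print(estimate_included(1, 100, 1000))  # -> 10.0
```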

Tired of giving bad answers to this question, I’ve crunched the numbers for a few of the review projects I’ve worked on in the last year. For each of the projects, I found the number of citations downloaded from the Ovid MEDLINE search only, then the number of citations left to screen at the title/abstract stage after citations were downloaded from all databases and duplicates removed.

The results are below:

[Table: MEDLINE-only results and total citations screened for each project]

The ratio of total citations to screen to MEDLINE-only results ranged from 239% to 1333%. However, the last two columns represent projects that had less of a biomedical focus (social sciences and computer science, respectively). The MEDLINE searches were still relevant in these cases, but we didn’t expect the majority of our studies to come from MEDLINE. Thus, if we exclude the major outlier, which had more of a social sciences focus, the results are a little more consistent.

For each of the projects above, I searched 6-7 databases, except project 1, where I searched 14 (?!). However, the citation ratio for project 1 is not exceptionally different from that of the other projects. For now, I can’t see a discernible relationship between screening burden and the number of databases searched, but more data may be needed.

My overall take-away from this exercise is that, for the searches that I run, the screening burden of a systematic review tends to be about 2.5x to 5x that of the original MEDLINE search. In the future, this is the advice that I’ll be giving my researchers to help them better plan their resources and time. I can breathe a small sigh of relief, too, knowing that the information that I give my researchers is just a little more evidence-based than it was the day before.
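In code form, the rule of thumb looks like this; the 2.5x-5x multipliers come from the handful of projects above, so treat them as rough priors rather than gospel:

```python
# Rough screening-burden estimate from a MEDLINE-only result count,
# using the 2.5x-5x multipliers observed across the projects above.
def screening_burden_range(medline_hits, low=2.5, high=5.0):
    return round(medline_hits * low), round(medline_hits * high)

# e.g. a hypothetical 1,200-hit Ovid MEDLINE search
print(screening_burden_range(1200))  # -> (3000, 6000)
```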

Workbook for systematic review consultations

I’m often approached by master’s students, PhD students, and researchers at my institution to advise on systematic review projects in the early stages. I’ve found that skill levels for completing a systematic or scoping review vary widely, and that many researchers need a primer to get up to speed on the process of conducting a review, what skills are required, and in particular, how to go about the planning process.

I support many projects in depth from start to finish, but for many projects at my institution, I only have the time to provide advice and consultations. Unfortunately, I quickly learned that throwing a lot of information at people in a short period of time was not useful, and I would sometimes see the same researchers at a later consultation who hadn’t gotten very far with their projects and needed a lot of the same information again.

[Photo by Glenn Carstens-Peters on Unsplash]

There are many, many resources online for conducting review projects, including some enviable LibGuides (I personally like the Queen’s University knowledge synthesis guide and the University of Toronto learning-to-search guide). However, I wanted a resource that I could use when physically sitting with someone in a meeting room, where we could plan out their review project together. And I was getting pretty tired of drawing the same Venn diagrams of how “AND” and “OR” Boolean operators work on whatever scratch paper I had handy.

I recently developed a guide that fits these purposes, and after a few iterations and some testing and feedback, I’ve put it online for others to use and edit as they wish under a CC BY-NC-SA 4.0 license. The goal of this guide is to provide a resource that:

  • Can be printed and used as a workbook to guide a systematic reviews consultation
  • Also contains enough information to be a stand-alone self-learning resource after the consultation (e.g. the information on Boolean operators)
  • Is not so long as to be intimidating or overwhelming for someone just getting started

Without a doubt, there will be further refinements and additions to the guide over time, but for now, please feel free to download, use, and edit for your own purposes. Any feedback or comments are also gratefully accepted. 🙂

You can find the guide here at Open Science Framework.

[Screenshot of the guide on the Open Science Framework]

Building a Twitter bot!

I have long admired – and, I’ll admit, been a bit fearful of – cool technology projects that make use of APIs. To be honest, I’m still not *entirely* sure how an API works. It feels a bit like magic. You need keys and secret keys and bits of code, and all those things need to be in the right place at the right time, and I might even have to use scary things like the command line!

So you can imagine that I’ve been watching all the cool Twitter bots launched over the past few years with much wistfulness.

When I recently saw Andrew Booth’s tweet about his “Random Review Label Generator”, I knew it was time for me to get in on the action.

As it turns out, a lovely fellow has made the process of creating Twitter bots super easy by coding all the hard stuff and launching a user-friendly template with step-by-step instructions, freely available for anyone to use. Special thanks to Zach Whalen for creating and putting this online!

So: without further ado, I present to you a Twitter bot that randomly generates a new healthcare review project every hour. You’re welcome!

The beauty of this bot is that some of the project names are so ridiculous… and yet you wouldn’t be surprised to see many of them actually published. I am endlessly entertained by the combinations it comes up with, and I hope you are too!
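For the curious, the core trick is nothing more than random choices from word lists. Here is a toy sketch of the idea; the lists below are invented for illustration, and the real bot runs from Zach Whalen’s spreadsheet template rather than from code like this:

```python
# A toy random review-title generator. The word lists are invented;
# the actual bot is built on Zach Whalen's Google Sheets template.
import random

REVIEW_TYPES = ["systematic review", "scoping review", "realist review", "umbrella review"]
INTERVENTIONS = ["yoga", "mindfulness apps", "telehealth", "surgical checklists"]
POPULATIONS = ["older adults", "nursing students", "ICU patients", "new parents"]

def random_review_title():
    return (f"A {random.choice(REVIEW_TYPES)} of {random.choice(INTERVENTIONS)} "
            f"for {random.choice(POPULATIONS)}")

print(random_review_title())
```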

Grey Lit Searching for Dummies

Ah, grey literature! Confronted with a vast void of faceless, nameless literature, it’s easy to quickly become overwhelmed. Where do I start? What do I search? What am I even looking for?

As a medical librarian, I’m used to structured searches in curated databases, and going into the unknown can be a frightening thought. However, it is possible to add structure to a grey literature search!

First: What are you looking for?

Too often, “grey literature” is treated as one monolithic category. In reality, it is a broad umbrella term encompassing many different document types whose main commonality is that they are unpublished or published outside traditional publishers: basically, anything that’s not a traditional published research article.

Think about the research project at hand and what types of literature would best support it. For example, in a qualitative synthesis or realist review of a social sciences topic, a lot of robust evidence might come from book chapters with unpublished studies. In a mapping review in health services research, government white papers/reports about local health initiatives might be most relevant. What do you expect the evidence to look like, and where might you go about finding it? Common grey literature document types include:

  • Reports or white papers
  • Theses and dissertations
  • Book chapters
  • Clinical trials registers
  • Conference proceedings

Second: Make a plan!

Next, make a detailed plan for searching the literature. Your searching plan should contain information about what sources will be searched, how they will be searched, how the searches will be documented, and how/where the potentially relevant documents will be downloaded/stored.

Some strategies to consider including in your plan might be:

  • Traditional database searches that will include grey lit such as conference abstracts (e.g. PsycINFO, Embase, ProQuest Dissertations & Theses)
  • Specialised databases (e.g. “grey” databases such as OpenGrey or HMIC, or small subject-specific databases without sophisticated search mechanisms)
  • Search engines (e.g. Google, Google Scholar, DuckDuckGo)
  • Custom Google search engines (e.g. NGOs Search, Just State Web Sites, Think Tank Search)
  • Clinical Trials registers
  • Hand searching of key subject websites (e.g. the main associations or government departments in that topic area)
  • Consultation with experts (who may have ideas about papers you have missed)

For each strategy, document all the details you will need to conduct the search:

  • Who is going to conduct the search?
  • What search terms or search strategies will be used?

For more sophisticated sites, a full Boolean strategy might be used; for a site with a simple search box, you may need to enter one term or a few terms at a time. Strategies should be similar across sources, but adapted to the searching capabilities of each resource.

Think also about the context: if your search topic is “yoga for substance abuse”, and you’re searching the NIDA International Drug Abuse Research Abstract Database, you won’t need to include substance abuse terminology in your searches, because everything in that subject database is already about substance abuse.

  • How will the searches be documented? Oftentimes, an Excel spreadsheet will suffice, with information such as the person searching, the date, the search strategy, the number of items looked at, and the number of items selected as potentially relevant. Bear in mind that for some resources, the search strategy might be narrative, such as “clicked the research tab and browsed the list of publications”. (A minimal logging sketch follows after this list.)
  • How many results will you look at? The first 50? The first 100? Until there are diminishing returns?
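As a starting point for that documentation, here is a minimal sketch of such a log written as a CSV; the column names and the example row are illustrative only:

```python
# Write a simple grey-literature search log with the columns described above.
# Column names and the sample row are illustrative, not prescriptive.
import csv

COLUMNS = ["searcher", "date", "source", "strategy", "items_viewed", "items_selected"]

with open("grey_lit_log.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerow({
        "searcher": "EB",
        "date": "2018-07-16",
        "source": "Example association website",
        "strategy": "clicked the research tab and browsed the list of publications",
        "items_viewed": 50,
        "items_selected": 3,
    })
```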

Third: Execute the plan!

Make sure to have a strategy in place for recording your searches and downloading your citations. Due to the transient nature of the web, grey literature searches generally aren’t replicable: search Google one week and run the same search a year later, and you might get different results. However, searches for grey literature can and should be transparent and well documented, such that someone else could conduct the same searches at a later point, even if they would get different results.

For more information, check out the following:

Briscoe S. Web searching for systematic reviews: a case study of reporting standards in the UK Health Technology Assessment programme. BMC research notes. 2015 Apr 16;8(1):153.

Godin K, Stapleton J, Kirkpatrick SI, Hanning RM, Leatherdale ST. Applying systematic review search methods to the grey literature: a case study examining guidelines for school-based breakfast programs in Canada. Systematic reviews. 2015 Oct 22;4(1):138.

For a helpful list of grey literature sources to include in your search, also check out the Grey Matters tool published by the research information specialist team at CADTH.

Quick tip: use the Ovid multi-line launcher

In the “multi-line” vs. “single-line” searches debate, one point that is often thrown around is that multi-line searches are more cumbersome to edit and run. Even with Ovid’s new “edit” button, it still takes a few clicks and a few page refreshes to edit a strategy and see the results. When making lots of changes to a strategy in quick succession, this time can really add up.

One underappreciated and little-known tool is Ovid’s multi-line launcher. It’s beautiful! The multi-line launcher allows a user to copy/paste a multi-line strategy directly into the search box, press enter, and view the search results – with hits for each line – as normal.

[Screenshot of Ovid’s multi-line launcher tool]

When making edits to a strategy I tend to do the following:

  1. Paste the strategy into the multi-line launcher box
  2. Ensure that any line-number references in the strategy are still correct, and change them if needed
  3. Press enter to view results
  4. If the strategy requires a change, type “..pg all” into the search box in the main Ovid MEDLINE interface to delete the search history (see more about keyboard shortcuts in Ovid here)
  5. Make edits to the strategy in a Word document
  6. Paste back into the multi-line launcher box

I’ve found this approach works more quickly and causes fewer site time-outs than using the native “edit” button.

Try it here: http://demo.ovid.com/demo/ovidsptools/launcher/launcher.html

How to convert a search between PubMed and Ovid

Have you ever tried to convert a search strategy from PubMed to Ovid or vice versa? It can be a real pain. The field codes in Ovid don’t always nicely match up with the tags in PubMed, and it can be difficult to wrap your head around the auto-explode in PubMed vs. the manual explode in Ovid for indexing terms. Not to mention that there is some functionality that exists in Ovid but not PubMed (such as proximity operators), and in PubMed but not Ovid (such as the supplementary concepts tag). Yikes!

Why would you want to convert a search strategy between the two, you ask? Don’t they have the same content?

  1. There is some content that is in PubMed but not Ovid MEDLINE. The NLM factsheet “MEDLINE, PubMed, and PMC (PubMed Central): How are they different?” gives an overview of PubMed’s unique content.
  2. You might want to use features that aren’t available in both interfaces! Maybe you’re working on a strategy in Ovid MEDLINE, but realise partway through that you’d really like to use one of the PubMed subject filters, for example.
  3. Sometimes, you might find a search filter or hedge, but it is written in the syntax of a different interface. Translating a strategy isn’t always easy or intuitive, so automating the process can reduce errors and save time.
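To give a flavour of what a converter has to do, here is a toy regex sketch covering a handful of common tags. This is emphatically not the logic of the tool described below, just an illustration of the mapping problem (note how PubMed’s auto-exploding [mh] tag becomes an explicit “exp …/” in Ovid):

```python
# A toy PubMed-to-Ovid syntax mapping for a few common field tags.
# Real conversion handles far more cases (and edge cases) than this.
import re

def pubmed_to_ovid(query):
    query = re.sub(r"\[tiab\]", ".ti,ab.", query)   # title/abstract
    query = re.sub(r"\[ti\]", ".ti.", query)        # title
    query = re.sub(r"\[pt\]", ".pt.", query)        # publication type
    # PubMed MeSH terms auto-explode; Ovid needs an explicit "exp .../"
    query = re.sub(r"(\w[\w\s]*)\[mh\]", r"exp \1/", query)
    return query

print(pubmed_to_ovid("heart diseases[mh] AND aspirin[tiab] AND review[pt]"))
# -> exp heart diseases/ AND aspirin.ti,ab. AND review.pt.
```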

Over the past few months, I’ve been working with a colleague to build a tool that automatically converts searches between the two interfaces, and we recently presented our work at the EAHIL/ICML conference in Dublin.

[Photo: EAHIL/ICML conference in Dublin]

During the conference week, we had dozens of excellent conversations in person and on Twitter, and 138 unique website visitors! Thanks to everyone who provided feedback and suggestions for improvements. We are working hard to incorporate many of them over the coming months.

The tool is freely available at medlinetranspose.github.io. Please feel free to check it out and let us know how it works for you!