De-duplicating EndNote results against a previous search

Well hello my friends! It’s been a long time since I’ve posted. Forgive me, as much has happened in the last year, including: moving across an ocean, subsequently moving across a city, starting a busy freelance information specialist business, and many mundane crises, trips, and side-projects in between.

I was recently tasked with updating a systematic review search using a new and improved search strategy completely unlike the previous one. New search terms had been added, and several irrelevant search terms had been deleted. I’d done systematic review updates before, but I’d always simply used date filters in the databases to capture results from the date of the previous search.

But this time, the researchers wanted any articles that would be captured by the new search, in any date period, and wanted to ensure they weren’t screening any citations that had already been previously screened with the old search.

I’d heard other information specialists talk about using EndNote to de-duplicate against a previous search, but never tried it myself. It just seemed unduly complicated. Date filters seemed to be working perfectly fine.

It was around this same time that I found out that date filters were not perfectly fine.

One day, I went to date-limit an Embase search using the advice from a LibGuide at a high-ranking university… and was horrified to find that a not insignificant number of citations that ought to have been picked up had not been.

Cue a minor panic as I tried to figure out whether I had royally screwed up any of my previous projects.

Friends, I have learned the error of my ways, and will henceforth de-duplicate my systematic review update searches in EndNote when possible. Cross my heart, etc., etc.

As usual when picking up a new skill, I went to Twitter to see what all the experts were doing.

Here follows the method that I ended up using. I’ve documented it for my own purposes and hope that it can come in handy for others as well.

1. De-duplicate your total search results in EndNote, as normal.

You can use whatever process works best for you. I tend to use a modified version of Bramer et al., 2016, in which I progressively choose different field combinations in EndNote to test for duplicates, and manually review the results. The field combinations suggested in the article include (in this order):

  • Author | Year | Title | Secondary Title (Journal)
  • Author | Year | Title | Pages
  • Title | Volume | Pages
  • Author | Volume | Pages
  • Year | Volume | Issue | Pages
  • Title
  • Author | Year

But if you want to get fancy, the article describes a more elaborate process than this.
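If you’d like to see the logic of the progressive field-combination approach outside EndNote, here’s a rough Python sketch. It is not EndNote’s own duplicate-matching algorithm: the Record class, its field names, and the normalise rule (lower-casing and stripping punctuation) are my own assumptions for illustration. It also automatically keeps the lowest record number in each matched group, whereas in practice I check every group by hand, especially on the looser passes like Title or Author | Year.

```python
import re
from dataclasses import dataclass


@dataclass
class Record:
    # Hypothetical stand-in for an EndNote reference; field names are illustrative
    rec_no: int
    author: str = ""
    year: str = ""
    title: str = ""
    journal: str = ""  # EndNote's "Secondary Title"
    volume: str = ""
    issue: str = ""
    pages: str = ""


# Field combinations in the order listed above (after Bramer et al., 2016)
FIELD_COMBINATIONS = [
    ("author", "year", "title", "journal"),
    ("author", "year", "title", "pages"),
    ("title", "volume", "pages"),
    ("author", "volume", "pages"),
    ("year", "volume", "issue", "pages"),
    ("title",),
    ("author", "year"),
]


def normalise(value):
    # Lower-case, turn punctuation into spaces, and collapse whitespace
    return " ".join(re.sub(r"[^\w\s]", " ", value.lower()).split())


def candidate_groups(records, fields):
    """Group records whose chosen fields all match; return groups of 2+ records."""
    groups = {}
    for rec in records:
        key = tuple(normalise(getattr(rec, f)) for f in fields)
        if all(key):  # skip records with an empty field in this combination
            groups.setdefault(key, []).append(rec)
    return [g for g in groups.values() if len(g) > 1]


def deduplicate(records):
    """Run each field combination in turn, removing matches before the next pass."""
    remaining = list(records)
    for fields in FIELD_COMBINATIONS:
        drop = set()
        for group in candidate_groups(remaining, fields):
            # In practice you eyeball each group; here we keep the lowest record number
            keep = min(group, key=lambda r: r.rec_no)
            drop.update(r.rec_no for r in group if r is not keep)
        remaining = [r for r in remaining if r.rec_no not in drop]
    return remaining
```

Running the passes in order matters: the strict combinations catch the obvious duplicates first, so the loose ones (Title, Author | Year) leave fewer false matches to check.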

2. Label your citations by search date.

At this point, you’ll want to load up your citations from the previously conducted search into a separate EndNote Library. Then, use one of the custom fields to mark these citations.

  1. Select one of your citations from the “All Refs” group, then press cmd + A or ctrl + A to select all the references in the library.
  2. Then, go to Tools, then Change/Move/Copy Fields.
  3. Select “Custom 1” (or another field of your choosing), then “replace whole field with”
  4. Choose text that is meaningful for you for remembering which citations these are. Something as simple as “OLD” may suffice (this is what I did, based on a Twitter tip).
  5. Next, do the same for your “new” search results.

Note that this is an important step if your search strategy has changed such that some results that were previously returned will not be returned in the new search. Otherwise, you will end up re-screening those articles!

3. Combine your “old” and “new” EndNote Libraries together

To combine your two libraries, navigate to your “old” library and select all the citations by pressing ctrl + A or cmd + A.

Then click “references”, then “copy references to”, and choose your “new” library. Easy peasy!

4. Remove duplicates

Use your EndNote Library which contains both your “old” and “new” records (both of which have previously had duplicates removed!), and remove your duplicates as you normally do, or following the process in Step 1.

But this time, there’s one big difference – every time you find a duplicate, instead of removing the duplicate record, you’ll remove BOTH records, since they represent a previously screened record that you won’t need to screen again.

5. Remove any remaining previously screened citations

This step won’t be necessary if no search terms have been removed since the original search was conducted.

In my case, the search had changed drastically since its creation by someone else, and I needed to remove any citations that were picked up by the original search, but not by the new one. Removing these records is easy if you have followed Step 2, above!

Simply create a smart group by right-clicking on “My Groups” in EndNote. Then, set the parameters to find your old citations (e.g. “Custom 1”, “is”, “OLD”). Then, navigate to your smart group and delete all the citations in this group. These have already been screened and weren’t retrieved by the new search.
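For anyone who thinks better in code, here’s a minimal Python sketch of Steps 2 to 5. It assumes each record is a dict, that both sets have already been de-duplicated within themselves (Step 1), and that a hypothetical key function stands in for EndNote’s duplicate matching (a real match would use the field combinations from Step 1 plus a manual check). The custom1 entry plays the role of the Custom 1 label.

```python
def update_screening_set(old, new, key):
    """
    old, new: lists of dicts, each already de-duplicated within itself (Step 1).
    key: hypothetical function mapping a record to a matching key,
         e.g. a normalised title plus year.
    Returns the new-search records that still need screening.
    """
    # Step 2: label each record by search, like the Custom 1 field
    labelled = [dict(r, custom1="OLD") for r in old] + [dict(r, custom1="NEW") for r in new]

    # Steps 3-4: in the combined set, find keys shared by an OLD and a NEW record
    old_keys = {key(r) for r in labelled if r["custom1"] == "OLD"}
    new_keys = {key(r) for r in labelled if r["custom1"] == "NEW"}
    shared = old_keys & new_keys

    # Step 4: remove BOTH records of each duplicate pair (already screened)
    combined = [r for r in labelled if key(r) not in shared]

    # Step 5: remove remaining OLD records (found only by the old search)
    return [r for r in combined if r["custom1"] != "OLD"]
```

For example, with an old search that found titles A and B and a new search that found B and C, only C is left to screen: B is dropped in Step 4 and A in Step 5.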

And that’s basically it! I was able to tackle this new skill that originally seemed kind of hard, and you can too!

For more information, the following papers may also be useful:

Bramer WM, Bain P. Updating search strategies for systematic reviews using EndNote. Journal of the Medical Library Association: JMLA. 2017 Jul;105(3):285.

Bramer WM, Giustini D, de Jonge GB, Holland L, Bekhuis T. De-duplication of database search results for systematic reviews in EndNote. Journal of the Medical Library Association: JMLA. 2016 Jul;104(3):240.


“How many citations will I have to screen?”

I get asked this question a lot at the beginning stages of a review project. It’s a fair question: researchers want to know what their screening workload is going to look like, and screening abstracts is a tedious process.

There’s some great research happening right now on how to make screening less tedious – using text mining to automate or semi-automate the process, for example. These are promising approaches, but for now, most reviews need a human eye to oversee study selection for at least part of the process.

I used to get flustered when I was asked this question because I was afraid of giving the wrong answer and underestimating the amount of work a researcher would later have to do. On the other hand, if I overestimated the number of citations to screen, a researcher might want to change the search strategy to lower the number of citations or otherwise change the methodology.

I also didn’t have a very good answer. It’s hard to estimate the number of citations to be screened without first downloading them from all sources and de-duplicating. I’ve sometimes estimated the total number of included studies at the end of the project by screening a random sample (such as 100 citations) and scaling the proportion of relevant citations in that sample up to the total number of citations to be screened (e.g. if there is one relevant article in a random sample of 100, and 1000 articles total to screen, I would estimate about 10 studies to be included at the end of the project; for more on getting a truly random sample of citations, see my previous blog post on this topic).
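The arithmetic here is just a proportion scaled up. A small Python sketch (the function name is mine) makes it explicit; keep in mind that with only one relevant hit in a sample of 100, this is a very rough point estimate.

```python
def estimate_included(relevant_in_sample, sample_size, total_to_screen):
    """Scale the share of relevant citations in a random sample up to the full set."""
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    # Multiply first, then divide, to keep round numbers exact
    return relevant_in_sample * total_to_screen / sample_size


# The example from the paragraph above: 1 relevant in 100, 1000 to screen
print(estimate_included(1, 100, 1000))  # 10.0
```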

Tired of giving bad answers to this question, I’ve crunched the numbers for a few of the review projects I’ve worked on in the last year. For each of the projects, I found the number of citations downloaded from the Ovid MEDLINE search only, then the number of citations left to screen at the title/abstract stage after citations were downloaded from all databases and duplicates removed.

The results are below:

[Table image: MEDLINE-only citations and total citations to screen, by project]

The number of citations to screen, expressed as a percentage of the MEDLINE-only results, ranged from 239% to 1333%. However, the last two columns represent projects that had less of a biomedical focus (social sciences and computer science, respectively). The MEDLINE searches were still relevant in these cases, but we didn’t expect the majority of our studies to come from MEDLINE. Thus, if we exclude the major outlier (the project with more of a social sciences focus), the results are a little more consistent.

For each of the projects above, I searched 6-7 databases, except project 1, where I searched 14(?!). However, the ratio of citations for project 1 is not exceptionally different from that of the other projects. For now, I can’t see a discernible relationship between screening burden and the number of databases searched, but more data may be needed.

My overall take-away from this exercise is that, for the searches that I run, the screening burden of a systematic review tends to be about 2.5x to 5x that of the original MEDLINE search. In the future, this is the advice that I’ll be giving my researchers to help them better plan their resources and time. I can breathe a small sigh of relief, too, knowing that the information that I give my researchers is just a little more evidence-based than it was the day before.
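To turn that rule of thumb into numbers at the planning stage, something like this minimal Python sketch would do. The 2.5 and 5 multipliers are only my take-away from my own projects, not a general constant, and the function names and example counts are hypothetical.

```python
def screening_range(medline_hits, low=2.5, high=5.0):
    """Rough (low, high) estimate of citations to screen after de-duplication."""
    return round(medline_hits * low), round(medline_hits * high)


def burden_ratio_percent(total_to_screen, medline_hits):
    """Citations to screen as a percentage of the MEDLINE-only results."""
    return 100 * total_to_screen / medline_hits


print(screening_range(800))               # (2000, 4000)
print(burden_ratio_percent(2390, 1000))   # 239.0
```

So a hypothetical MEDLINE search returning 800 records suggests planning for roughly 2,000 to 4,000 records at title/abstract screening.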

Quick tip: use the Ovid multi-line launcher

In the “multi-line” vs “single line” searches debate, one point that is often thrown around is: multi-line searches are more cumbersome to edit and run. Even with Ovid’s new “edit” button, it still takes a few clicks and a few page refreshes to edit a strategy and see the results. When making lots of changes quickly to a strategy, this time can really add up.

One underappreciated and little-known tool is Ovid’s multi-line launcher. It’s beautiful! The multi-line launcher allows a user to copy/paste a multi-line strategy directly into the search box, press enter, and view the search results – with hits for each line – as normal.

[Screenshot of Ovid’s multi-line launcher tool]

When making edits to a strategy I tend to do the following:

  1. Paste the strategy into the multi-line launcher box.
  2. Check that the line numbers referenced in combination lines (e.g. “1 or 2”) are still correct, and change them if needed.
  3. Press enter to view results.
  4. If the strategy requires a change, type “..pg all” into the search box in the main Ovid MEDLINE interface to delete the search history (for more Ovid keyboard shortcuts, see my post “Using Ovid faster and smarter”).
  5. Make edits to the strategy in a Word document.
  6. Paste it back into the multi-line launcher box.
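Step 2 is the one that trips me up most. If you keep strategies in a text file, a rough Python check like the one below can flag combination lines (such as 1 or 2) that point to a line number that doesn’t exist yet. This is my own hypothetical helper, not an Ovid feature; it only looks at lines made up purely of line numbers, AND/OR/NOT and parentheses, so lines such as “limit 3 to english language” are not checked.

```python
import re

# Tokens allowed on a pure combination line, e.g. "1 or 2" or "(3 and 4) not 5"
COMBO_TOKEN = re.compile(r"\d+|\b(?:and|or|not)\b|[()]", re.IGNORECASE)


def is_combination_line(line):
    stripped = line.strip()
    if not any(ch.isdigit() for ch in stripped):
        return False
    # Only a combination line if nothing but allowed tokens (and spaces) remains
    return COMBO_TOKEN.sub("", stripped).strip() == ""


def find_bad_references(strategy):
    """Return (line_no, referenced_no) pairs that point at the same or a later line."""
    problems = []
    for line_no, line in enumerate(strategy, start=1):
        if not is_combination_line(line):
            continue
        for ref in re.findall(r"\d+", line):
            ref_no = int(ref)
            if ref_no < 1 or ref_no >= line_no:
                problems.append((line_no, ref_no))
    return problems


strategy = [
    "exp Heart Diseases/",
    "heart disease*.tw.",
    "1 or 2",
    "3 and 5",  # line 5 doesn't exist yet
]
print(find_bad_references(strategy))  # [(4, 5)]
```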

I’ve found this approach works more quickly and with fewer site time-outs than using the native “edit” button.

Try it here: http://demo.ovid.com/demo/ovidsptools/launcher/launcher.html

Finding a random set of citations in EndNote

Have you ever been asked to find a random set of citations from EndNote? This happens most often to me when researchers are testing out screening procedures, and want to ensure they are all interpreting the screening guidelines the same way. The researchers will all screen the same random set of 10-20 articles and compare results before screening the entire set.

So: what’s the best way to go about this? Sorting from A to Z on any given field and selecting the top 10-20 articles isn’t likely to be truly random. For example, sorting by date will retrieve only very new or very old articles. Sorting by record number is one possible way to do it, but it also isn’t truly random, as it will retrieve the articles added to the library most or least recently.

Here’s how I take a truly random sample of citations from EndNote.

First, create an output style in EndNote

The output style will include only the citation record numbers. Don’t worry, you only have to do this once, and in the future it will all be set up for you!

  1. In EndNote, go to Edit –> Output Styles –> New Style
  2. In the resulting screen, click “templates” under the heading “bibliography”
  3. Then, put your cursor in the box below “generic”. Then, click “insert field” –> “Record Number” –> then press enter so that your cursor goes to the next line in the text box.
  4. Go to “file” –> “save as” and save it to something descriptive like “record-number-only”.

Next, export your record numbers.

  1. Back in the main EndNote screen, click the dropdown box at the top of the screen, then “select another style”, and search for your previously created Output Style.
  2. Then click “choose”. Ensure that your output style name is displaying in the dropdown box!
  3. Select “all references” to make sure all your references (that you want to create a subset from) are displayed. Then click one of the references and press ctrl + a (or cmd + a on a mac) to select all references.
  4. Right-click and select “copy formatted”.

Create your random subset!

  1. Open Excel, and press ctrl + v (or cmd + v on a Mac) to paste all your record numbers into the first column.
  2. In the cell to the right of your first record number, insert the formula =RAND(). This will create a random number between 0 and 1.
  3. Hover the cursor over the bottom-right corner of the cell until it turns into a cross. Then click and drag all the way down to the last row that contains a record number.
  4. Insert a row at the top and click “sort & filter” –> “filter” on the menu bar.
  5. Then, sort the second column (with the random numbers) from smallest to largest (or largest to smallest).
  6. You now have a randomly sorted list! Select and copy the top x number of cells in the first column (however large you want your sample to be).
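The Excel steps above are the classic “attach a random key, sort, take the top n” approach. Here is the same idea as a Python sketch, assuming you’ve pasted your record numbers into a list; the function name and the optional seed (for a reproducible sample) are my own additions. Python’s built-in random.sample(record_numbers, n) does the same job in one line.

```python
import random


def random_subset(record_numbers, n, seed=None):
    """Pick n record numbers at random by sorting on a random key."""
    rng = random.Random(seed)
    # Pair every record number with a random key in [0, 1), like =RAND() in column B
    keyed = [(rng.random(), rec) for rec in record_numbers]
    # Sort on the random key (smallest to largest) and keep the first n
    keyed.sort(key=lambda pair: pair[0])
    return [rec for _, rec in keyed[:n]]


sample = random_subset(list(range(1, 501)), 20, seed=42)
print(sample)
```

Passing the same seed gives the same sample again, which is handy if you need to document exactly how the pilot set was drawn.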

Format your record numbers to put back into EndNote.

  1. Paste your subset of record numbers into Word (paste as text, not a table!).
  2. Click “replace” on the main toolbar to bring up the find and replace box.
  3. Beside the box “find what”, write ^p (the caret symbol followed by “p”).
  4. Beside the box “replace with”, insert a semi-colon followed by one space.
  5. Then click “replace all”.
  6. You should have a string of record numbers separated by semi-colons.
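If you’d rather skip the Word find-and-replace, the same formatting can be sketched in a few lines of Python, assuming the record numbers are pasted in one per line (the function name is hypothetical):

```python
def to_smart_group_string(pasted_text):
    """Join one-per-line record numbers into a "; "-separated string."""
    # Ignore blank lines and stray spaces left over from copying out of Excel
    numbers = [line.strip() for line in pasted_text.splitlines() if line.strip()]
    return "; ".join(numbers)


print(to_smart_group_string("12\n45\n\n301\n"))  # 12; 45; 301
```

Dropping blank lines also avoids the stray trailing semi-colon you can get in Word when the pasted list ends with an empty paragraph.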

Put them back into EndNote!

  1. Go back to your EndNote Library.
  2. Right-click in the sidebar and select “create smart group”.
  3. Give it a nice title, like “random set” 😃
  4. In the first dropdown box, select “record number”, then “word begins with”, then paste in your formatted record numbers separated by semi-colons.
  5. Click “create”.
  6. All done!

I hope you found this useful. It might sound complicated, but this process really only takes a few seconds once you have gone through it a few times.

Do you have a more efficient or a different way of doing it? What kinds of formatting and database problems do you come across in your position? Feel free to send me a message or tweet at me.

Til next time,
Amanda

— POSTSCRIPT —

I was asked recently on Twitter how to isolate the remainder of citations for screening after using this method. It’s very easy!

[Screenshot: creating a combination group in EndNote]

To isolate the rest of your citations, simply make a combination group by doing the following:

  1. First, create a new group in EndNote called “All Refs”, and drag ALL your citations from the library into it by going to “All References”, selecting all with ctrl/cmd + A, and dragging them into your new group.
  2. Right-click on “My Groups” in EndNote, then click “Create From Groups”. Name this group “remainder to screen”, or something else that makes sense to you.
  3. In the first drop-down menu, select your “All Refs” group.
  4. Then select “NOT” from the boolean operators dropdown menu.
  5. Then select the group that holds your random subset in the second dropdown menu.
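Under the hood, the NOT combination is simply a set difference: everything in All Refs that isn’t in the random subset. A tiny Python sketch (hypothetical function name, record numbers as plain integers) shows the idea:

```python
def remainder_to_screen(all_refs, random_set):
    """All Refs NOT random subset, keeping the original order."""
    subset = set(random_set)
    return [rec for rec in all_refs if rec not in subset]


print(remainder_to_screen([1, 2, 3, 4, 5], [2, 5]))  # [1, 3, 4]
```

Together, the random subset and the remainder cover every citation exactly once, so nothing gets screened twice or missed.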

Using Ovid faster and smarter

Did you know that Ovid’s search bar can be used like a command line? Its most common use is to type in search queries, but it can also be used to execute several time-saving commands.

Each command is preceded by two dots (..). These are what tell the database that you don’t want to search for terms, but do something different. Remember that there is no space between the two dots (..) and the command!

Part 1: Save and execute searches

  • ..sv ps(search name) will save your search permanently. For example, type “..sv ps(Heart-Disease)” (without the quotes) to save the current search. The parentheses are important: without them, the search will only be saved temporarily (24 hours). I like to periodically type in the same command while working to save any updates to the search that I’m working on.
  • ..e <saved search name> will execute a search. For example, if you have a saved search called Heart-Disease, type “..e Heart-Disease” (without the quotes) to execute the search.
  • ..pg all to clear the search history. If your search is saved, it will stay saved, but this allows you to clear the slate and start something new. Similarly, use “..pg #,#” (without the quotes) to purge specific lines.
  • ..dedup # to remove any duplicates from a specific line in the search history.
  • ..ps to view the entire search history in a printable format

Part 2: Look up information about MeSH

  • ..scope <subject heading> will look up the scope note for the indicated subject heading. For example “..scope heart diseases” (without the quotes).
  • ..tree <subject heading> will look up the subject heading in the tree hierarchy. For example, “..tree heart diseases” (without the quotes).
  • ..sh <subject heading> to look up the subheading selection window for the subject heading.

(Note: The three commands above can be used with or without the dot dot (..) syntax preceding the command. I like to use it for all commands for consistency.)

All of this information is also contained in Ovid’s help documentation.

I hope you find these commands as useful as I do. If you can master these, you’ll be well on your way to becoming a database expert (and will also wow those around you with your efficient navigation!).

Til next time,

Amanda