Finding a random set of citations in EndNote

Have you ever been asked to find a random set of citations from EndNote? This happens most often to me when researchers are testing out screening procedures and want to ensure they are all interpreting the screening guidelines the same way. The researchers will all screen the same random set of 10-20 articles and compare results before screening the entire set.

So: what’s the best way to go about this? Sorting from a-z on any given field and selecting the top 10-20 articles isn’t likely to be truly random. For example, sorting by date will retrieve only very new or old articles. Sorting by record number is one possible way to do it, but also isn’t truly random as it will retrieve articles added to the database most or least recently.

Here’s how I take a truly random sample of citations from EndNote.

First, create an output filter in EndNote

The output filter will include only the citation record numbers. Don’t worry, you only have to do this once, and in the future it will all be set up for you!

  1. In EndNote, go to Edit –> Output Styles –> New Style
  2. In the resulting screen, click “templates” under the heading “bibliography”
  3. Then, put your cursor in the box below “generic”. Click “insert field” –> “Record Number”, then press enter so that your cursor goes to the next line in the text box.
  4. Go to “file” –> “save as” and give it a descriptive name like “record-number-only”.

Next, export your record numbers.

  1. Back in the main EndNote screen, click the dropdown box at the top of the screen, then “select another style”, and search for your previously created Output Style.
  2. Then click “choose”. Ensure that your output style name is displaying in the dropdown box!
  3. Select “all references” to make sure all your references (that you want to create a subset from) are displayed. Then click one of the references and press ctrl + a (or cmd + a on a mac) to select all references.
  4. Right-click and select “copy formatted”.

Create your random subset!

  1. Open Excel, and press ctrl + v (or cmd + v on a mac) to paste all your record numbers.
  2. In the cell to the right of your first record number, insert the formula =rand(). This will create a random number between 0 and 1.
  3. Hover the cursor over the bottom-right corner of the cell until it turns into a cross. Then click and drag all the way down to the last row that contains a record number.
  4. Insert a row at the top and click “sort & filter” –> “filter” on the menu bar.
  5. Then, sort the second column (with the random numbers) from smallest to largest (or largest to smallest).
  6. You now have a randomly sorted list! Select and copy the top x number of cells in the first column (however large you want your sample to be).

Format your record numbers to put back into EndNote.

  1. Paste your subset of record numbers into Word (paste as text, not a table!)
  2. Click “replace” on the main toolbar to bring up the find and replace box.
  3. Beside the box “find what”, write ^p (the caret symbol followed by “p”).
  4. Beside the box “replace with”, insert a semi-colon followed by one space.
  5. Then click “replace all”.
  6. You should have a string of record numbers separated by semi-colons.
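
If you’d rather skip Excel and Word, the shuffle-and-format steps above can be sketched in a few lines of Python. This is only a minimal sketch under some assumptions: the function names (random_record_subset, format_for_endnote) and the sample record numbers are made up for illustration, and the pasted text is assumed to contain one record number per line, as produced by the record-number-only output style.

```python
import random


def random_record_subset(record_numbers, sample_size, seed=None):
    """Pick sample_size record numbers at random, without replacement.

    Hypothetical helper: passing a seed makes the draw repeatable.
    """
    rng = random.Random(seed)
    return rng.sample(list(record_numbers), sample_size)


def format_for_endnote(record_numbers):
    """Join record numbers with a semi-colon and a space."""
    return "; ".join(str(number) for number in record_numbers)


# Made-up example of text copied from EndNote with "copy formatted".
pasted = """101
102
103
104
105
106
"""

# Drop blank lines and stray whitespace before sampling.
record_numbers = [line.strip() for line in pasted.splitlines() if line.strip()]

subset = random_record_subset(record_numbers, 3, seed=42)
print(format_for_endnote(subset))
```

The printed string can then be pasted into EndNote in the same way as the one produced in Word.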

Put them back into EndNote!

  1. Go back to your EndNote Library.
  2. Right-click in the sidebar and select “create smart group”
  3. Give it a nice title, like “random set” 😃
  4. In the first dropdown box, select “record number”, then “word begins with”, then paste in your formatted record numbers separated by semi-colons.
  5. Click “create”.
  6. All done!

I hope you found this useful. It might sound complicated, but this process really only takes a few seconds once you have gone through it a few times.

Do you have a more efficient or a different way of doing it? What kinds of formatting and database problems do you come across in your position? Feel free to send me a message or tweet at me.

Til next time,

The secret to bibliometric analysis: generating a list of PMIDs

By now, it’s probably no secret that I love crunching bibliometric data. I find that analysing my results, both during search strategy formation and after downloading final results, gives me a broader perspective and helps me see trends that I might otherwise miss.

However, analysing data can sometimes be time consuming and clunky. Data never seems to be in the format that you want it when you need it; the precise tool that you need at that moment hasn’t been invented yet or is otherwise proprietary; the right software for the job requires a programming language you haven’t yet learned, and so forth. Sometimes you want a quick and dirty answer to help develop a strategy and it doesn’t have to be tidy or perfect, but you need it now!

Here’s my quick and dirty trick for analysing your bibliometric [medline] data:

  1. Generate a list of PMIDs from your results (whether your strategy is finalised or not!)
  2. Pop into the data analysis program of your choosing…

The beauty of this trick is that you can copy-paste whatever you are working on at this very moment (provided you’re working with medline data, of course…) and get real-time feedback. No need to mess with clunky software interfaces or retype your strategy.

Generate a list of PMIDs

If you’re using PubMed, this part is easy. Click the “Format: Summary” drop down menu just below the search bar, then select “PMID”. Et voila! The resulting page is a plain text list of PMIDs, taken from the results on the previous page.


Note that the resulting PMID list will show only the citations from the previous page, so you may want to scroll to the bottom of the screen to show the max number of citations per page (200 at the time of this writing).

If you’re working in Ovid (like I generally do), this is a bit trickier. Ovid citations still contain a PMID in the “unique identifier” field, but it’s not quite so easy to extract. There are a few ways to go about this, but my usual strategy is to:

  1. Download all search results into EndNote reference management software
  2. Extract a list of PMIDs through a custom export filter

The downside to this method is that there is a limit to the number of citations you can export at once from Ovid, and EndNote also gets a wee bit finicky when you start importing citations by the thousands… Thus, this technique is best done with small-ish citation sets.
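
Whichever route you take, the exported text can pick up blank lines, duplicates, or stray non-numeric lines along the way. Here is a minimal Python sketch for tidying it up before pasting it elsewhere. It assumes one PMID per line and treats any line made up only of digits as a PMID; the function name clean_pmids and the sample values are hypothetical.

```python
import re


def clean_pmids(raw_text):
    """Return the PMIDs found in raw_text, in order, without duplicates.

    Assumption: each PMID sits on its own line and consists only of digits.
    Lines that do not match (blank lines, headers, etc.) are skipped.
    """
    seen = set()
    pmids = []
    for line in raw_text.splitlines():
        token = line.strip()
        if re.fullmatch(r"\d+", token) and token not in seen:
            seen.add(token)
            pmids.append(token)
    return pmids


# Made-up example of an exported list with a blank line and a duplicate.
exported = """12345678

23456789
12345678
"""

pmids = clean_pmids(exported)
print("\n".join(pmids))
```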

Analyse your data

Once you have your list of PMIDs, you can pop them into a variety of different tools to crunch the data in different ways. For example, try pasting your list into:

  • PubReMiner – for a word count analysis of authors, journals, MeSH, title/abstract words…
  • Medline Trends – for an analysis of citations over time
  • GoPubMed – for a variety of filters (maps! bar graphs! frequency charts!)
  • Yale MeSH Analyser – for a side-by-side comparison of MeSH usage

And more! Someday I intend to write up a full list of medline data analysis tools freely available online, but that day is not today…

It’s not necessary to input a full search strategy into most bibliometric analysis programmes… simply paste in your PMIDs!

Why would a person bother to do this?

Building a search strategy is an iterative process and it requires using a lot of different tools. For example, you can use your own common sense and intuition, but other tried-and-true strategies include: backwards/forwards citation chaining, talking to experts in the field, or looking at highly cited papers/journals in the field.

Using quick data analysis strategies throughout the process of building a search strategy will help ensure that important concepts aren’t missed. They provide a more objective picture of what’s happening, what’s missing, and how you can better refine your strategy.

That’s it for this week!

PS This is my first proper blog and I must say… keeping a blog up to date is not as easy as I thought. Please do let me know if you find this content useful and I will try my utmost to keep ’em coming! You can use the site contact form or find me on twitter at @v_woolf.