Ten Rules for Reproducible Research

Those of you who have been around for a while know I’m a big fan of reproducible research. I’ve written about it here, here, here, here, here, and here. Now Geir Kjetil Sandve, Anton Nekrutenko, James Taylor, and Eivind Hovig have a nice article in PLOS Computational Biology entitled Ten Simple Rules for Reproducible Computational Research.

Some of the rules are obvious, such as

  • Version control all custom scripts
  • Always store the raw data behind plots

while others are things you might not think of, such as

  • For analyses that use random input, always record the random number generator seed so that the exact run can be reproduced
  • Generate a Hierarchical Analysis Output, Allowing Layers of Increasing Detail to Be Inspected
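
To make the seed rule concrete, here’s a minimal sketch in Python of what recording a seed might look like. The function name run_analysis and the toy computation inside it are made up for illustration; they aren’t from the article, and a real analysis would write the seed to whatever log or metadata file travels with its output.

```python
import json
import random
import time


def run_analysis(seed=None):
    """Run a toy randomized analysis and return the result with its seed."""
    # If the caller supplies no seed, choose one explicitly so it can be recorded.
    if seed is None:
        seed = time.time_ns() % (2**32)
    # Use a dedicated generator so the run doesn't depend on global random state.
    rng = random.Random(seed)
    sample = [rng.gauss(0.0, 1.0) for _ in range(1000)]
    return {"seed": seed, "mean": sum(sample) / len(sample)}


first = run_analysis()
# Keep the seed next to the result, here as a JSON record.
record = json.dumps(first)

# Later: read the seed back and repeat the exact run.
replay = run_analysis(seed=json.loads(record)["seed"])
assert replay["mean"] == first["mean"]
```

The point is simply that the seed is chosen explicitly and saved alongside the result, so a later run can be fed the same value.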

The nice thing about the article is that the authors discuss each rule and offer ways to follow it. If you’re involved in publishing your research (or even if you’re doing research that won’t be published), you should take a look at this article. It will make your life a lot easier, especially since many journals are now requiring some sort of reproducibility documentation.

The article doesn’t mention Emacs but, as I’ve written many times, Emacs and especially Org mode provide many tools for doing reproducible research effectively. If you’re not familiar with them, the posts I linked above will give you a few ideas.

Thanks to Jorge Tavares via Jean-Philippe Paradis for the pointer to the article.
