Reproducibility Using GNU Guix

Regular readers know I’m fond of reproducible research and occasionally write about it. Unfortunately, I often write as if reproducible research were simply a matter of firing up Emacs and Org mode. It is, of course, a lot more complicated than that. Using Org mode puts everything in one place and makes it clear what you did and how you did it, but that may not be enough for another researcher to duplicate or extend your work.

The bioinformatics group at the Max Delbrück Center has an interesting preprint available that illustrates some of these problems and how they solved them. The TL;DR is that they use the GNU Guix package manager to precisely rebuild the software environment used for their experiments. This gives them a common software stack for subsequent experiments and allows others to reproduce their work. The goal is to be able to produce a bit-identical software base.

They illustrate this by packaging four analysis pipelines that process data and produce publishable tables and graphs. The abstract of their paper is available at bioRxiv, along with a link to the paper itself.

You don’t have to have a background in bioinformatics to read most of the paper. You can skip the descriptions of the pipelines and concentrate on the description of the build system and the problems it addresses.
