Reproducible Research with Docker

Karl Voit retweets a tweet that points to this post from Stacy Konkiel that, in turn, tells us to read this excellent post from Melissa Gymrek on using Docker to do reproducible research.

One of the main problems with reproducible research is that even if all the data and computer source code are available, it can be hard to reproduce the environment used by the original researcher. You may have a different version of the operating system, a different compiler, or a newer version of R or a similar tool. As much as we might wish it otherwise, these things can and do affect the results of complex computations.
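As a toy illustration (not from any of the linked posts), floating-point addition is not associative, so merely changing the order of a summation, which is the kind of thing a compiler optimization or a library update can do, changes the answer. CPython 3.12, for instance, switched the built-in `sum()` on floats to a compensated algorithm, which is why this sketch uses an explicit loop to get plain left-to-right addition.

```python
import math


def naive_sum(xs):
    """Add the values strictly left to right with ordinary float addition."""
    total = 0.0
    for x in xs:
        total += x
    return total


values = [1e16, 1.0, -1e16]

# 1e16 + 1.0 rounds back to 1e16, so the 1.0 is lost before the cancellation.
left_to_right = naive_sum(values)
# Cancelling the large terms first keeps the 1.0.
reordered = naive_sum([values[0], values[2], values[1]])
# math.fsum returns the correctly rounded sum regardless of order.
exact = math.fsum(values)

print(left_to_right, reordered, exact)  # 0.0 1.0 1.0
```

Same data, same "program", two different answers: exactly the kind of discrepancy that a pinned environment helps rule out.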

Even setting that aside, if you’ve ever tried to understand someone else’s build system, you know how hard it can be to figure out which scripts produce which pieces of the finished product. Tools like Org mode can help a lot with that problem, but not everyone is an Emacs user, and most researchers are more concerned with the actual research than with providing an easy-to-duplicate environment.

That’s where Docker comes in. The researcher does all his work inside a Docker container, saves the resulting Docker image, and makes it available to other researchers. Subsequent researchers will then have the same build tools and data as the original researcher.
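Here is a minimal sketch of that workflow, driving the standard `docker` command-line tool from Python. It is not Gymrek's recipe (see her post for that). The image tag and archive filename are hypothetical placeholders, a Dockerfile describing the environment is assumed to already exist in the current directory, and by default the script only prints the commands instead of running them.

```python
import shlex
import subprocess

# Hypothetical names, for illustration only; substitute your own.
IMAGE = "mylab/analysis:v1"
ARCHIVE = "analysis-image.tar"


def researcher_commands(image, archive):
    """Commands the original researcher might run to capture and share the environment."""
    return [
        # Build the image from the (assumed) Dockerfile in the current directory.
        ["docker", "build", "-t", image, "."],
        # Write the image to a tar file that can be handed out directly...
        ["docker", "save", "-o", archive, image],
        # ...or push it to a registry (requires a prior `docker login`).
        ["docker", "push", image],
    ]


def reader_commands(image, archive, from_registry=False):
    """Commands a later researcher might run to get the same environment."""
    fetch = (["docker", "pull", image] if from_registry
             else ["docker", "load", "-i", archive])
    # Start an interactive container from the image, removing it on exit.
    return [fetch, ["docker", "run", "--rm", "-it", image]]


def run(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print("$", shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run(researcher_commands(IMAGE, ARCHIVE))
    run(reader_commands(IMAGE, ARCHIVE))
```

Keeping the commands as plain argument lists makes the dry run easy to inspect, and passing `dry_run=False` actually invokes Docker.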

Gymrek has details on how to go about this and I recommend you read her post. There are, as she relates, even repositories analogous to GitHub where you can stash your Docker images so others can get at them. Again, see Gymrek’s post for details. Her post is one of the most useful things I’ve read on reproducible research in some time.

This entry was posted in General.