Ars Technica has an interesting article up entitled “Nature Editorial: If you want reproducible science, the software needs to be open source.” I’ve written a couple of times about reproducible research and how Emacs and Org mode make it particularly easy, so I was interested in what the article had to say.
The article talks about an editorial in Nature that strongly urges authors of papers to include the source code for the software that supports their papers. I was a bit taken aback because it would never occur to me not to do this. Perhaps it’s just being part of the open source movement, or maybe it’s a reaction to the East Anglia CRU fiasco, but if you are doing science then you must enable others to reproduce (and thereby validate) your results. My position, and that of most scientists not doing climate research, is that if you don’t make your data and methodology available then you’re not doing science.
With the really great tools that are available today, like Emacs and Org mode, there really is no excuse not to do this. Keeping everything together in a single file would, in most cases, make the scientists’ research easier, and then there is only a single file to submit to the journal. That file holds the LaTeX source for the paper itself, research notes, data, and software. In some cases involving huge data sets or code bases, the author may want to add a link to the actual data or code rather than include it, but the principle is the same.
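To make the single-file idea concrete, here is a minimal sketch of what such an Org file might look like. Everything in it is hypothetical: the title, the table name `measurements`, the block name `mean-value`, and the numbers are made up for illustration, and it assumes Python evaluation has been enabled for Org Babel in your Emacs configuration.

```org
#+TITLE: Example paper (hypothetical)
#+LATEX_CLASS: article

* Notes
Research notes live alongside everything else in the file.

* Data
#+NAME: measurements
| trial | value |
|-------+-------|
|     1 |   2.0 |
|     2 |   4.0 |
|     3 |   6.0 |

* Analysis
#+NAME: mean-value
#+BEGIN_SRC python :var data=measurements :results value :exports both
# `data` arrives as a list of rows taken from the table above.
# With Org's default settings the header row is expected to be
# stripped before the rows reach this block.
values = [row[1] for row in data]
return sum(values) / len(values)
#+END_SRC

* Paper
The prose of the paper goes here and is exported to LaTeX
together with the code and its results.
```

Evaluating the source block with C-c C-c recomputes the result from the table, and exporting the file to LaTeX yields the paper. A reader who has the Org file therefore has the data, the code, and the text in one place.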
The editorial makes a good case that even offering an executable version of the software is not enough. Often the software plays a fundamental role in the research and should, itself, be examined for errors or other problems. As Irreal readers know to their sorrow, bugs are a part of any non-toy piece of software, so it is not a trivial objection to say that an executable version does not suffice.
One alternative to open sourcing the code, used by some journals, is to have the author provide a “rigorous” natural language description of it in terms of the algorithms and mathematics used. I’m sure that any Irreal reader would find this a non-starter for obvious reasons, but the editorial nonetheless devotes a large section to discussing it before dismissing it as impractical.
The editorial presents and discusses several objections to, or difficulties with, providing the code, but I found only one persuasive: concern on the part of the researchers or their organizations that the code represents a marketable commodity and should therefore be withheld. The editorial authors say that’s a tough problem and that perhaps such papers should be marked as not having the necessary items for complete reproducibility. Actually, I’d be fine with saying, “We don’t accept papers that don’t include everything needed for reproducibility.” In the case of publicly funded research, this should not even come up. The public paid for it; the public owns it. Of course, I’m not holding my breath for that to happen.