There’s a manifesto that you can sign, calling for a more sensible approach to the use of software in science. It says:
Software is a cornerstone of science. Without software, twenty-first century science would be impossible. Without better software, science cannot progress.
But the culture and institutions of science have not yet adjusted to this reality. We need to reform them to address this challenge, by adopting these five principles:
Code: All source code written specifically to process data for a published paper must be available to the reviewers and readers of the paper.
Copyright: The copyright ownership and license of any released source code must be clearly stated.
Citation: Researchers who use or adapt science source code in their research must credit the code’s creators in resulting publications.
Credit: Software contributions must be included in systems of scientific assessment, credit, and recognition.
Curation: Source code must remain available, linked to related materials, for the useful lifetime of the publication.
The founding signatories are:
• Nick Barnes and David Jones of the Climate Code Foundation,
• Peter Norvig, the director of research at Google,
• Cameron Neylon of Science in the Open,
• Rufus Pollock of the Open Knowledge Foundation,
• Joseph Jackson of the Open Science Foundation.
I was the 312th person to sign. How about joining?
There’s a longer discussion of each point of the manifesto here. It ties in nicely with the philosophy of the Azimuth Code Project, namely:
Many papers in climate science present results that cannot be reproduced. The authors present a pretty diagram, but don’t explain which software they used to make it, don’t make this software available, and don’t really explain how they did what they did. This needs to change! Scientific results need to be reproducible. Therefore, any software used should be versioned and published alongside any scientific results.
All of this is true for large climate models such as General Circulation Models as well, but there the problem becomes much more serious, because these models have long outgrown the extent to which a single developer was able to understand all the code. This is a kind of phase transition in software development: it necessitates a different toolset and a different approach to software development.
As Nick Barnes points out, these ideas
… are simply extensions of the core principle of science: publication. Publication is what distinguishes science from alchemy, and is what has propelled science—and human society—so far and so fast in the last 300 years. The Manifesto is the natural application of this principle to the relatively new, and increasingly important, area of science software.
Thanks, John. I’d add that climate science isn’t any kind of outlier in this: most code, in most fields of science, isn’t released, and even where it is released the other problems addressed by the manifesto still apply: it’s not properly curated or acknowledged.
Most of the GCMs do have available source code (unlike large complex software in many other fields): in climate science the availability problem mainly applies to smaller models, and to the small pieces of analytical code written for individual publications.
And also: we’re not alone in this, and it’s not a case of outsiders interfering in science. Increasing numbers of scientists, across all disciplines, are talking and acting about these problems and other ‘open science’ issues. The Manifesto, like the Panton Principles, is supposed simply to be a banner which can unite many disparate voices.
I feel a bit conflicted as to how useful requiring code to be open will be. When conclusions depend on the code, without the code being reviewed, it’s not satisfactory peer-review. On the other hand, when code is shared, it isn’t rewritten, and confirmations are no longer independent. I think the first trumps the second, but I’m uneasy about how much code is passed around without understanding as it is.
Stoked: I’m signatory #400. :)
I think that releasing code is just as important as releasing papers. One thing that I found useful for releasing code was Matt Might’s Community Research and Academic Programming License (or CRAPL), an academic-strength open source license which covers a lot of what the Manifesto contains.
You can find it here: http://matt.might.net/articles/crapl/
The CRAPL has some interesting aspects, but is unfortunately a shrink-wrap contract, not a license (and also is about 10x too verbose for my liking). It might be possible to deliver some of the same value (clause IV in the CRAPL) in a license.
There are two comments on HN that I agree with (I’m the author of the second):
(from thread http://news.ycombinator.com/item?id=3112274)
tl;dr – “I didn’t see any description of problems on this page, that this manifesto wants to solve” (I see more problems that it creates)
“The code is not as important as descriptions of algorithms, and the ideas behind code”
I would also like to add:
Math will not go anywhere soon, whereas programming languages become obsolete much faster. So it is more important that a paper contains enough detail to replicate the results without the code than that the code is easy to get, because easy access to code can degrade the quality of papers. For example, suppose a paper omits some important detail of an algorithm and the code is written in some kind of assembly. The code works, and you can run it and get the same results. A lazy researcher would then use it without understanding it, even if he couldn’t code the same algorithm from the paper, which raises the chance of replicating bugs.
That’s not how science should work.
I’m thoroughly in favour of code being published, not for its own sake but with a view to allowing replication and criticism. So
I find a very odd thing to write. Weren’t Zosimos of Panopolis’s books publications? I could just about understand the quotation with ‘replication’ in place of ‘publication’.
As is often the case, precision was sacrificed in this simile, for the sake of brevity and rhetorical effect. “Alchemy” is standing in for the hermetic and esoteric traditions often followed in that discipline: a method might never be published, or if published might be enciphered, or described in metaphorical or allegorical ways, or steps might be omitted or misrepresented. These obfuscations were used to prevent replication, or to restrict it to an elite circle of initiates. The effect was that advances were slow and often lost. Some of these traditions died hard in the 17th century, at the birth of modern science: scientists wanted to keep their discoveries to themselves. Henry Oldenburg had to badger people into publication, and (as I recall) on occasion resorted to trickery to achieve it.
Yes, it’s tricky to say what you wanted to say briefly. And as I said I’m very sympathetic to what you are trying to achieve.
A good case of secrecy holding up progress is that of the Renaissance court mathematicians challenging each other to solve specific problems, while keeping their techniques to themselves.
Yeah, we all tend to pick on the poor alchemists. There was a lot of secrecy in some alchemical traditions, though. Isaac Newton is a great example of that!
A notable omission from this manifesto, related to one of Stallman’s views of software, is the need to include documentation of the procedure for compiling the source code into an executable (imagine a 30000 LOC project without a makefile). Equally important (in my view and Stallman’s) is the ability to run or interpret the code in at least one free (as in beer) environment. Software written in C++ that calls HP/UX system calls is not useful to a researcher who does not have the money to buy an HP/UX workstation. A notable example is Darwin’s code, which cannot be cross-compiled from another OS to produce a base system (it is not a scientific example, but it is an example of lack of executability).
The manifesto is principally concerned with publication: that readers should be able at least to *read* the code (because without this, an important aspect of method is not published). Being able to *run* the code opens a subsidiary can of worms, and in particular is not going to be possible in many sciences at present. Many scientists, and some entire disciplines, rely on proprietary – and sometimes very expensive – third party software components. I don’t much like it, but I can’t hope to change it, and as an outsider I can’t even get traction towards changing it (although the discussion document does address this subject).
The manifesto is aimed at things which *can* be changed, today.
Finally, although I personally have a great deal of respect for RMS’s work and achievements – and have been a satisfied user of GCC and emacs for more than 20 years – he’s not any sort of authority in science. Why should scientists care what he thinks?
I’m very tempted to take the opposite view: being able to just read the code isn’t much use. If I as an author and someone else disagree about the correctness of some point, it’ll almost certainly come down to a response I’ve used myself on occasion: “It works on my machine, dunno what you think is wrong.” In contrast, if I can run the code, even if I can’t understand it, I can produce “examples of misbehaviour” that are more difficult to just brush under the carpet.
As you say, actually getting independently compilable code is incredibly difficult (and I’m guilty of not cleaning up some code enough to put it on the Azimuth wiki, so I’m a very black pot here) but I suspect it’s the only thing that will be effective in spotting errors, some of which will have led to bigger “overall picture interpretation” mistakes.