Eureka Daily, a science blog at The Times newspaper, published a two-part article by Hannah Devlin about freedom of information in science (“FOI: should scientists be exempt?” and “Freedom of information and climate science” – both require a subscription). Part two discusses openness in the context of a recent investigation into the research practices of CRU – a climate research group at the University of East Anglia.
Here is an interesting excerpt from the second part:
As Myles Allen, a climate scientist at the University of Oxford points out, in most cases that which is in the public interest will be good for science too. Validation and replication are central to the scientific method. However, points of contention remain about the optimum degree of information sharing. Allen, for instance, suggests that while open access to data is generally desirable, making the computer code used to analyse data available online could have unintended negative consequences. If everyone’s using the same code, who’s going to challenge whether it’s working correctly?
This view is countered by programmer John Graham-Cumming, who found coding errors while trying to reproduce the CRU/Met Office’s CRUTEM and HadCRUT global warming datasets. Working from the raw data released by the Met Office and a scientific paper describing how the datasets are generated, he set out to validate their work – a considerable effort, since it required writing code to implement the algorithm described in the paper. In doing so he found, among other errors, a problem with the way the error ranges were calculated, stemming from a bug in their code.
He says: “You could say that by not releasing their buggy code they forced me to find the bug in it by writing my own validation. But actually, if they’d released their code I would have been able to quickly compare the code and the paper and find the bug without the massive effort to write new code. And no one else had actually done this validation (including the Muir Russell review) and as a result the Met Office has been releasing incorrect data for a long time. Perhaps that’s because the validation was so hard in the first place, whereas having code to check would have been easy.”
John Graham-Cumming demonstrated why reproducibility is crucial in the computational sciences: it exposes scientific algorithms and workflows to a wider audience, and so keeps critical bugs from going unnoticed.
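To make that concrete, here is a hypothetical sketch in Python of how an independent re-implementation can surface an error-range bug. The station values and function names are made up, and the bug shown (normalising by n instead of n − 1) is a classic textbook mistake, not the actual error Graham-Cumming found in the Met Office code:

```python
import numpy as np

def released_error_range(values):
    # Hypothetical "released" implementation with a bug: it uses the
    # population standard deviation (ddof=0), underestimating the
    # uncertainty for small samples.
    values = np.asarray(values, dtype=float)
    return values.std(ddof=0) / np.sqrt(len(values))

def independent_error_range(values):
    # Independent re-implementation following the usual definition of the
    # standard error of the mean: sample standard deviation (ddof=1)
    # divided by the square root of the sample size.
    values = np.asarray(values, dtype=float)
    return values.std(ddof=1) / np.sqrt(len(values))

if __name__ == "__main__":
    station_anomalies = [0.42, 0.31, 0.55, 0.48, 0.29]  # made-up grid-cell values
    a = released_error_range(station_anomalies)
    b = independent_error_range(station_anomalies)
    print(f"released code: {a:.4f}, re-implementation: {b:.4f}")
```

The discrepancy only becomes visible because a second, independent implementation exists and its numbers are compared against the published ones – which is exactly the kind of comparison that releasing the original code makes cheap.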
Reproducibility is an approach to openness in the computational sciences. It assumes that not only the data but also the source code (and everything else needed to reproduce published results) should be released. At the end of the day, it might save one’s scientific credibility from a rather unpleasant public exposé.
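In practice, “everything else needed” can be as modest as a single entry-point script shipped alongside the raw data. The sketch below assumes hypothetical file names and a deliberately trivial analysis step; the point is only that a reader can regenerate a published number with one command instead of re-deriving the method from the paper’s prose:

```python
"""Minimal reproduction entry point (all file names are placeholders)."""
import csv

RAW_DATA = "data/station_temperatures.csv"   # raw data as released
OUTPUT = "results/global_mean_anomaly.txt"   # number quoted in the paper

def load_anomalies(path):
    # Read one anomaly value per row from the released CSV file.
    with open(path, newline="") as f:
        return [float(row["anomaly"]) for row in csv.DictReader(f)]

def main():
    anomalies = load_anomalies(RAW_DATA)
    global_mean = sum(anomalies) / len(anomalies)
    with open(OUTPUT, "w") as f:
        f.write(f"{global_mean:.3f}\n")
    print(f"wrote {OUTPUT}: {global_mean:.3f}")

if __name__ == "__main__":
    main()
```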
Donoho et al. (2009) write: “The central motivation for the scientific method … is the *ubiquity of error* – the phenomenon that mistakes and self-delusion can creep in absolutely anywhere, and that the work of the scientist is primarily about recognizing and rooting out error.”
http://www.computer.org/portal/web/csdl/doi/10.1109/MCSE.2009.15