=Economic obstacles and how to overcome them=

Reproducibility has many obvious advantages, which raises the question: why do so few practitioners embrace it today? The answer is that its benefits are enormous in the long term, but most material incentives favor the short term. Getting a junk-food meal in front of the TV or computer is cheaper, takes less time, and is more immediately rewarding than cooking a lean steak with salad at home and exercising to keep in shape. This is commonly framed as an issue of willpower, but sadly, the time or resources (money, education) needed to take the long-term-health option are often simply not available. The same dynamic explains the lack of software testing in much of regular software development. Since a reproducible paper is essentially a regression test, since the end product of a numerically-intensive piece of research is often a software application, and since the issues related to software testing have been analyzed by many others, I will discuss the two together.

Many purchasers of software do not understand that a large software application is more complex in its behavior than, say, a nuclear power plant. Unlike purchases of common objects or machinery, it should come with automated testing covering all paths through the source code, and with extensive documentation. Past experience has conditioned purchasers to expect buggy software and poor documentation. Many decision-makers are rewarded for a purchase that visibly minimizes costs now, even if many bugs and a poor interface result in hard-to-quantify productivity losses over the long term. The software market is a [http://en.wikipedia.org/wiki/The_Market_for_Lemons lemon market], in which the buyer, unable to distinguish between high and low quality, pays an average price, so higher-quality products, which cost more to make, are pushed out of the market.
Producers of high-quality goods have traditionally fought lemon markets by educating the consumer and by seeking independent certifications, such as [http://en.wikipedia.org/wiki/ISO_9000#Summary_of_ISO_9001:2000_in_informal_language ISO 9001] for software development. Sponsors of research have more ability to distinguish between good and bad research, but the researcher is rewarded in proportion to the number of publications, which encourages hasty, non-reproducible work, even when putting the parameters and data into a reproducibility framework would have taken just a few more hours. There is no extra reward now for the author of a paper that is likely to still be reproducible ten years down the road, so most people do not bother. Of course, they forfeit all the long-term advantages, including the distinction of calling their work science (if it is not reproducible, it is not science, sorry). But the bottom line is that rewards are proportional to the number of papers.

The ease of technology transfer when a reproducible paper already exists is a good argument for convincing industrial sponsors of research to request reproducibility as a deliverable. In the case of academic and government sponsors, the only way forward is to continuously advocate the long-term benefits of reproducibility until they change their internal evaluation policies to give appropriate weight to reproducible research.

Not only are the extrinsic motivations perversely arranged, but many scientists, and even many programmers, are not properly educated in software engineering. Computer programming, like playing the stock market, is prone to an "it is easy to enter the game, hence I have a chance of winning" mentality. Any child can write a simple "Hello, world" program, and anybody with a credit card can open a brokerage account.
However, it is a very long way from that point to writing a world-class piece of software, or to becoming and staying rich by investing. Most of the pilgrims on this road are self-educated, mostly following the example of those around them, and many have gaping holes in their knowledge that they are not even aware of. A regression test suite is part of the software engineer's toolbox, but how many researchers in numerically-intensive sciences are currently doing regression testing or reproducibility? How many even use version control with easy visual comparison tools for their code? How many have read at least one famous software engineering book, and actively work to improve their skills in this field by following online discussions on the topic? The way to promote reproducible research among practitioners is, again, permanent advocacy of its benefits, as well as making a foundation of software engineering knowledge a prerequisite for graduate-level training in numerically-intensive sciences.

A fundamental reason for the frontier mentality that still pervades the software world is that for decades software had to keep pace with hardware developments described by Moore's law, which forced rapid change both in what computer programs did and in the APIs of their dependencies. By the time a full-fledged test suite could be written, the design and function of the software might have had to change in order to exploit what new hardware allowed. However, the tide has turned. The explosive growth in computer clock speeds has already stopped, growth in hard-drive capacity has started to slow down, and the main outlet for growth now is the increasing number of CPUs. Should this slowdown persist, there will be more time available for testing and documenting software.
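To make the claim that "a reproducible paper is essentially a regression test" concrete, here is a minimal sketch of the idea in Python. The computation (`smooth`, a three-point moving average) and the tolerance are hypothetical stand-ins for any numerically-intensive kernel and are not part of Madagascar; the point is only the pattern of rerunning a computation and comparing it against a result stored when the code was trusted.

```python
# Minimal regression test: rerun a computation and check that its
# output still matches, within tolerance, a baseline recorded from
# a version of the code considered correct.

def smooth(values):
    """Three-point moving average with repeated endpoints."""
    padded = [values[0]] + list(values) + [values[-1]]
    return [(padded[i] + padded[i + 1] + padded[i + 2]) / 3.0
            for i in range(len(values))]

def regression_test(func, inputs, baseline, tol=1e-12):
    """Return True if every output stays within tol of the baseline."""
    result = func(inputs)
    return all(abs(r - b) <= tol for r, b in zip(result, baseline))

if __name__ == "__main__":
    data = [1.0, 2.0, 4.0, 8.0]
    # Baseline captured from a run of the trusted version of smooth().
    baseline = [4.0 / 3.0, 7.0 / 3.0, 14.0 / 3.0, 20.0 / 3.0]
    assert regression_test(smooth, data, baseline), "regression detected"
    print("regression test passed")
```

A reproducible paper applies the same pattern one level up: the "function" is the entire workflow from raw data to figures, and the "baseline" is the published result.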
This will have consequences in numerically-intensive sciences as well: increases in raw CPU power used to offer an easy avenue of progress, through applying algorithms that until then were too expensive to try in practice. Should this avenue close, practitioners will have to investigate second-order effects and focus more on algorithm speed. Working with smaller improvements means a lower signal-to-noise ratio, which in turn calls for scientific rigor through reproducibility, as well as for commoditization of the codes that do not offer a competitive advantage, in order to facilitate comparison between experiments.

A final factor stopping reproducibility from thriving is commercial restrictions. Commercial entities sponsor research because they want to derive a competitive advantage from it. They should be encouraged to share the part of the work that does not constitute a competitive advantage, in effect commoditizing the platform so they can focus their efforts on the part of the software/research stack that adds the most value. The personnel problems that the oil and gas industry will soon face with the retirement of the baby-boomer generation may be instrumental in convincing large consumers of software (oil companies) that manpower in the industry as a whole is too scarce to dedicate to maintaining a large number of competing platforms. The example of other industries (software, banking) can be given to show that cooperation on a small number of common platforms, so that everybody can focus on the value-added parts, is a desirable [http://en.wikipedia.org/wiki/Nash_equilibrium Nash equilibrium]. Several oil companies have already open-sourced their platforms (examples). Even in such cases, companies will keep "the good bits" to themselves, and this is understandable.
However, should reproducibility become mainstream, they would be compelled to share more, in order to lead the change rather than shield themselves from it and be left behind by the general advance.