Reproducible computational experiments using SCons
===Reproducible research philosophy===

Peer review is the backbone of scientific progress. From the ancient alchemists who worked secretly on magic solutions to insolvable problems, modern science has come a long way to become a social enterprise where the community openly publishes and verifies hypotheses, theories, and experimental results. By reproducing and verifying previously published research, a researcher can take new steps to advance the progress of science.

Traditionally, scientific disciplines are divided into theoretical and experimental studies. The reproduction and verification of theoretical results usually require only imagination (apart from pencils and paper), and experimental results are verified in laboratories using equipment and materials similar to those described in the publication.

During the last century, computational studies emerged as a new scientific discipline. Computational experiments are carried out on a computer by applying numerical algorithms to digital data. How reproducible are such experiments? On the one hand, reproducing the result of a numerical experiment is difficult. The reader needs access to precisely the same kind of input data, software, and hardware as the publication's author to reproduce the published result. It is often difficult or impossible to provide detailed specifications for these components. On the other hand, essential computational system components such as operating systems and file formats are becoming increasingly standardized. New components can be shared in principle because they represent digital information transferable over the Internet.

The practice of software sharing has fueled the miraculously efficient development of Linux, Apache, and many other open-source software projects. Its proponents often refer to this ideology as an analog of the scientific peer review tradition. Eric Raymond, a well-known open-source advocate, writes (Raymond, 2004<ref>Raymond, E.
S., 2004, The art of UNIX programming: Addison-Wesley.</ref>):

<blockquote>
Abandoning the habit of secrecy in favor of process transparency and peer review was the crucial step by which alchemy became chemistry. In the same way, it is beginning to appear that open-source development may signal the long-awaited maturation of software development as a discipline.
</blockquote>

While software development tries to imitate science, computational science must borrow from the open-source model to sustain itself as a fully scientific discipline. In the words of Randy LeVeque, a prominent mathematician (LeVeque, 2006<ref>LeVeque, R. J., to appear, 2006, Wave propagation software, computational science, and reproducible research: Presented at the Proc. International Congress of Mathematicians.</ref>),

<blockquote>
Within the world of science, computation is now rightly seen as a third vertex of a triangle, complementing experiment and theory. However, as it is now often practiced, one can make a good case that computing is the last refuge of the scientific scoundrel [...] Where else in science can one get away with publishing observations that are claimed to prove a theory or illustrate the success of a technique without having to give a careful description of the methods used in sufficient detail that others can attempt to repeat the experiment? [...] Scientific and mathematical journals are filled with pretty pictures these days of computational experiments that the reader has no hope of repeating. Even brilliant and well-intentioned computational scientists often do a poor job of presenting their work in a reproducible manner. The methods are often very vaguely defined, and even if they are carefully defined, they would normally have to be implemented from scratch by the reader in order to test them.
</blockquote>

In computer science, the concept of publishing and explaining computer programs goes back to the idea of ''literate programming'' promoted by Knuth (1984<ref>Knuth, D. E., 1984, Literate programming: Computer Journal, '''27''', 97--111.</ref>) and expanded by many other researchers (Thimbleby, 2003<ref>Thimbleby, H., 2003, Explaining code for publication: Software - Practice & Experience, '''33''', 975--1001.</ref>). In his 2004 lecture on "Better Programming," Harold Thimbleby notes<ref>http://www.uclic.ucl.ac.uk/harold/</ref>

<blockquote>
We want ideas, and in particular programs, that work in one place to work elsewhere. One form of objectivity is that published science must work elsewhere than just in the author's laboratory or even just in the author's imagination; this requirement is called ''reproducibility''.
</blockquote>

<!-- The quest for peer review and reproducibility is vital for computational geosciences and computational geophysics in particular. The very first paper published in ''Geophysics'' was titled "Black Magic in Geophysical Prospecting" () and presented an account of different "magical" methods of oil exploration promoted by entrepreneurs in the early days of the geophysical exploration industry. Although none of these methods exist today, it is not a secret that industrial practice is full of nearly magical tricks, often hidden behind a scientific appearance. Only the scrutiny of peer review and result verification can help us distinguish magic from science and advance the latter. -->

Nearly ten years ago, the technology of reproducible research in geophysics was pioneered by Jon Claerbout and his students at the Stanford Exploration Project (SEP).
SEP's system of reproducible research requires the author of a publication to document the creation of numerical results from the input data and software sources to let others test and verify the reproducibility of the results (Claerbout, 1992a<ref>Claerbout, J., 1992a, Electronic documents give reproducible research a new meaning: 62nd Ann. Internat. Mtg, 601--604, Soc. of Expl. Geophys.</ref>; Schwab et al., 2000<ref>Schwab, M., M. Karrenbach, and J. Claerbout, 2000, Making scientific computations reproducible: Computing in Science & Engineering, '''2''', 61--67.</ref>). The discipline of reproducible research was also adopted and popularized in the statistics and wavelet theory community by Buckheit and Donoho (1995<ref>Buckheit, J. and D. L. Donoho, 1995, Wavelab and reproducible research, ''in'' Wavelets and Statistics, volume '''103''', 55--81. Springer-Verlag.</ref>). It is referenced in several popular wavelet theory books (Hubbard, 1998<ref>Hubbard, B. B., 1998, The world according to wavelets: The story of a mathematical technique in the making: AK Peters.</ref>; Mallat, 1999<ref>Mallat, S., 1999, A wavelet tour of signal processing: Academic Press.</ref>). Calls for reproducible research appear nowadays in fields as diverse as bioinformatics (Gentleman et al., 2004<ref>Gentleman, R. C., V. J. Carey, D. M. Bates, B. Bolstad, M. Dettling, S. Dudoit, B. Ellis, L. Gautier, Y. Ge, J. Gentry, K. Hornik, T. Hothorn, W. Huber, S. Iacus, R. Irizarry, F. Leisch, C. Li, M. Maechler, A. J. Rossini, G. Sawitzki, C. Smith, G. Smyth, L. Tierney, J. Y. Yang, and J. Zhang, 2004, Bioconductor: open software development for computational biology and bioinformatics: Genome Biology, '''5''', R80.</ref>), geoinformatics (Bivand, 2006<ref>Bivand, R., 2006, Implementing spatial data analysis software tools in R: Geographical Analysis, '''38''', 23--40.</ref>), and computational wave propagation (LeVeque, 2006<ref>LeVeque, R.
J., to appear, 2006, Wave propagation software, computational science, and reproducible research: Presented at the Proc. International Congress of Mathematicians.</ref>). However, the adoption of reproducible research practice by computational scientists has been slow, partly because the available tools are complicated and inadequate.