Reproducible computational experiments using SCons

<center><font size="-1">''This page was created from the LaTeX source in [http://sourceforge.net/p/rsf/code/HEAD/tree/trunk/book/rsf/scons/paper.tex book/rsf/scons/paper.tex] using [[latex2wiki]]''</font></center>

SCons (from Software Construction) is a well-known open-source
program designed primarily for building software. This paper describes our method of extending SCons for managing data processing
flows and reproducible computational experiments. We demonstrate our
usage of SCons with a couple of simple examples.

==Introduction==
This paper introduces an environment for reproducible computational
experiments developed as part of the "Madagascar" software package.
To reproduce the example experiments in this paper, you can download
Madagascar from https://www.ahay.org. At the moment, the
main Madagascar interface is the Unix shell command line, so you
will need a Unix/POSIX system (Linux, Mac OS X, Solaris, etc.) or Unix
emulation under Windows (Cygwin, SFU, etc.).
Our focus, however, is not only on the particular tools we use in our research but also on the general philosophy of
reproducible computations.
===Reproducible research philosophy===
Peer review is the backbone of scientific progress. From the ancient
alchemists who worked secretly on magic solutions to insolvable
problems, modern science has come a long way to become a social
enterprise where the community openly publishes and verifies hypotheses, theories, and experimental results. By reproducing and
verifying previously published research, a researcher can take new
steps to advance the progress of science.
Traditionally, scientific disciplines are divided into theoretical and
experimental studies. The reproduction and verification of theoretical
results usually require only imagination (apart from pencils and
paper), and experimental results are verified in laboratories using
equipment and materials similar to those described in the publication.
During the last century, computational studies emerged as a new
scientific discipline. Computational experiments are carried out on a
computer by applying numerical algorithms to digital data. How
reproducible are such experiments? On one hand, reproducing the result
of a numerical experiment is difficult. The reader needs
to have access to precisely the same kind of input data, software, and
hardware as the publication's author to reproduce the
published result. It is often difficult or impossible to provide
detailed specifications for these components. On the other hand, essential
computational system components such as operating systems and
file formats are getting increasingly standardized. New components
can be shared in principle because they represent digital
information transferable over the Internet.
The practice of software sharing has fueled the miraculously efficient
development of Linux, Apache, and many other open-source software
projects. Its proponents often refer to this ideology as an analog of
the scientific peer review tradition. Eric Raymond, a well-known
open-source advocate, writes (Raymond, 2004<ref>Raymond, E. S.,  2004, The art of UNIX programming: Addison-Wesley.</ref>):
<blockquote>
Abandoning the habit of secrecy in favor of process transparency and
peer review was the crucial step by which alchemy became chemistry.
In the same way, it is beginning to appear that open-source
development may signal the long-awaited maturation of software
development as a discipline.
</blockquote>
While software development tries to imitate science, computational
science must borrow from the open-source model to sustain
itself as a fully scientific discipline. In the words of Randy LeVeque, a
prominent mathematician (LeVeque, 2006<ref>LeVeque, R. J.,  to appear, 2006, Wave propagation software, computational science, and reproducible research: Presented at the Proc. International  Congress of Mathematicians.</ref>),
<blockquote>
Within the world of science, computation is now rightly seen as a
third vertex of a triangle, complementing experiment and
theory. However, as it is now often practiced, one can make a good case that computing is the last refuge of the scientific scoundrel
[...]  Where else in science can one get away with publishing
observations that are claimed to prove a theory or illustrate the
success of a technique without having to give a careful description of
the methods used in sufficient detail that others can attempt to
repeat the experiment? [...]  Scientific and mathematical journals are
filled with pretty pictures these days of computational experiments
that the reader has no hope of repeating. Even brilliant and well-intentioned computational scientists often do a poor job of presenting
their work in a reproducible manner. The methods are often very
vaguely defined, and even if they are carefully defined, they would
normally have to be implemented from scratch by the reader in order to
test them.
</blockquote>
In computer science, the concept of publishing and explaining computer programs goes back to the idea of ''literate programming''  promoted
by Knuth (1984<ref>Knuth, D. E.,  1984, Literate programming: Computer Journal, '''27''', 97--111.</ref>) and expanded by many other researchers
(Thimbleby, 2003<ref>Thimbleby, H.,  2003, Explaining code for publication: Software - Practice &  Experience, '''33''', 975--908.</ref>). In his 2004 lecture on "Better Programming,"
Harold Thimbleby notes<ref>http://www.uclic.ucl.ac.uk/harold/</ref>
<blockquote>
We want ideas, and in particular programs, that work in one place to
work elsewhere. One form of objectivity is that published science
must work elsewhere than just in the author's laboratory or even
just in the author's imagination; this requirement is called
''reproducibility'' .
</blockquote>
<!-- 
The quest for peer review and reproducibility is vital
for computational geosciences and computational geophysics in
particular. The very first paper published in ''Geophysics''  was
titled "Black Magic in Geophysical Prospecting"
() and presented an account
of different "magical" methods of oil explorations promoted by
entrepreneurs in the early days of the geophysical exploration industry.
Although none of these methods exist today, it is not a secret that
industrial practice is full of nearly magical tricks, often hidden
besides a scientific appearance. Only a scrutiny of peer review and
result verification can help us distinguish magic from science and
advance the latter.
 -->
Nearly ten years ago, the technology of reproducible research in
geophysics was pioneered by Jon Claerbout and his students at the
Stanford Exploration Project (SEP). SEP's system of reproducible
research requires the author of a publication to document the creation of
numerical results from the input data and software sources to let
others test and verify the reproducibility of the results
(Claerbout, 1992a<ref>Claerbout, J.,  1992a, Electronic documents give reproducible research a new meaning: 62nd Ann. Internat. Mtg, 601--604, Soc. of Expl. Geophys.</ref>;Schwab et al., 2000<ref>Schwab, M., M. Karrenbach, and J. Claerbout,  2000, Making scientific computations reproducible: Computing in Science & Engineering, '''2''',  61--67.</ref>).

The discipline of reproducible research was also adopted and
popularized in the statistics and wavelet theory community by
Buckheit and Donoho (1995<ref>Buckheit, J. and D. L. Donoho,  1995, Wavelab and reproducible research, ''in'' Wavelets and Statistics, volume '''103''',  55--81. Springer-Verlag.</ref>). It is referenced in several popular wavelet theory
books (Hubbard, 1998<ref>Hubbard, B. B.,  1998, The world according to wavelets: The story of a mathematical technique in the making: AK Peters.</ref>;Mallat, 1999<ref>Mallat, S.,  1999, A wavelet tour of signal processing: Academic Press.</ref>). Pledges for reproducible research
appear nowadays in fields as diverse as 
bioinformatics
(Gentleman et al., 2004<ref>Gentleman, R. C., V. J. Carey, D. M. Bates, B. Bolstad, M. Dettling, S. Dudoit,  B. Ellis, L. Gautier, Y. Ge, J. Gentry, K. Hornik, T. Hothorn, W. Huber, S.  Iacus, R. Irizarry, F. Leisch, C. Li, M. Maechler, A. J. Rossini, G.  Sawitzki, C. Smith, G. Smyth, L. Tierney, J. Y. Yang, and J. Zhang,  2004,  Bioconductor: open software development for computational biology and bioinformatics: Genome Biology, '''5''', R80.</ref>), 
geoinformatics (Bivand, 2006<ref>Bivand, R.,  2006, Implementing spatial data analysis software tools in r:  Geographical Analysis, '''38''', 23--40.</ref>), and computational wave propagation (LeVeque, 2006<ref>LeVeque, R. J.,  to appear, 2006, Wave propagation software, computational science, and reproducible research: Presented at the Proc. International  Congress of Mathematicians.</ref>). However, computational scientists' adoption of reproducible research practice has been slow.
This is partly due to complicated and inadequate tools.

===Tools for reproducible research===
The reproducible research system developed at Stanford is based on
"make" (Stallman et al., 2004<ref>Stallman, R. M., R. McGrath, and P. D. Smith,  2004, GNU make: A program for directing recompilation: GNU Press.</ref>), a Unix software construction utility.
Initially, SEP used "cake," a dialect of "make"
(Nichols and Cole, 1989<ref>Nichols, D. and S. Cole,  1989, Device independent software installation with  CAKE, ''in'' SEP-61,  341--344. Stanford Exploration Project.</ref>;Claerbout and Nichols, 1990<ref>Claerbout, J. F. and D. Nichols,  1990, Why active documents need cake, ''in'' SEP-67,  145--148. Stanford Exploration Project.</ref>;Claerbout, 1992b<ref>-------- 1992b, How to use Cake with interactive documents, ''in'' SEP-73,  451--460. Stanford Exploration Project.</ref>;Claerbout and Karrenbach, 1993<ref>Claerbout, J. F. and M. Karrenbach,  1993, How to use cake with interactive  documents, ''in'' SEP-77,  427--444. Stanford Exploration Project.</ref>).
The system was converted to "GNU make," a more standard dialect, by
Schwab and Schroeder (1995<ref>Schwab, M. and J. Schroeder,  1995, Reproducible research documents using  GNUmake, ''in'' SEP-89,  217--226. Stanford Exploration Project.</ref>). The "make" program keeps track of dependencies between different
components of the system and the software construction targets, which,
in the case of a reproducible research system, turn into figures and
manuscripts. The author specifies the targets and commands for their construction in "makefiles," which serve as databases for
defining source and target dependencies. A dependency-based system
leads to rapid development because when one of the sources changes,
only parts that depend on this source get recomputed. Buckheit and Donoho (1995<ref>Buckheit, J. and D. L. Donoho,  1995, Wavelab and reproducible research, ''in'' Wavelets and Statistics, volume '''103''',  55--81. Springer-Verlag.</ref>)
based their system on MATLAB, a popular integrated development
environment produced by MathWorks (Sigmon and Davis, 2001<ref>Sigmon, K. and T. A. Davis,  2001, MATLAB primer, sixth edition: Chapman &  Hall.</ref>). While MATLAB is an adequate tool for prototyping numerical algorithms, it may not be
sufficient for large-scale computations typical for many applications
in computational geophysics.
"Make" is a handy utility employed by thousands of
software development projects. Unfortunately, it is not
well designed from the perspective of user experience. "Make" employs
an obscure and limited special language (a mixture of Unix shell
and special-purpose commands), which often appears confusing
to inexperienced users. According to Peter van der Linden, a software
expert from Sun Microsystems (van der Linden, 1994<ref>van der Linden, P.,  1994, Expert C programming: Prentice Hall.</ref>),
<blockquote>
"Sendmail" and "make" are two well-known programs that are
pretty widely regarded as originally being debugged into existence.
That's why their command languages are so poorly thought out and
difficult to learn. It's not just you -- everyone finds them
troublesome.
</blockquote>
The "make" command language is not only inconvenient but also limited in its
capabilities. The reproducible research system developed by
Schwab et al. (2000<ref>Schwab, M., M. Karrenbach, and J. Claerbout,  2000, Making scientific computations reproducible: Computing in Science & Engineering, '''2''',  61--67.</ref>) includes not only custom "make" rules but also an obscure and hardly portable agglomeration of shell and Perl scripts that extend "make" (Fomel et al., 1997<ref>Fomel, S., M. Schwab, and J. Schroeder,  1997, Empowering SEP's documents,  ''in'' SEP-94,  339--361. Stanford Exploration Project.</ref>).
Several alternative systems for dependency-checking software
construction have been developed in recent years. One of the most
promising new tools is SCons, enthusiastically endorsed by
Dubois (2003<ref>Dubois, P. F.,  2003, Why Johnny can't build: Computing in Science &  Engineering, '''5''', 83--88.</ref>). The SCons initial design won the Software Carpentry competition sponsored by Los Alamos National Laboratory in 2000 in the category of "a dependency management tool to replace make." Some of the main advantages of SCons are:
  
*SCons configuration files are Python scripts. Python is a modern programming language praised for its readability, elegance, simplicity, and power (Rossum, 2000a<ref>Rossum, G. V.,  2000a, Python reference manual: Iuniverse Inc.</ref>;Rossum, 2000b<ref>-------- 2000b, Python tutorial: Iuniverse Inc.</ref>). Scales and Ecke (2002<ref>Scales, J. A. and H. Ecke,  2002, What programming languages should we teach our undergraduates?: The Leading Edge, '''21''', 260--267.</ref>) recommend Python as the first programming language for geophysics students. 
*SCons offers reliable, automatic, and extensible dependency analysis and creates a global view of all dependencies—no more "make depend," "make clean," or multiple build passes of touching and reordering targets to get all the dependencies. 
*SCons has built-in support for many programming languages and systems, including C, C++, Fortran, Java, and LaTeX. 
*While "make" relies on timestamps to detect file changes (creating numerous problems on platforms with different system clocks), SCons uses a more reliable detection mechanism, employing MD5 signatures by default. It can detect changes not only in files but also in commands used to build them. 
*SCons provides integrated support for parallel builds. 
*SCons provides configuration support analogous to the "autoconf" utility for testing the environment on different platforms. 
*SCons is designed from the ground up as a cross-platform tool. It works equally well on POSIX systems (Linux, Mac OS X, Solaris, etc.) and Windows. 
*The stability of SCons is assured by an incremental development methodology utilizing comprehensive regression tests. 
*SCons is publicly released under a liberal open-source license<ref>As of this writing, SCons is in a beta version of 0.96, approaching the 1.0 official release. See http://www.scons.org/.</ref>.
In this paper, we propose to adopt SCons as a new platform for
reproducible research in scientific computing.
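The difference between timestamp-based and signature-based change detection mentioned in the list above can be illustrated with a short sketch in plain Python. It is not how SCons is implemented internally; the helper name <tt>md5_signature</tt> and the file name <tt>data.txt</tt> are hypothetical and chosen only for illustration. The sketch shows that moving a file's timestamp forward leaves its MD5 content signature unchanged.

```python
import hashlib
import os
import tempfile


def md5_signature(path):
    """Return the MD5 hex digest of a file's contents (illustrative helper)."""
    with open(path, 'rb') as f:
        return hashlib.md5(f.read()).hexdigest()


def demo():
    """Compare timestamp-based and content-based change detection."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, 'data.txt')  # hypothetical file name
        with open(path, 'w') as f:
            f.write('n1=512 n2=513\n')
        mtime_before = os.path.getmtime(path)
        sig_before = md5_signature(path)
        # Simulate "touch" or clock skew: move the timestamp one hour ahead
        os.utime(path, (mtime_before + 3600, mtime_before + 3600))
        mtime_after = os.path.getmtime(path)
        sig_after = md5_signature(path)
        return mtime_after > mtime_before, sig_after == sig_before


timestamp_changed, signature_unchanged = demo()
print(timestamp_changed, signature_unchanged)
```

A tool that looks only at timestamps would consider the file modified, while a tool that compares content signatures would not.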

===Paper organization===
To demonstrate our adoption of SCons for reproducible research, we first describe a couple of simple examples of computational
experiments and then show how SCons helps us document our
computational results.
<!-- 
\newpage
==Madagascar open-source code==

Madagascar's homepage is http://rsf.sourceforge.net. Madagascar
source code is proposed in two versions:
[https://sourceforge.net/project/showfiles.php?group_id=162909 stable]
and
[http://rsf.sourceforge.net/wiki/index.php/Svn-url development].
The stable version is a snapshot of Madagascar at a given time. It was
installed on different platforms and tested before being released.
Updates are typically done every few months as opposed to the
development version, which is updated every few hours or days by a
dynamic team of developers. As such, there is no guarantee that the
development version will be fully functional and stable at any given
time. In the remainder of this paper, we assume that you have
successfully installed Madagascar stable version and that you have an
Internet connection\footnote{XXX provide alternate means to download
Lena.img if no Internet connection XXX}.
 -->

==Example experiments==

The main <tt>SConstruct</tt> commands defined in our reproducible research environment are collected in the table.

<center>
{| class="wikitable"
|+Basic methods of an <tt>rsf.proj</tt> object.
|- 
|style="background-color:#ffdead;"| '''<tt>Fetch(data_file,dir[,ftp_server_info])</tt>'''
|-
| A rule to download <tt><math><</math>data_file<math>></math></tt> from a specific directory <tt><math><</math>dir<math>></math></tt> of an FTP server
|-
|style="background-color:#ffdead;"| '''<tt>Flow(target[s],source[s],command[s][,stdin][,stdout])</tt> '''
|-
| A rule to generate <tt><math><</math>target[s]<math>></math></tt> from <tt><math><</math>source[s]<math>></math></tt> using <tt><math><</math>command[s]<math>></math></tt> 
|-
|style="background-color:#ffdead;"| '''<tt>Plot(intermediate_plot[,source],plot_command)</tt>''' or
'''<tt>Plot(intermediate_plot,intermediate_plots,combination)</tt>'''
|-  
| A rule to generate <tt><math><</math>intermediate_plot<math>></math></tt> in the working directory. 
|-
|style="background-color:#ffdead;"| '''<tt>Result(plot[,source],plot_command)</tt>''' or
'''<tt>Result(plot,intermediate_plots,combination)</tt>'''
|- 
| A rule to generate a final <tt><math><</math>plot<math>></math></tt> in the special <tt>Fig</tt> folder of the working directory. 
|-
|style="background-color:#ffdead;"| '''<tt>End()</tt>''' 
|- 
| A rule to collect default targets. 
|}
</center>

These commands are defined in <tt>&#36;PYTHONPATH/rsf/proj.py</tt>, where
<tt>PYTHONPATH</tt> points to the Python modules located under <tt>RSFROOT</tt>, the environment variable for the Madagascar
installation directory. The source of this file is in
[http://sourceforge.net/p/rsf/code/HEAD/tree/trunk/framework/rsf/proj.py framework/rsf/proj.py].

===Example 1===


To follow the first example, select a working project directory and
copy the following code
to a file named <tt>SConstruct</tt><ref>The source of this file is also accessible at [http://sourceforge.net/p/rsf/code/HEAD/tree/trunk/book/rsf/scons/easystart/SConstruct $RSFSRC/book/rsf/scons/easystart/SConstruct].</ref>.

<syntaxhighlight lang="python">
from rsf.proj import *

# Download the input data file
Fetch('lena.img','imgs')

# Create RSF header
Flow('lena.hdr','lena.img',
     'echo n1=512 n2=513 in=$SOURCE data_format=native_uchar',
     stdin=0)

# Convert to floating point and window out the first trace
Flow('lena','lena.hdr','dd type=float | window f2=1')

# Display
Result('lena',
       '''
       sfgrey title="Hello, World!" transp=n color=b bias=128
       clip=100 screenratio=1 
       ''')

# Wrap up
End()
</syntaxhighlight>


This is our "hello world" example that illustrates the basic use of
some of the commands presented in the table above. The plan
for this experiment is to download data from a public data
server, convert it to an appropriate file format, and generate a
figure for publication. Let us go through the
<tt>SConstruct</tt> script step by step.


<syntaxhighlight lang="python">
from rsf.proj import *
</syntaxhighlight>


is a standard Python command that loads the Madagascar project
management module <tt>rsf/proj.py</tt> which provides our extension to
SCons.


<syntaxhighlight lang="python">
Fetch('lena.img','imgs')
</syntaxhighlight>


instructs SCons to connect to a public data server (the default server
if no FTP server information is provided) and to fetch the data file
<tt>lena.img</tt> from the <tt>data/imgs</tt> directory. 
<!-- 
Note that
Madagascar expects a <tt>data</tt> folder on top of the specified
directory (i.e.  <tt>imgs</tt>). In the directory where you have your
SConstruct, running <tt>scons lena.img</tt> on the command line will
download the file <tt>lena.img</tt>.  The equivalent command line is
<pre>
bash&#36; wget http://www.ahay.org/data/imgs/lena.img
</pre>
 -->

Try running "<tt>scons lena.img</tt>" on the command line. The successful output should look like
<pre>
bash&#36; scons lena.img
scons: Reading SConscript files ...
scons: done reading SConscript files.
scons: Building targets ...
retrieve(["lena.img"], [])
scons: done building targets.
</pre>
with the target file <tt>lena.img</tt> appearing in your directory.
In the following examples, we will use the <tt>-Q</tt> (quiet) option of
<tt>scons</tt> to suppress the verbose output.
<syntaxhighlight lang="python">
Flow('lena.hdr','lena.img',
     'echo n1=512 n2=513 in=$SOURCE data_format=native_uchar',
     stdin=0)
</syntaxhighlight>


prepares the Madagascar header file <tt>lena.hdr</tt> using the
standard Unix command <tt>echo</tt>. 

<pre>
bash&#36; scons -Q lena.hdr
echo n1=512 n2=513 in=lena.img data_format=native_uchar > lena.hdr
</pre>

Since <tt>echo</tt> does not take a standard input, stdin is set to 0
in the Flow command; otherwise, the first source is the standard input.
Likewise, the first target is the standard output unless otherwise
specified. 
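The standard input and output conventions described above can be sketched in plain Python. This is an illustrative approximation, not the actual <tt>rsf.proj</tt> code: the function name <tt>compose</tt> and its argument handling are assumptions made for this example, and it handles only a single source and a single target.

```python
def compose(target, source, command, stdin=True, stdout=True):
    """Build a shell command line in the spirit of Flow (illustrative only).

    Hypothetical helper: the real Flow rule is defined in rsf/proj.py.
    """
    # Substitute the placeholders with the actual file names
    line = command.replace('$SOURCE', source).replace('$TARGET', target)
    if stdin:
        # The first source becomes the standard input
        line = '< %s %s' % (source, line)
    if stdout:
        # The first target becomes the standard output
        line = '%s > %s' % (line, target)
    return line


# echo does not read standard input, so stdin is switched off
header = compose('lena.hdr', 'lena.img',
                 'echo n1=512 n2=513 in=$SOURCE data_format=native_uchar',
                 stdin=0)
print(header)

# By default, both redirections are added
piped = compose('out.rsf', 'inp.rsf', 'sfwindow f2=1')
print(piped)
```

The first call reproduces the <tt>echo</tt> command line shown above; the file names <tt>inp.rsf</tt> and <tt>out.rsf</tt> in the second call are placeholders.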


Note that
<tt>lena.img</tt> is referred to as <tt>&#36;SOURCE</tt> in the command. This
allows us to change the source file's name without changing the command.
The data format of the <tt>lena.img</tt> image file is <tt>uchar</tt>
(unsigned character); the image consists of 513 traces with 512
samples per trace. Our next step is to convert the image
representation to floating point numbers and to window out the first
trace so that the final image is 512 by 512 square. The two
transformations are conveniently combined into one with the help of a Unix pipe.
<syntaxhighlight lang="python">
Flow('lena','lena.hdr','dd type=float | window f2=1')
</syntaxhighlight>


<pre>
  bash&#36; scons -Q lena
  scons: *** Do not know how to make target `lena'. Stop.
</pre>
What happened? In the absence of the file suffix, the <tt>Flow</tt>
command assumes that the target file suffix is "<tt>.rsf</tt>". Let us try again.
<pre>
bash&#36; scons -Q lena.rsf
< lena.hdr /RSF/bin/sfdd type=float | /RSF/bin/sfwindow f2=1 > lena.rsf
</pre>
Notice that Madagascar modules <tt>sfdd</tt> and <tt>sfwindow</tt> get
substituted for the corresponding short names in the
<tt>SConstruct</tt> file. The file <tt>lena.rsf</tt> is in a regularly
sampled format<ref>See [[Guide to RSF file format]]</ref> and can be examined, for example, with <tt>sfin lena.rsf</tt><ref>See [[Guide_to_madagascar_programs#sfin]].</ref>.
<pre>
bash&#36; sfin lena.rsf
lena.rsf:
    in="/datapath/lena.rsf@"
    esize=4 type=float form=native
    n1=512         d1=1           o1=0
    n2=512         d2=1           o2=1
        262144 elements 1048576 bytes
</pre>
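The element and byte counts reported by <tt>sfin</tt> follow directly from the header parameters. The arithmetic can be checked with a few lines of Python; the variable names are chosen for this illustration only, and the size of the raw input is what the header values imply rather than a measured number.

```python
# Dimensions declared in the header created with echo
n1, n2 = 512, 513            # samples per trace, number of traces

# Bytes per element: unsigned char on input, float after "dd type=float"
esize_uchar, esize_float = 1, 4

# Size of the raw input implied by the header
raw_bytes = n1 * n2 * esize_uchar

# "window f2=1" starts at the second trace, which removes the first one
traces_kept = n2 - 1
elements = n1 * traces_kept
float_bytes = elements * esize_float

print(raw_bytes, traces_kept, elements, float_bytes)
```

The last two numbers agree with the <tt>262144 elements 1048576 bytes</tt> line in the <tt>sfin</tt> output.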
In the last step, we will create a plot file to display the image
on the screen and to include it in the publication.

<syntaxhighlight lang="python">
Result('lena',
       '''
       sfgrey title="Hello, World!" transp=n color=b bias=128
       clip=100 screenratio=1 
       ''')
</syntaxhighlight>


Notice that we broke the long command string into multiple lines by
using Python's triple quote syntax. All the extra white space will be
ignored when the multiple-line string gets translated into the command
line. The <tt>Result</tt> command has special targets associated with
it. Try, for example, "<tt>scons lena.view</tt>" to observe the
figure <tt>Fig/lena.vpl</tt> generated in a specially created
<tt>Fig</tt> directory and displayed on the screen. The output should
look like this figure.

[[Image:lena.png|frame|center|The output of the first numerical experiment.]]
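The whitespace handling of triple-quoted command strings mentioned above can be mimicked in plain Python. This is only a rough approximation under the assumption that runs of white space are collapsed into single spaces; the exact spacing produced by <tt>rsf.proj</tt> may differ slightly, and this simple approach would also collapse repeated spaces inside quoted parameter values.

```python
# The multi-line command string from the Result call above
command = '''
       sfgrey title="Hello, World!" transp=n color=b bias=128
       clip=100 screenratio=1 
       '''

# Split on any white space (including newlines) and rejoin with single spaces
one_line = ' '.join(command.split())
print(one_line)
```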

The reproducible script ends with

<syntaxhighlight lang="python">
End()
</syntaxhighlight>

Ready to experiment? Try some of the following:
  
#Run <tt>scons -c</tt>. The <tt>-c</tt> (clean) option tells SCons to remove all default targets (the <tt>Fig/lena.vpl</tt> image file in our case) and also all intermediate targets that it generated.  
<pre> bash&#36; scons -c -Q 
Removed lena.img 
Removed lena.hdr 
Removed lena.rsf 
Removed /datapath/lena.rsf@ 
Removed Fig/lena.vpl 
</pre> Run <tt>scons</tt> again, and the default target will be regenerated. 
<pre>
bash&#36; scons -Q 
retrieve(["lena.img"], []) 
echo n1=512 n2=513 in=lena.img data_format=native_uchar > lena.hdr 
< lena.hdr /RSF/bin/sfdd type=float | /RSF/bin/sfwindow f2=1 > lena.rsf 
< lena.rsf /RSF/bin/sfgrey title="Hello, World!" transp=n color=b  bias=128 clip=100 screenratio=1 > Fig/lena.vpl </pre> 
#Edit your <tt>SConstruct</tt> file and change some of the plotting parameters. For example, change the value of <tt>clip</tt> from <tt>clip=100</tt> to <tt>clip=50</tt>. Run <tt>scons</tt> again and observe that only the last part of the processing flow (precisely, the part affected by the parameter change) is being run: 
<pre> bash&#36; scons -Q view 
< lena.rsf /RSF/bin/sfgrey title="Hello, World!" transp=n color=b  bias=128 clip=50 screenratio=1 > Fig/lena.vpl 
sfpen Fig/lena.vpl 
</pre> SCons is smart enough to recognize that your editing did not affect any of the previous results in the data flow chain! Keeping track of dependencies is the main feature that separates data processing and computational experimenting with SCons from using linear shell scripts. This feature can save you a lot of time for computationally demanding data processing and make your experiments more interactive and enjoyable. 
#A special parameter to SCons (defined in <tt>rsf/proj.py</tt>) can time the execution of each step in the processing flow. Try running <tt>scons TIMER=y</tt>. 
#The <tt>rsf.proj</tt> module has direct access to the database that stores the parameters of all Madagascar modules. Try running <tt>scons CHECKPAR=y</tt> to see parameter checking enforced before computations<ref>This feature is new and experimental and may not work correctly yet.</ref>. 
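The selective rebuild observed in the second experiment can be imitated with a toy model. The sketch below is a simplification and not the SCons algorithm: each target gets a signature computed from its command and from the signatures of its sources, whereas SCons also takes file contents into account. The function names <tt>signature</tt>, <tt>plan</tt>, and <tt>steps</tt> are hypothetical, and the <tt>sfgrey</tt> command is shortened.

```python
import hashlib


def signature(command, source_sigs):
    """Combine a command string with the signatures of its sources."""
    h = hashlib.md5(command.encode())
    for sig in source_sigs:
        h.update(sig.encode())
    return h.hexdigest()


def plan(rules, stored):
    """Return the targets whose signature differs from the stored one.

    rules: list of (target, sources, command) tuples in build order.
    stored: dictionary of signatures saved from the previous run.
    """
    sigs, rebuild = {}, []
    for target, sources, command in rules:
        sigs[target] = signature(command, [sigs[s] for s in sources])
        if stored.get(target) != sigs[target]:
            rebuild.append(target)
    return rebuild, sigs


def steps(clip):
    """The processing chain of Example 1 with an adjustable clip value."""
    return [
        ('lena.img', [], 'retrieve'),
        ('lena.hdr', ['lena.img'],
         'echo n1=512 n2=513 in=$SOURCE data_format=native_uchar'),
        ('lena.rsf', ['lena.hdr'], 'sfdd type=float | sfwindow f2=1'),
        ('Fig/lena.vpl', ['lena.rsf'], 'sfgrey clip=%d' % clip),
    ]


first, stored = plan(steps(100), {})       # nothing stored yet
second, stored = plan(steps(50), stored)   # only the plot command changed
print(first)
print(second)
```

In this model, the first run schedules all four targets, and the second run schedules only <tt>Fig/lena.vpl</tt>, because the signatures of the upstream targets are unchanged.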
The summary of our SCons commands is given in the table.

{| class="wikitable"
|+SCons commands and options defined in <tt>rsf.proj</tt>.
|- 
|style="background-color:#ffdead;"| '''<tt>scons <math><</math>file<math>></math></tt>'''
|-
| Generate <tt><math><</math>file<math>></math></tt> (usually requires <tt>.rsf</tt> suffix for <tt>Flow</tt> targets and <tt>.vpl</tt> suffix for <tt>Plot</tt> targets.)
|- 
|style="background-color:#ffdead;"| '''<tt>scons</tt>'''
|-
| Generate default targets (usually figures specified in <tt>Result</tt>.) 
|- 
|style="background-color:#ffdead;"| '''<tt>scons view</tt>''' or '''<tt>scons <math><</math>result<math>></math>.view</tt> '''
|-
| Generate <tt>Result</tt> figures and display them on the screen. 
|- 
|style="background-color:#ffdead;"| '''<tt>scons print</tt>''' or '''<tt>scons <math><</math>result<math>></math>.print</tt>''' 
|-
| Generate <tt>Result</tt> figures and print them. 
|- 
|style="background-color:#ffdead;"| '''<tt>scons lock</tt>''' or '''<tt>scons <math><</math>result<math>></math>.lock</tt> ''' 
|-
| Generate <tt>Result</tt> figures and install them in a separate location. 
|- 
|style="background-color:#ffdead;"| '''<tt>scons test</tt>''' or '''<tt>scons <math><</math>result<math>></math>.test</tt>''' 
|-
| Generate <tt>Result</tt> figures and compare them with the corresponding "locked" figures stored in a separate location (regression testing). 
|- 
|style="background-color:#ffdead;"| '''<tt>scons <math><</math>result<math>></math>.flip</tt>''' 
|-
| Generate the <tt><math><</math>result<math>></math></tt> figure and compare it with the corresponding "locked" figure stored in a separate location by flipping between the two figures on the screen. 
|- 
|style="background-color:#ffdead;"| '''<tt>scons TIMER=y ...</tt> ''' 
|-
| Time the execution of each step in the processing flow (using the Unix <tt>time</tt> utility.) 
|- 
|style="background-color:#ffdead;"| '''<tt>scons CHECKPAR=y ...</tt> ''' 
|-
| Check the names and values of all parameters supplied to Madagascar modules in the processing flow before executing anything (guards against incorrect input.) This option is new and experimental.  
|}

===Example 2===

The plan for this experiment is to add random noise to the test
"Lena" image and then attempt removing it by low-pass filtering
and hard thresholding of coefficients in the Fourier domain. The
resultant images are shown in the figures.

[[Image:panel1.png|frame|center|Top left: original image. Top right: random noise added. Bottom left: original image spectrum in the Fourier (<math>F</math>-<math>X</math>) domain. Bottom right: noisy image spectrum in the Fourier (<math>F</math>-<math>X</math>) domain.]]

[[Image:panel2.png|frame|center|Left: denoising by low-pass filtering.  Right: denoising by hard thresholding in the Fourier domain.]]

Since the <tt>SConstruct</tt> file is a Python script, we can also use all the
flexibility and power of the Python language in our Madagascar
reproducible scripts. A demo script is available in the
<tt>rsf/scons/rsfpy</tt> subdirectory of the Madagascar <tt>book</tt>
directory. Rather than commenting on it line-by-line, we select some
parts of interest.
In the <tt>SConstruct</tt> script, we can declare
Python variables
<syntaxhighlight lang="python">
bias = 128
</syntaxhighlight>

and use them later, for example, to define our customized plot
command as a Python function
<syntaxhighlight lang="python">
def grey(title,transp='n',bias=bias):
    return '''
    sfgrey title="%s" transp=%s bias=%g clip=100
    screenht=10 screenwd=10 crowd2=0.85 crowd1=0.8
    label1= label2= 
    ''' % (title,transp,bias)
</syntaxhighlight>

This Python function, named <tt>grey()</tt>, can then be called in Plot or Result
commands, e.g.
<syntaxhighlight lang="python">
Plot('lplena',grey('Noisy Lena LP filtered'))
</syntaxhighlight>

We can define a Python dictionary, e.g.
<syntaxhighlight lang="python">
titles = {'lena':'Lena',
          'nlena':'Noisy Lena'}
</syntaxhighlight>
and loop over its entries, e.g.
<syntaxhighlight lang="python">
for name in titles.keys():
    Plot(name,grey(titles[name]) )
    cftitle = titles[name]+' in FX domain'
    Flow('fx'+name,name,'sfspectra')
    Plot('fx'+name,grey(cftitle,'y',100))
</syntaxhighlight>
Note that the title of the plots is obtained by concatenating Python
strings.
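To see what the loop generates without running SCons, one can replace <tt>Plot</tt> and <tt>Flow</tt> with stand-ins that simply record their arguments. The stand-ins below are hypothetical and are not the <tt>rsf.proj</tt> functions; in particular, the assumption that a <tt>Plot</tt> call without an explicit source uses a source with the same name as the target is made for illustration.

```python
bias = 128


def grey(title, transp='n', bias=bias):
    """The customized plot command from the text."""
    return '''
    sfgrey title="%s" transp=%s bias=%g clip=100
    screenht=10 screenwd=10 crowd2=0.85 crowd1=0.8
    label1= label2= 
    ''' % (title, transp, bias)


rules = []  # recorded (kind, target, source, command) tuples


def Plot(target, command):
    # Stand-in: assume the source has the same name as the target
    rules.append(('Plot', target, target, ' '.join(command.split())))


def Flow(target, source, command):
    # Stand-in: record the rule instead of registering it with SCons
    rules.append(('Flow', target, source, command))


titles = {'lena': 'Lena',
          'nlena': 'Noisy Lena'}

for name in titles:
    Plot(name, grey(titles[name]))
    cftitle = titles[name] + ' in FX domain'
    Flow('fx' + name, name, 'sfspectra')
    Plot('fx' + name, grey(cftitle, 'y', 100))

for rule in rules:
    print(rule)
```

Each dictionary entry produces three rules: a plot of the image, a spectrum computation, and a plot of the spectrum with its own title, <tt>transp</tt>, and <tt>bias</tt> values.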
Python strings can also be used to define sequences of commands used
in several Flows, e.g.
<syntaxhighlight lang="python">
# 2-D FFT
fft2 = 'sffft1 sym=y | sffft3 sym=y'
Flow('fnlena','nlena',fft2)
</syntaxhighlight>

Finally, in our Madagascar reproducible script, we may want the option
to pass command line arguments when running SCons or use default
values otherwise, e.g.
<syntaxhighlight lang="python">
# denoising using thresholding in the Fourier domain
fthr = float(ARGUMENTS.get('fthr', 70))
Flow('fthrlena','fnlena','sfthr thr=%f mode="hard"' % fthr)
</syntaxhighlight>

When running <tt>scons</tt> alone, the default value of <tt>fthr</tt> (i.e., 70)
is used, whereas running <tt>scons fthr=68</tt> sets <tt>fthr</tt> to the value
specified on the command line.
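The behavior of the default value can be checked outside SCons by substituting an ordinary dictionary for <tt>ARGUMENTS</tt>, which offers the same <tt>get</tt> method. The function name <tt>threshold_command</tt> is hypothetical and introduced only for this sketch.

```python
def threshold_command(arguments):
    """Build the thresholding command from dictionary-like arguments."""
    # Command-line values arrive as strings, hence the float() conversion
    fthr = float(arguments.get('fthr', 70))
    return 'sfthr thr=%f mode="hard"' % fthr


# Equivalent of running "scons" without arguments
default_cmd = threshold_command({})
# Equivalent of running "scons fthr=68"
custom_cmd = threshold_command({'fthr': '68'})

print(default_cmd)
print(custom_cmd)
```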
This is by no means an exhaustive list of options, but hopefully, it
will give you a flavor of the powerful tool you have in your hands. Enjoy!
<!-- 
===Useful SCons commands for reproducible scripts===
On top of SCons standard options (<tt>scons --help</tt> for more
details), Madagascar has its own SCons options. We already saw
<tt>scons plot.view</tt> that displays <tt>plot.vpl</tt> (in the
<tt>Fig</tt> folder) obtained in a Result command. <tt>scons view</tt>
displays the result plots one after the other.
It is also possible to check the parameters for Madagascar programs in
SCons Flow commands using the CHECKPAR option (\texttt{scons
CHECKPAR=y target}). Note that CHECKPAR is an experimental option
and will be enhanced in the future to include parameter ranges and
other safety checks.
To time the execution of processing flows in a SConstruct, use the
TIMER option (<tt>scons TIMER=y target</tt>).
<tt>scons lock</tt> is used to secure result plots and copy them from
the <tt>Fig</tt> folder of your working directory to your
<tt>&#36;RSFFIGS</tt> folder where <tt>RSFFIGS</tt> is the environmental
variable to the directory where you want Madagascar to put your key
Madagascar result plots. Note that this is a necessary step before
creating reproducible documentation. <tt>scons plot.flip</tt> runs
<tt>xtpen Fig/plot.vpl /locked/figures/plot.vpl</tt> to flip between
the new and locked figure. This is useful when detecting changes.
 -->

==Creating reproducible documentation==

You are done with computational experiments and want to communicate
them in a paper. SCons helps us create high-quality papers where
computational results (figures) are integrated with papers written in
L<sup>A</sup>TEX.
The corresponding SCons extension is defined in <tt>&#36;PYTHONPATH/rsf/tex.py</tt>, under the
Madagascar installation directory given by the <tt>RSFROOT</tt> environment
variable. The source of this file is in
[http://sourceforge.net/p/rsf/code/HEAD/tree/trunk/framework/rsf/tex.py framework/rsf/tex.py].
We summarize the basic methods and commands in the tables.

{| class="wikitable"
|+Basic methods of an <tt>rsf.tex</tt> object.
|- 
|style="background-color:#ffdead;"| '''<tt>Paper(paper_name[,lclass][,use][,include][,options])</tt>'''
|-
| A rule to compile the <tt><math><</math>paper_name<math>></math>.tex</tt> L<sup>A</sup>TEX document using the L<sup>A</sup>TEX2e class specified in <tt>lclass</tt> (default is <tt>geophysics.cls</tt> from the [[SEGTeX]] package) with additional options specified in <tt>options</tt>, additional packages specified in <tt>use</tt>, and additional preamble specified in <tt>include</tt>.
|- 
|style="background-color:#ffdead;"| '''<tt>End()</tt> '''
|-
| A rule to collect default targets (referring to <tt>paper.tex</tt> document).
|}

{| class="wikitable"
|+SCons commands defined in <tt>rsf.tex</tt>.
|- 
|style="background-color:#ffdead;"| '''<tt>scons</tt>'''
|-
| Generate the default target (usually the PDF file <tt>paper.pdf</tt> from the source L<sup>A</sup>TEX file <tt>paper.tex</tt>).
|- 
|style="background-color:#ffdead;"| '''<tt>scons pdf</tt>''' or '''<tt>scons <math><</math>paper_name<math>></math>.pdf</tt> '''
|-
| Generate PDF files from L<sup>A</sup>TEX sources <tt>paper.tex</tt> or <tt><math><</math>paper_name<math>></math>.tex</tt>. 
|- 
|style="background-color:#ffdead;"| '''<tt>scons read</tt>''' or '''<tt>scons <math><</math>paper_name<math>></math>.read</tt> '''
|-
| Generate PDF files from L<sup>A</sup>TEX sources <tt>paper.tex</tt> or <tt><math><</math>paper_name<math>></math>.tex</tt> and display them on the screen. 
|- 
|style="background-color:#ffdead;"| '''<tt>scons print</tt>''' or '''<tt>scons <math><</math>paper_name<math>></math>.print</tt> '''
|-
| Generate PDF files from L<sup>A</sup>TEX sources <tt>paper.tex</tt> or <tt><math><</math>paper_name<math>></math>.tex</tt> and print them. 
|- 
|style="background-color:#ffdead;"| '''<tt>scons html</tt>''' or '''<tt>scons <math><</math>paper_name<math>></math>.html</tt> '''
|-
| Generate HTML files from L<sup>A</sup>TEX sources <tt>paper.tex</tt> or <tt><math><</math>paper_name<math>></math>.tex</tt> using L<sup>A</sup>TEX2HTML. The directory <tt><math><</math>paper_name<math>></math>_html</tt> gets created.
|- 
|style="background-color:#ffdead;"| '''<tt>scons install</tt>''' or '''<tt>scons <math><</math>paper_name<math>></math>.install</tt> '''
|-
| Generate PDF and HTML files from L<sup>A</sup>TEX sources <tt>paper.tex</tt> or <tt><math><</math>paper_name<math>></math>.tex</tt> and install them in a separate location (used for publishing on a web site).
|- 
|style="background-color:#ffdead;"| '''<tt>scons wiki</tt>''' or '''<tt>scons <math><</math>paper_name<math>></math>.wiki</tt> '''
|-
| Convert L<sup>A</sup>TEX sources <tt>paper.tex</tt> or <tt><math><</math>paper_name<math>></math>.tex</tt> to the <tt>MediaWiki</tt> format (used for publishing on a Wiki web site). 
|}

<!-- 
A Madagascar reproducible paper is a paper written in L<sup>A</sup>TEX and
whose figures are either generated by Madagascar reproducible scripts
or available for download, e.g., this paper!  (<tt>paper.tex</tt>
available in the <tt>rsf/scons/</tt> directory of Madagascar book
section).

The main SConstruct command set in our reproducible research
environment and related to documentation is

This command is defined in <tt>&#36;PYTHONPATH/rsf/tex.py</tt>.
 -->

===Example===

This paper is itself an example of a reproducible document. It is
generated using the following <tt>SConstruct</tt> file, which is placed
in the directory above the project directories.

<syntaxhighlight lang="python">
from rsf.tex import *
Paper('velan',use='hyperref,listings,color')
End(use='hyperref,listings,color')
</syntaxhighlight>


This <tt>SConstruct</tt> generates this paper, but it can also compile
<tt>velan.tex</tt> in the same directory. Note that there is no
<tt>Paper</tt> command for <tt>paper.tex</tt> since it is the default
documentation name. Optional L<sup>A</sup>TEX packages and style used in
<tt>paper.tex</tt> are passed in the End command.

Let's now take a closer look at <tt>paper.tex</tt> to understand how
the figures of the documentation are linked to the reproducible
scripts that created them. First of all, note that <tt>paper.tex</tt>
is not a regular L<sup>A</sup>TEX document but only its body (no
<math>\backslash</math>documentclass, <math>\backslash</math>usepackage, etc.). In our
paper, the first figure was created in the project folder
<tt>easystart</tt> (a sub-folder of our documentation folder) as the
result plot <tt>lena.vpl</tt>. In the L<sup>A</sup>TEX source code, it
is included as

<syntaxhighlight lang="latex">
\inputdir{easystart} 
\sideplot{lena}{height=.25\textheight}{The output of the first numerical experiment.}
</syntaxhighlight>

The <math>\backslash</math>inputdir command points to the project directory, and
the <math>\backslash</math>sideplot command includes the result plot <tt><math><</math>result_name<math>></math></tt>. The
L<sup>A</sup>TEX label of the figure is <tt>fig:<math><</math>result_name<math>></math></tt>. The
first time the paper is compiled, the result file is automatically
converted to PDF format.
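The naming convention can be summarized with a short plain-Python sketch. The helper <tt>figure_names</tt> is hypothetical and not part of Madagascar. It assumes that result plots are kept in the <tt>Fig</tt> sub-folder of the project directory, and it further assumes that the converted PDF file is written next to the original plot.
<syntaxhighlight lang="python">
import posixpath

def figure_names(project, result):
    """Hypothetical helper: names associated with one result plot."""
    fig_dir = posixpath.join(project, 'Fig')
    return {
        # result plot produced by the project's SConstruct
        'vpl': posixpath.join(fig_dir, result + '.vpl'),
        # assumed location of the automatically converted file
        'pdf': posixpath.join(fig_dir, result + '.pdf'),
        # label for referring to the figure in the paper
        'label': 'fig:' + result,
    }

names = figure_names('easystart', 'lena')
print(names['vpl'], names['label'])
</syntaxhighlight>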

<!-- 
===Useful SCons commands for reproducible documentation===

To compile this paper, you first need to run and lock the
<tt>easystart</tt> project. Go in the <tt>easystart</tt> folder and
run <tt>scons lock</tt>. Go back to the documentation folder and run
<tt>scons pdf</tt> (alternatively <tt>scons
<math><</math>paper_name<math>></math>.pdf</tt>). Use <tt>scons read</tt> (alternatively
<tt>scons <math><</math>paper_name<math>></math>.read</tt>) or your favorite PDF reader to
read this paper reproduced by yourself...
 -->

==References==
<references/>