Package overview: Difference between revisions

From Madagascar
Nick (talk | contribs)

Revision as of 11:27, 9 January 2009

Components

Madagascar provides to the user:

  1. Standalone programs for out-of-core numerical analysis;
  2. Standalone programs for geophysical data processing and imaging;
  3. A development kit for C, C++, F77, F90, Matlab and Octave;
  4. A framework for regression testing and reproducible numerical experiments, based on SCons and LaTeX;
  5. A collection of reproducible scientific articles, also used as regression tests for the standalone programs;
  6. A collection of datasets used as input to reproducible numerical experiments and software tests.

Standalone programs

Most programs act as filters on input data and can be chained through Unix pipes, for example:
<bash>
< data.rsf sfwindow n1=100 | sfbandpass fhi=60 > data2.rsf
</bash>

This approach follows the Unix philosophy, as formulated by Doug McIlroy, the inventor of Unix pipes (Salus, 1994[1]):

  1. Write programs that do one thing and do it well.
  2. Write programs to work together.
  3. Write programs to handle text streams, because that is a universal interface.
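
As a rough illustration of how such filter chains compose, the sketch below assembles the pipeline shown above from a list of stages. The helper name pipeline_command and its arguments are hypothetical and not part of Madagascar; only the program names and parameters come from the example in the text.

```python
def pipeline_command(source, stages, target):
    """Join (program, parameters) stages into one shell pipeline string.

    Hypothetical helper for illustration only; it is not a Madagascar function.
    """
    parts = []
    for program, params in stages:
        # Each parameter is written as key=value, as in the example above
        args = " ".join("%s=%s" % (key, value) for key, value in params.items())
        parts.append((program + " " + args).strip())
    return "< %s %s > %s" % (source, " | ".join(parts), target)


cmd = pipeline_command(
    "data.rsf",
    [("sfwindow", {"n1": 100}), ("sfbandpass", {"fhi": 60})],
    "data2.rsf",
)
print(cmd)  # < data.rsf sfwindow n1=100 | sfbandpass fhi=60 > data2.rsf
```

Because every stage reads standard input and writes standard output, adding or removing a processing step only changes the list of stages, not the stages themselves.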

Documentation is designed in layers. Following the Unix convention, programs have brief man pages, which explain the program purpose and parameters. These pages can also be accessed by running a program without any arguments, or online. A more detailed level of documentation, with basic equations, figures and examples, is provided through the wiki in the Guide to programs. The Task-centric program list categorizes and briefly describes all components. At an even higher level of detail, the programs can be seen in actual use in the Reproducible Documents. Search capabilities are currently provided by the sfdoc utility.

Madagascar data formats

For data, Madagascar uses the Regularly Sampled Format (RSF), which is based on the concept of hypercubes (n-D arrays, or regularly sampled functions of several variables), much like SEPlib (its closest relative), DDS, or the regularly sampled version of the JavaSeis format (SVF). Up to 9 dimensions are supported. A 1D dataset is conceptually analogous to a time series, a 2D dataset to a raster image, and a 3D dataset to a voxel volume. The format (actually a metaformat) uses an ASCII file with metadata (information about the data), including a pointer (the in= parameter) to the location of the file with the actual data values. Irregularly sampled data are currently handled as a pair of datasets, one containing the data and the other containing the corresponding irregular geometry information. Programs for conversion to and from other formats, such as SEG-Y and SU, are provided.
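
To make the metadata idea concrete, here is a minimal sketch that reads key=value pairs from header text and recovers the hypercube shape and the in= pointer. It assumes a simplified header made only of whitespace-separated key=value tokens, with axis lengths named n1, n2, and so on, as in the header created in the SConstruct example later on this page. The function names parse_header and shape are hypothetical, and a real RSF reader may handle more cases than this sketch does.

```python
import shlex


def parse_header(text):
    """Collect key=value tokens from header text into a dictionary.

    In this sketch a later token simply overwrites an earlier one.
    """
    params = {}
    for token in shlex.split(text):  # shlex keeps quoted values together
        if "=" in token:
            key, value = token.split("=", 1)
            params[key] = value
    return params


def shape(params, max_dims=9):
    """Return the axis lengths n1, n2, ... present in the header."""
    dims = []
    for axis in range(1, max_dims + 1):
        key = "n%d" % axis
        if key not in params:
            break
        dims.append(int(params[key]))
    return dims


header = "n1=512 n2=513 in=lena.img data_format=native_uchar"
params = parse_header(header)
print(shape(params))  # [512, 513]
print(params["in"])   # lena.img
```

The header stays small and readable because the data values themselves live in the separate file named by in=.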

For graphics, Madagascar currently uses the Vplot vector graphics format. Converters to other graphics formats (PostScript, GIF, JPEG) are also provided.

Reproducible documents

A reproducible document consists of LaTeX source combined with the SCons rules required to fully build the document. These rules are expressed in terms of SCons extensions that are provided as part of Madagascar.

This combination is the key to reproducibility in Madagascar. For an introduction to reproducible Madagascar documents, see Reproducible computational experiments using SCons.

Madagascar Display: Vplot

In contrast to most other Madagascar components, graphics components produce Vplot data as output.

Vplot is a device-independent graphics format that allows both vector and raster elements (as such, it is comparable to PostScript). Vplot files can be rendered on a number of output devices; the typical use is on-screen display under the X Window System. A list of plotting programs is provided on the wiki.

Here is an example of a Madagascar pipe. It takes a subsection of a file, low-pass filters it, and saves the result:

<bash>
< data.rsf sfwindow n1=100 | sfbandpass fhi=60 > data2.rsf
</bash>

In this more elaborate case, the final output is passed to a graphics program and plotted:

<bash>
< data.rsf sfwindow n1=100 | sfbandpass fhi=60 | sfcontour | xtpen
</bash>

More extensive examples can be found in the Guide to madagascar programs. Novice readers should probably read the material below before proceeding to that page.

Reproducibility and Project Management

Madagascar uses and extends SCons, an open-source software construction package, to document and maintain data processing flows. Documented projects become computational recipes that can be easily exchanged among Madagascar users.

SCons is a rule-based package written in Python and typically used as a build system, analogous to make. Familiarity with any build system will be helpful in understanding SCons. SCons statements, being Python statements, are executed in the order in which they are written, but they only define rules. The rules are invoked according to a dependency graph that SCons builds from them. Components regarded as up to date are not rebuilt.

SCons allows user-contributed Builders (meta-rule categories) and Madagascar uses this capability extensively. The idea is that building an output file based on a workflow chain is very much analogous to building a software package based on a software tool chain. The calculation is seen simply as a build with dependencies. This is a considerable benefit in developing alternative workflows using a given dataset. The system maintains an awareness of already completed calculations. Without user intervention, redundant calculations are avoided.
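
The idea of treating a calculation as a build with dependencies can be sketched in a few lines of plain Python. This is a toy model and not the SCons API: the rule table, the build function, and the in-memory set of up-to-date targets are all hypothetical stand-ins, and the sketch ignores the case where an input changes after a target was built.

```python
def build(target, rules, up_to_date, log):
    """Build target after its sources, skipping anything already up to date."""
    if target not in rules:
        return  # a plain input file: there is no rule to run
    sources, action = rules[target]
    for source in sources:
        build(source, rules, up_to_date, log)  # dependencies come first
    if target in up_to_date:
        return  # redundant calculation avoided
    log.append(action)  # stand-in for actually running the command
    up_to_date.add(target)


# Each target maps to (list of sources, command that produces it)
rules = {
    "windowed.rsf": (["data.rsf"], "sfwindow n1=100"),
    "filtered.rsf": (["windowed.rsf"], "sfbandpass fhi=60"),
}

done = set()
log = []
build("filtered.rsf", rules, done, log)
build("filtered.rsf", rules, done, log)  # second request runs nothing new
print(log)  # ['sfwindow n1=100', 'sfbandpass fhi=60']
```

Requesting the final target a second time leaves the log unchanged, which mirrors how completed calculations are not repeated.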

Madagascar calculations are thus expressed as SCons scripts (SConstruct files). SCons extensions follow SCons conventions in beginning with an uppercase letter. The most common Madagascar extensions are Flow(), Result(), and End(). A Flow() invocation wraps Madagascar computational components. Result() is a version of Flow() with graphical output. Finally, End() invokes the default rules for multiple results.

Madagascar also provides a collection of reproducible documents, organized in living books. Each reproducible book contains a collection of Madagascar recipes (SConstruct files) used to generate book figures. The recipes cover a variety of data processing and imaging tasks described in the books. Figures and recipes serve a dual purpose with respect to Madagascar maintenance. They provide demos for introducing new users to the functionality of the package and, at the same time, regression tests for assuring system stability under change.

How it All Comes Together

Here is an example, described in detail on the SCons page.

<python>
from rsfproj import *

# Download the input data file
Fetch('lena.img','imgs')

# Create RSF header
Flow('lena.hdr','lena.img',
     'echo n1=512 n2=513 in=$SOURCE data_format=native_uchar',
     stdin=0)

# Convert to floating point and window out first trace
Flow('lena','lena.hdr','dd type=float | window f2=1')

# Display
Result('lena',
       '''
       sfgrey title="Hello, World!" transp=n color=b bias=128
       clip=100 screenratio=1
       ''')

# Wrap up
End()
</python>

Getting Madagascar

Madagascar runs on Unix/Linux platforms, including Mac OS X and Unix emulations under Microsoft Windows. Its installation requires, at a minimum, a working C compiler and Python. Most users will also want the X Window System on their desktop. See the download and installation instructions.

License

The Madagascar package is released in open-source form under the standard GNU GPL license. In simple words, there are no restrictions on the use of the software (including copying, modifying, selling, etc.). However, there are restrictions on redistribution of the software, intended to prevent the package from losing its open-source status. Users are encouraged to submit their modifications back to the original distribution for the benefit of the whole user community.

Why the Name "Madagascar"?

Whimsy, really. It seems easier to remember than the previous name, "RSF", and it provides us with lots of interesting mascots.

Madagascar Community

Madagascar seeks to become an open and active open source community. Active mailing lists are maintained and annual meetings take place. See

Your participation is welcome.

History

In the present form, the Madagascar package, while being completely written from scratch, borrows ideas from the design of SEPlib, a publicly available software package, maintained by Bob Clapp at the Stanford Exploration Project. Generations of SEP students and researchers contributed to SEPlib. Most important contributions came from Rob Clayton, Jon Claerbout, Dave Hale, Stew Levin, Rick Ottolini, Joe Dellinger, Steve Cole, Dave Nichols, Martin Karrenbach, Biondo Biondi, and Bob Clapp.

Madagascar also borrows ideas from Seismic Unix (SU), a package maintained by John Stockwell at the Center for Wave Phenomena at the Colorado School of Mines (Stockwell, 1997[2]; Stockwell, 1999[3]). Main contributors to SU included Einar Kjartansson, Shuki Ronen, Jack Cohen, Chris Liner, Dave Hale, and John Stockwell. SU has been open-source software (distributed under a BSD-style license) since release 40 (April 10, 2007).

References

  1. Salus, P. H., 1994, A quarter-century of Unix: Addison-Wesley.
  2. Stockwell, J. W., 1997, Free software in education: A case study of CWP/SU: Seismic Unix: The Leading Edge, 16, 1045-1049.
  3. Stockwell, J. W., 1999, The CWP/SU: Seismic Un*x package: Computers and Geosciences, 25, 415-419.