Manual: Difference between revisions

From Madagascar
Nick (talk | contribs)
m →‎How to parallelize your programs: wikified 2007-12-27 blog post
Sfomel (talk | contribs)
 
Latest revision as of 23:00, 8 April 2014

As the number of pages on the Wiki grows, the navbar starts becoming insufficient for proper organization of all documentation. A top-down view of all materials about Madagascar is also useful for determining whether gaps in coverage exist. This page will stay in the Sandbox for a long while -- until all gaps have been filled.

Ideally the manual will consist only of links to wiki pages or of its own content. "Forking", i.e. creating a modified copy of a page especially for the manual, invariably ends up with one version getting out of sync.

About Madagascar

Introduction

Main Page of the wiki, and the Package overview.

For people who do not read manuals

Point them to Download, Short Install Guide, and the SEPlib Tour Revisited.

Why use Madagascar?

An articulate description of the reasons on the Why Madagascar page. Have some spectacular pictures obtained with algorithms that are not present in other packages. Describe algorithms/tools unavailable in other open-source geophysical data analysis packages.

Community

A description of the current Madagascar community, with the map of downloads and an estimate of the number of installs, who are the biggest users, outstanding research results obtained with Madagascar, etc. Links to the blog, user mailing list, developer mailing list. Mention of the Google Groups mirrors for rsf-user and rsf-devel. Also mention the bug tracker and feature request tracker, encouraging the community to use them more. Mention forums as an alternative for those who want to ask questions or conduct discussions without subscribing to a mailing list.

History

A history of Madagascar, with the SEPlib/SU part of the "Alternatives" section of the Introduction, and mentions of landmark events (short descriptions where necessary):

A link to the appendix containing the content of the Conferences and Publications pages.

Downloading and installing Madagascar

Download

Download

Install

  1. Short installation guide
  2. Advanced installation guide

Licensing and export regulations

Using Madagascar

The lightning-quick tour

The revisited SEP Tour

The Madagascar file formats

The Regularly Sampled Format (RSF)

The current Guide to RSF file format

Handling irregularly sampled data

Explain the principle of the current method (sfheadermath/sfheaderwindow used on the trace header block output by su/segyread)

Importing and exporting data from and to SEG-Y and SU

sfsegyread and sfsuread, with examples

Importing and exporting data from and to raster images

Raster image I/O

Visualizing data and exporting figures with vplot

  • Explanation of vplot format
  • Preempting display aliasing in raster plots with sfprep4plot
  • How to create vplot images with sfgraph, sfgrey, sfdots, etc. Common pen parameters
  • How to display vplot images with sfpen. How it defaults to oglpen on systems that support OpenGL, and xtpen on systems that do not support it, but have X Windows.
  • How to convert vplot images to other formats with vpconvert. This tool can work on a single plot, for example:
vpconvert file.vpl file.jpg

or an entire collection of files:

vpconvert format=tiff Fig/*.vpl

The vpconvert program can export to and from a multitude of file formats: avi, eps, gif, jpg, mpeg, pdf, png, ppm, ps, svg, and tif, and is the recommended vpl import and export tool. Older single-purpose utilities (vplot2gif and vplot2avi) are still available.

Visualizing data and exporting figures with GLE

Graphics with GLE

Visualizing data and exporting figures with gnuplot

Graphics with gnuplot

Visualizing data and exporting figures with PLplot

PLplot is a device-independent vector-plotting library. Its concept is very similar to that of vplot, but instead of separate device-dependent pens (like xtpen or pspen) it uses loadable "drivers" (organized as shared objects and connected to a plotting program in a plugin fashion). It has an extensive high-level interface for different types of plots.

A sample Madagascar program which utilizes PLplot's surface rendering capabilities is sfplsurf.

Calling existing Madagascar programs

Finding out what program you need

  1. sfdoc -k
  2. Task-centric program list and all its subordinate nodes
  3. Collection of 2-3 page reproducible papers -- "How to do raytracing in Madagascar"; "How to do modeling in Madagascar"; etc
  4. SU to m8r dictionary
  5. SEPlib to m8r dictionary
  6. Other such dictionaries, for free or proprietary seismic processing packages. Such dictionaries are also useful because they will highlight algorithms/utilities present in such packages but missing from m8r.

This chapter is just a sketch for now and should grow quite large. Users approach tools in a task-centric fashion, i.e. Q1: "How do I do X with Madagascar?", A1: "With feature Y"; Q2: "How do I use feature Y to this end?" M8r is very good at answering Q2, but people ask Q1 first. Many of the reproducible papers included so far contain cutting-edge research. Users learning how to use Madagascar need to start with something much simpler, where they do not have to focus on understanding research on top of understanding software.

Learning how to use a given program

  1. Command-line self-doc
  2. Local html self-doc ($RSFROOT/doc/index.html). Contains all programs installed on the user's machine and only those programs.
  3. Online self-doc
  4. The wiki Guide to Programs.
  5. Series of dedicated reproducible papers that present the theory behind specific geophysical programs and demonstrate it with various types of inputs and combination of parameters, like this paper does for SEPlib's AMO program.
  6. Combining together multiple programs -- the reproducible papers; pointer to relevant section of the manual ("Exploring reproducible papers")

What is reproducibility

The whole Reproducibility page, combined with Section 1 from Reproducible computational experiments using SCons

Exploring existing reproducible papers

Papers and books included in the Madagascar distribution

Reproducible Documents and more.

How to reproduce specific figures in existing papers

A frequently encountered case is when a researcher wants to reproduce only one or several figures from an entire paper, but not the entire paper. This can happen because on that system LaTeX dependencies of Madagascar are missing or not working properly, or simply because the researcher is interested only in that result.

  1. Finding the paper directory: If the interesting article has been found by browsing/hyperlink to Reproducible Documents, then the reproducibility package corresponding to
    http://www.reproducibility.org/RSF/book/<bookname>/<papername>/paper_html/
    can be found in
    RSFSRC/book/<bookname>/<papername>/
  2. Finding result names: Use the html version of the paper, or grep in all .tex files in the directory for a text string that occurs in the figure legend. Multiple-panel figures may have individual names for each panel. [Note: In pdf versions obtained with scons pdf in paper directory, neither the book name nor paper directory name nor figure names are given. LaTeX options to have figure names as well as a Geophysics-style header/footer with more details on the first page may be in order]
  3. Finding where to launch the re-build: In some cases, rules for creating a result are specified in SConstruct files in subdirectories of the main paper directory. If step 4 fails in the main paper directory, then you will have to find where the figure is built. Because result names may be generated automatically, a simple grep may not be enough and you may need to read the SConstruct and python modules imported by it to figure out if the result is generated there.
  4. Re-build and display the figure by typing scons resultname.view in the appropriate directory.
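Steps 1 and 2 above can be sketched in POSIX shell. This is only an illustration under stated assumptions: paper_dir_from_url and find_figure_sources are hypothetical helper names, not Madagascar tools, and mybook/mypaper in the usage note is a placeholder rather than a real paper.

```shell
# Hypothetical helper (not part of Madagascar): map a paper URL of the form
#   http://www.reproducibility.org/RSF/book/<bookname>/<papername>/paper_html/
# to the matching directory under the source tree root given as $2
# (defaults to the literal string RSFSRC).
paper_dir_from_url() {
    url=$1
    root=${2:-RSFSRC}
    path=${url#*/RSF/}          # drop everything up to and including "/RSF/"
    path=${path%/}              # drop a trailing slash, if any
    path=${path%/paper_html}    # drop the trailing "paper_html" component
    printf '%s/%s\n' "$root" "$path"
}

# Hypothetical helper: list the .tex files in a paper directory ($2, default
# current directory) that contain a string taken from the figure legend ($1).
find_figure_sources() {
    legend=$1
    dir=${2:-.}
    grep -l -F -- "$legend" "$dir"/*.tex
}
```

For example, paper_dir_from_url http://www.reproducibility.org/RSF/book/mybook/mypaper/paper_html/ prints RSFSRC/book/mybook/mypaper. Once the result name is known, scons resultname.view is run in the appropriate directory, as described in step 4.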

SConstructs containing a Fetch instruction will attempt to download public-domain input data from a communal server when the "scons" command is run. A fast internet connection is necessary in this case.

How to reproduce entire papers using stored figures

  1. See the previous section for how to find the paper directory
  2. Pointer to how to download stored figures (Download#Reproducible figures)
  • scons pdf
  • scons read

Troubleshooting:

  • If the build fails with messages of this kind (details here), TeX system dependencies are missing. Install a TeX system; TeX Live should have it all. Note: it is a 1 GB download, too large for many individual users to bother with and for most company IT departments to review for security. We should implement individual dependency checking, as we do in the installation.
  • If the build fails with LaTeX Error: File `geophysics.cls' not found, you have LaTeX but are missing SEGTeX.
  • If scons pdf in the paper directory requires pdf figures already in place in order to work, run sftour scons lock (?).
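A quick way to check for the two failure modes above before running scons pdf might look like the sketch below. It assumes that pdflatex is the LaTeX compiler in use and that the TeX distribution ships the kpsewhich lookup tool (TeX Live does); check_tex_deps is a hypothetical name, not an existing Madagascar command.

```shell
# Hypothetical helper: report whether a LaTeX compiler and the SEGTeX class
# file can be found. Always prints exactly two status lines.
check_tex_deps() {
    # Assumption: pdflatex is the compiler used to build the paper.
    if command -v pdflatex >/dev/null 2>&1; then
        echo "pdflatex: found"
    else
        echo "pdflatex: missing (install a TeX system)"
    fi
    # kpsewhich exits with a non-zero status when the file is not on the
    # TeX search path.
    if command -v kpsewhich >/dev/null 2>&1 &&
       kpsewhich geophysics.cls >/dev/null 2>&1; then
        echo "geophysics.cls: found"
    else
        echo "geophysics.cls: missing (install SEGTeX)"
    fi
}
```

Running check_tex_deps in a terminal prints one line per dependency, which makes it easier to tell a missing TeX system apart from a missing SEGTeX installation.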

How to reproduce entire papers and all their figures

  1. See an earlier section for how to find the paper directory
  2. The relevant SCons command (scons lock, or sftour scons lock, as in Download#Reproducible figures) to force reproducing the figures in the paper even when the reference figures repository is present

Tell the user to expect conditional reproducibility: If Matlab is not present, rsftex will not try to build the figures but will use the stored PDF files (same goes for Mathematica, xfig, etc.)

How to reproduce whole books

The 2006-12-22 rsf-user thread, the 2007-04-08 blog entry and more.

From Jim's 2009-03-7 rsf-devel message: Here is a way to generate all the targets in the subdirectories of $RSFSRC/book/geostats/spatial_stats:

cd $RSFSRC/book/geostats/spatial_stats
sftour scons

That works pretty nice. It generates all the targets in each of the 4 subdirectories of book/geostats/spatial_stats.

Now suppose I want to capture the output and errors in a log file (tcsh):

sftour scons >& scons.log

That's nice, except all the output goes into one log file in book/geostats/spatial_stats. Suppose I want 4 separate log files, one in each of the subdirectories. This will do the trick:

sftour 'scons >& scons.log'

The quotes make the entire string go to sftour as the command to run in each directory, instead of just 'scons'.

So far so good. Now suppose I want to go up one directory level and do the process recursively:

cd $RSFSRC/book/geostats
sftour sftour 'scons >& scons.log'

Well, that runs scons in each of the book/geostats/*/* directories, but it only makes 3 log files in the 3 subdirectories of book/geostats, not 14 log files in the 14 book/geostats/*/* directories. I can get 14 separate log files like this:

sftour "sftour 'scons >& scons.log'"

That does what I want. Now suppose I want to go up one more level to $RSFSRC/book and run the process recursively three levels deep:

sftour sftour "sftour 'scons >& scons.log'"

This works, but it doesn't put the log files in the bottom level, it puts them one level up. I can't fix it the same way I did before because I've run out of quotes :-) Here is one way to do it (tcsh):

foreach i (*)
    if ( -d $i) then
        echo ++++++ $i
        cd $i
        sftour "sftour '/usr/bin/time -p scons >& scons.log'"
        cd ..
    endif
end

From Sergey's 2009-03-10 rsf-devel message: A more elegant solution is

sftour sftour scons >& ../../%/%/scons.log
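For shells without the tcsh-style >& redirection, or to sidestep the nested quoting altogether, the same per-directory logging can be sketched with a plain POSIX loop. This is an alternative that does not use sftour, so it may visit directories in a different order or include directories that sftour would skip; run_with_logs is a hypothetical helper name.

```shell
# Hypothetical helper: run a command in every directory matching a glob
# pattern ($1) and write its output and errors to scons.log in that directory.
# Assumes the pattern contains no whitespace.
run_with_logs() {
    pattern=$1; shift
    for dir in $pattern; do          # unquoted on purpose so the glob expands
        [ -d "$dir" ] || continue    # skip plain files and unmatched patterns
        # The subshell keeps the cd local to this iteration;
        # "> scons.log 2>&1" is the POSIX spelling of tcsh's ">& scons.log".
        ( cd "$dir" && "$@" > scons.log 2>&1 )
    done
}
```

From $RSFSRC/book, run_with_logs '*/*/*' scons would then leave one scons.log in each third-level directory. The pattern must be quoted on the command line so that it is expanded inside the function rather than by the calling shell.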

Writing a LaTeX paper in the Madagascar framework

Follows the natural learning progression of somebody who may not even know LaTeX, let alone SCons.

  1. A paper with no figures.
  2. A paper with NR-only figures.

Creating a reproducible paper

Sections 2 and 3 from Reproducible computational experiments using SCons. Also, mention the "SCons macros" in book/Recipes.

Publications included in Madagascar's book directories are tested periodically. Those publications that fail the tests and are not easily repaired are moved to book/Grave, with a note on how the tests failed and why the problems could not be fixed.

Data-conditional reproducibility

Due to seismic data licensing terms, an author may find it possible to make public everything that is needed to make a publication reproducible, except for the data. In such conditions, the paper is still acceptable for inclusion in the Madagascar collection. To indicate that certain datasets are private, the relevant SConstruct files should use Fetch(...,local=1) or Fetch(...,server=private.server), where private.server is a password-protected private server. Affected vplot figures should be uploaded to the figure repository. Reproducibility testing will be skipped for the affected figures, but the html and pdf versions of the publications will still be created.

The usage of public seismic datasets, such as those at http://software.seg.org/, is strongly encouraged.

Creating a reproducible book

Developing in Madagascar

Writing your own programs

Introduction to the Madagascar API

  1. The existing data clipping API demo
  2. A more complex finite differences API demo – add Python, F77 and Matlab APIs to it

How to add your program

The relevant section in "Adding programs"

How to document your program

The relevant section in "Adding programs"

Style guide

The relevant section in "Adding programs"

How to test your program

The relevant section in "Adding programs"

How to parallelize your programs

The Parallel Computing page.

Tips and tricks

The relevant section in "Adding programs"

Madagascar library reference

Adding programs to the central repository

Framework development and maintenance

Description of m8r's inner workings for those who want to help improve and maintain Madagascar. Maintenance guide and perhaps other stuff.

Graphics development with vplot

Graphics development with vplot

Packaging madagascar

Packaging madagascar

Datasets distributed with Madagascar

  • Description of datasets – pictures of the velocity model, of sample gathers, zero-offset sections, migrated image.
  • Comment on which are the main problems they illustrate (internal multiples? overturning waves? etc). Algorithm used for generating them, references to published literature describing the datasets
  • Command line options for correctly reading them from the storage format (SEG-Y, most probably) into RSF
  • In general, expand the datasets section of Reproducible Documents page to include other datasets

Other open-source data analysis packages

Geophysical software

Other open-source geophysical packages. Briefly discuss each of them. Mention "dictionaries" from them to m8r where available (should attempt to have dictionaries for all of them)

Other tools

Many other useful open-source or public-domain software tools in applied mathematics can complement Madagascar. A few common examples, in alphabetical order, are:

  • ALGLIB: a highly portable numerical analysis and data processing library;
  • ARPACK ("The ARnoldi PACKage"): a library written in FORTRAN 77 for solving large-scale eigenvalue problems;
  • ATLAS ("Automatically Tuned Linear Algebra Software"): a linear algebra library, implementing the BLAS APIs for C and Fortran77;
  • Blitz++: a C++ class library for scientific computing which provides performance on par with Fortran 77/90;
  • DUNE ("Distributed and Unified Numerics Environment"): a modular toolbox for solving partial differential equations with grid-based methods. It supports the easy implementation of methods like Finite Elements, Finite Volumes, and also Finite Differences;
  • FFTW ("The Fastest Fourier Transform in the West"): Hardware-adaptive FFT libraries;
  • GNU Scientific Library: a C library for numerical calculations in many branches of applied mathematics and science;
  • GNU Triangulated Surface Library: a library providing a set of useful functions to deal with 3D surfaces meshed with interconnected triangles;
  • LAPACK ("The Linear Algebra PACKage"): a collection of routines for solving systems of simultaneous linear equations, least-squares solutions of linear systems of equations, eigenvalue problems, and singular value problems;
  • MINPACK: a library of FORTRAN subroutines for the solving of systems of nonlinear equations, or the least squares minimization of the residual of a set of linear or nonlinear equations;
  • PETSc ("the Portable, Extensible Toolkit for Scientific computation"): A set of serial and parallel, linear and nonlinear, solvers for large-scale, sparse linear and nonlinear systems of equations;
  • SciPy ("Scientific Python"): a toolbox for scientific computing in Python;
  • uBLAS: a C++ template class library that provides BLAS level 1, 2, 3 functionality for dense, packed and sparse matrices.

Most Wikipedia pages of the above libraries are valuable resources, as is Wikipedia's list of numerical analysis software. Other libraries and standalone programs are available through the Netlib repository and indexed by the Guide to Available Mathematical Software. The U.S. DOE's ACTS Collection is another valuable repository.

Madagascar in conferences and publications

The content of the Conferences and Publications pages. A mention about the reproducible documents that are listed in the "Papers and books included in the Madagascar distribution" sections.

Madagascar project system administrator's guide

Mediawiki installation, customization and operation