FAQ

How do I do interactive picking?

June 12, 2014 FAQ No comments

Interactive picking is generally discouraged because it is not reproducible, but occasionally it can be useful. Using the interact= option with xtpen writes mouse-click coordinates to a text file. However, these are Vplot coordinates, which are not easily related to the physical coordinates of the image. Joe Dellinger has a more comprehensive plan for adding interactivity to Vplot graphics.

sfipick is a simple Tkinter script that allows interactive picking. The interface is straightforward: left-click to add a pick, right-click to remove a wrong pick, and use the middle button to drag a pick. The picks are written to a plain text file and can be processed later.


How far is Madagascar from being a production system?

August 4, 2013 FAQ 4 comments

Madagascar was set up by researchers for researchers, and this gives it some unique qualities. User questions on mailing lists and in person sometimes probe to what extent Madagascar can be used for seismic data processing and imaging in a production environment. This post does not attempt to highlight the qualities that Madagascar already has; the wiki and blog are dedicated to that. It is not a feature request list from the main developer(s) either. It is an attempt to clarify the current usability limits of the package when it comes to production. It is also a maximal list: I am not aware of a single production system that has all the features below. So, what is the current delta between Madagascar and an optimal production system? In no particular order:
– Dataset exploration tools. Click on this trace, on that one, plot headers over data in an interactive workflow… be able to understand the data quickly. Be able to try many encodings quickly for headers and for data, like an automated lock picker. Older packages that have accrued decades of lessons from encountering messed-up SEG-Y files have a strong advantage here. The system may also need to handle SEG-D and other arcane formats.
– Geometry. Original data is in a global x-y coordinate system (usually UTM, but there are others too). Most work is done on the inline-crossline grid. Migration occurs on a physical units grid parallel to the inline-crossline grid. Various steps may employ different such local coordinate systems. Each of these can be left-handed, with negative increments, etc. The project history needs to know at every moment what coordinate system the data is in (with a pointer to its unambiguous definition) and to transform between all of them easily (including the rotations).
– Autopicking, manual picking and interactive pick editing tools.
– Dealing with borehole logs, horizons, and other such “foreign objects”
– Centralized job submission and logging: the system remembers the script for each job (including which internal or external client it was run for, for billing purposes). In Madagascar the history file helps somewhat, but sometimes it is difficult to reconstitute the flow because a job may involve more than one file, and not all parameters are written to the history file. Having a centralized job database, and writing to the header which job created the file, is much more effective.
– Separation of I/O and algorithms. The algorithms and the I/O are mixed in m8r programs. This prevents reusing the same code between programs, and also prevents the reuse of subroutines in flow-based processing (allocating once, then calling procedures that act on the same ensemble of data). Instead there should be procedures (subroutines/functions) that are independent of the form of I/O used, with drivers that do the I/O. This applies even to the humble Mclip.c on the wiki. The “business logic” should be encapsulated into one or more libraries per user, and these should be installed in $RSFROOT/lib, so they can be called from other user directories without code duplication or symlinks.
– Parameter registration, so each program can be interrogated: “what types of files do you need for input, how many dimensions, etc”, “what are reasonable parameter ranges”, etc, and incompatibilities can be automatically spotted at the start-up check, instead of a week into the workflow.
– Large-scale distributed: (1) data re-sorting based on headers, and (2) multidimensional transpose. Sorting and transposing data to create the input for some types of migration may currently take much longer than the migration itself.
– Some common processing tools (f-x-y decon, etc)
– Widespread OpenMP parallelization of programs. Having procedures in place that ensure thread safety, and consistent conventions about the level at which shared-memory parallelization happens.
– A generic parallelization framework with production features (ability to launch in small batches, re-send failed jobs to other nodes, smart distributed collect with checkpointing, adding and subtracting nodes during the job, preparing HTML or PNG reports as to the state of the job upon a query, etc). A full list of features for this item can be quite long.
– Package-wide conventions, e.g. let’s use only full offsets in all CLI parameters for all programs, no half offsets.
– Hooks for interfacing with a queuing system (OpenPBS/etc). Users should not have the permissions to ssh/rsh directly into the nodes of a properly set up cluster.
– A GUI for setting up flows so CLI-challenged people can still be productive, and newbies can discover system features easily. Of course, the system should still be fully scriptable. Like any software, this GUI needs to be used by enough people for bugs to be ironed out; otherwise it will just be bypassed because of bugginess.
– Visualization with zoom and interactive annotations in 2-D, interactive visualization of 3-D data volumes (regularly and irregularly sampled). Or, even better, the ability to output in a format that can use open source third party visualization, using geophysical or medical imaging visualization packages.
I am sure there are many other features that can be added to this list. This is just my brain dump at this moment.
Again, the above enumeration is neither a criticism of Madagascar nor a request for work in these directions (as the quantity of work needed for some of them may be enormous). These are not flaws: m8r is an excellent research, learning and technology platform.
The reason why established companies have not contributed such features to Madagascar (or other open-source projects with a living community) is of course that they already had somewhat satisfactory software solutions for them; otherwise they could not have been in business. The only hope, then, was startups. Startups, however, even when they do not explicitly sell their code, think about being able to sell the company to someone who might want to do that, and might view non-GPL software as a more valuable asset. So it is no wonder that there have been so few contributions in the above directions. The correct solution to this quandary is for companies to recognize that, like the operating system, none of this infrastructure provides a competitive advantage (algorithm kernels do), and thus sharing costs with other companies is the logical thing to do. The disadvantage of lowering the entry barrier for newcomers is balanced by the improved quality of the platform through collaboration and a larger user base, by having new hires already proficient with it (both as users and as developers), and by saving money that can be directed to the core business.
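
To make the “separation of I/O and algorithms” item above concrete, here is a minimal Python sketch of the idea, not code from Madagascar. The names clip and clip_stream are hypothetical, and the driver assumes the data are a raw stream of native 4-byte floats. The algorithm knows nothing about files; the driver only moves bytes.

```python
import struct


def clip(samples, limit):
    """Pure algorithm: clamp every sample to [-limit, +limit]. No I/O here."""
    return [max(-limit, min(limit, s)) for s in samples]


def clip_stream(source, sink, limit, chunk=1024):
    """Driver: stream native 4-byte floats from source to sink, chunk by chunk.

    source and sink are binary file-like objects (an assumption of this sketch).
    """
    while True:
        raw = source.read(4 * chunk)
        if not raw:
            break
        count = len(raw) // 4  # ignore a trailing partial sample, if any
        samples = struct.unpack("=%df" % count, raw[:4 * count])
        sink.write(struct.pack("=%df" % count, *clip(samples, limit)))
```

A command-line wrapper could call clip_stream(sys.stdin.buffer, sys.stdout.buffer, limit), while a flow-based system could call clip directly on data it already holds in memory, without any copy of the algorithm.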

How can I read and write RSF files in MATLAB?

April 13, 2013 FAQ No comments

  • The most straightforward way is to install the MATLAB interface to Madagascar. When installing Madagascar, run
    ./configure API=matlab

    The configure script will try to find and test the matlab and mex executables on your system. If they are not in your PATH, you can specify them with

    ./configure API=matlab MATLAB=/path/to/matlab MEX=/path/to/mex

    Install Madagascar as usual, set the MATLAB path to $RSFROOT/lib, and you will be able to read and write RSF files from MATLAB using rsf_read, rsf_write, and other functions from the Madagascar interface.

  • Alternatively, you can try reading binary data using MATLAB functions, as in the following example

    % get in=, n1=, and n2= parameters from file.rsf
    [stat,in] = unix('sfget in parform=n < file.rsf');
    in = strtrim(in);
    [stat,n1] = unix('sfget n1 parform=n < file.rsf');
    n1 = str2num(n1);
    [stat,n2] = unix('sfget n2 parform=n < file.rsf');
    n2 = str2num(n2);
    % read binary data (single-precision floats)
    fid = fopen(in,'rb');
    data = fread(fid,n1*n2,'float32');
    fclose(fid);
    % reshape to 2-D matrix
    data = reshape(data,n1,n2);
  • An even better alternative is to abandon MATLAB in favor of free software, such as GNU Octave, Python with NumPy, Sage, etc. A Python interface to Madagascar is installed by default.
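
The same manual approach as in the MATLAB example can be sketched in plain Python. This is an illustration under stated assumptions, not the Madagascar Python interface: read_rsf_header and read_rsf_2d are hypothetical helper names, the header is assumed to be a plain text file whose in= parameter points to a separate binary file, and the data are assumed to be native 4-byte floats.

```python
import re
import struct


def read_rsf_header(path):
    """Collect key=value parameters from an RSF text header as strings."""
    params = {}
    with open(path, "r") as header:
        for key, value in re.findall(r'(\w+)=("[^"]*"|\S+)', header.read()):
            params[key] = value.strip('"')  # later entries override earlier ones
    return params


def read_rsf_2d(path):
    """Read a 2-D float dataset; returns n2 traces of n1 samples each."""
    params = read_rsf_header(path)
    n1, n2 = int(params["n1"]), int(params["n2"])
    with open(params["in"], "rb") as binary:
        values = struct.unpack("=%df" % (n1 * n2), binary.read(4 * n1 * n2))
    # n1 is the fast axis, so consecutive runs of n1 samples form one trace
    return [list(values[i * n1:(i + 1) * n1]) for i in range(n2)]
```

With NumPy installed, the unpacked values could instead be placed in an array and reshaped to (n2, n1).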

If I write a research paper using Madagascar, do I need to add a reference?

January 19, 2013 FAQ No comments

Please take into account the following considerations:

  1. According to the GPL license, there is no legal requirement to reference Madagascar when you use it.
  2. Please consider contributing your paper to Madagascar. Multiple studies have shown that reproducible papers receive more attention from readers and have a higher impact. Releasing your paper can also help you maintain your own research results. In the words of Jon Claerbout,

    “It takes some effort to organize your research to be reproducible. We found that although the effort seems to be directed to helping other people stand up on your shoulders, the principal beneficiary is generally the author herself. This is because time turns each one of us into another person, and by making effort to communicate with strangers, we help ourselves to communicate with our future selves.”

  3. If codes from Madagascar played an important role in the research that led to your paper, referencing it is a scientifically ethical thing to do. As stated by the Ethical Guidelines of the American Mathematical Society,
    “The correct attribution of mathematical results is essential, both because it encourages creativity, by benefiting the creator whose career may depend on the recognition of the work and because it informs the community of when, where, and sometimes how original ideas entered into the chain of mathematical thought.”

Of course, such statements can be applied not only to mathematical ideas but also to computational results.

How to cite Madagascar? Mike Jackson from the Software Sustainability Institute makes a set of clear recommendations for citing software. A citation may look like “Madagascar software. Version 1.4. October 2012. http://www.ahay.org/” or be more specific, with reference to a particular program used and its author. In BibTeX, you can use something like

@Manual{Madagascar,  
author = {{Madagascar Development Team}},  
title = {Madagascar Software, Version~1.4},  
year = {2012},  
address = {http://www.ahay.org/}  
}

How do I convert a reproducible paper from LaTeX to HTML?

October 21, 2012 FAQ No comments

Follow these steps:

  1. Install SEGTeX.
  2. Install LaTeX2HTML.
  3. Set the LATEX2HTML environment variable to $TEXMF/latex2html, where $TEXMF is the location of your SEGTeX installation.
  4. In the reproducible paper directory, run
    sftour scons lock

    to install reproducible figures. Then run

    scons html

    or

    scons papername.html

    (if the file name is papername.tex rather than paper.tex) to convert the paper to HTML.

  5. In the reproducible paper directory, run
    scons install

    or

    scons papername.install

    (if the file name is papername.tex rather than paper.tex) to install the HTML paper under $RSFROOT/share/madagascar/book

  6. On the book level, you can convert a full report or book to HTML by running
    scons www

See more instructions on assembling reports from papers.

How is regression testing done in Madagascar?

September 22, 2012 FAQ No comments

Testing for reproducibility is an important principle behind Madagascar’s design. It works on several levels.

  1. Inside a project directory (with an SConstruct file that contains from rsf.proj import *), run
    scons lock

    to create Result files (figures) and to copy them to a different location (specified by RSFFIGS or $RSFROOT/share/madagascar/figs by default). Papers included with Madagascar (under $RSFSRC/book) have their result figures saved in a repository.
    To come back and test if the results are still reproducible, run

    scons test

    or

    scons figurename.test

    This command performs an intelligent comparison of figures using Joe Dellinger’s sfvplotdiff and reports an error if the figures are different. In the case of an error, you can run

    scons figurename.flip

    to flip between the new version of the figure and the old version on the screen and to compare them visually. Based on that comparison, you can either “lock” the new version with

    scons figurename.lock

    or debug the error that caused the difference and try to fix it.

  2. To test all projects where a particular program, say sfspike, is used, run
    cd $RSFSRC/book; scons sfspike.test

    This is useful for regression testing for changes in programs that may cause reproducibility failures. You can also run

    cd $RSFSRC/book; scons test

    to test all projects and all Madagascar programs. By default, testing is limited to projects that use only publicly available data and less than 1 MB of disk space. This behavior can be changed by giving all=y or size= parameters to scons test.

  3. A collection of scripts developed by Jim Jennings and explained on the Automatic Testing page performs fine-grain testing with extended diagnostics.
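
For orientation, here is a minimal sketch of a project SConstruct of the kind the commands above act on. It needs a Madagascar installation and is run through scons, not directly with Python. The sfspike parameters and the figure name spike are illustrative choices, not taken from any particular paper.

```python
from rsf.proj import *

# Make a small synthetic dataset: one spike in a 1000-sample trace
Flow('spike', None, 'spike n1=1000 k1=300')

# Declare a Result figure; this is what "scons lock" stores
# and what "scons test" later compares against
Result('spike', 'graph title="Spike"')

End()
```

With this file in place, scons lock would store the spike figure, and scons spike.test would later check whether it is still reproduced.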

Why does Madagascar have such poor documentation?

June 17, 2012 FAQ No comments

Most likely, the person asking this question does not know where to look for documentation. With 6 schools, 300 blog posts, 80 wiki pages, a 500-page manual, and a 50-page tutorial, there is plenty of information for somebody willing to learn about Madagascar or to understand how it works.

Reproducible papers and books link scientific descriptions of different algorithms with examples of their usage and thus represent the best kind of documentation scientific software may have. At the last count, there were more than 500 computational recipes (SConstruct files) and more than 5,000 reproducible figures. Reproducibility means that we provide reproducible examples of using a particular program, but we do not guarantee that the program will work properly with a different choice of parameters or with different input data. With time, the ecosystem evolves. Some programs get used more often and, as a result, become better debugged and more thoroughly documented.

Madagascar has a low barrier for authors to start contributing their work. When some of the newly contributed programs and examples are not sufficiently well documented, documentation becomes a process of communication between authors and users. Engaging users as active participants in this process will help us make it more efficient.

Which country has the most Madagascar users?

September 11, 2011 FAQ No comments

During the last four years, there have been nearly 170,000 visits to the Madagascar website (including 126 visits from the island of Madagascar). The top ten countries, as counted by Google Analytics, are: USA, China, Canada, UK, Brazil, Germany, Italy, Saudi Arabia, France, and India.

If the top ten are normalized by population (to compute visits per capita), they become: Canada, Saudi Arabia, USA, UK, Italy, France, Germany, Brazil, China, and India.

What are the design principles of Madagascar?

June 19, 2011 FAQ No comments

The Madagascar code is designed around several fundamental principles.

  1. Modularity. This principle comes from Unix. Doug McIlroy, the inventor of Unix pipes, formulates it as

    This is the Unix philosophy: Write programs that do one thing and do it well. Write programs to work together. Write programs to handle text streams, because that is a universal interface.

    Madagascar makes one exception: RSF files are text streams but they point to binary data. This is the simplest way to handle large datasets while preserving the Unix approach.

  2. KISS (Keep it simple, Stupid!). This principle is closely related to modularity. We try to make our tools and formats as simple as possible to achieve the given functionality.
  3. Test-driven development. This principle does not apply literally to scientific programming, because scientific computing is often exploratory: the result of the computational experiment is not always known beforehand. However, once the experiment is completed, it immediately becomes a test for future development, because we expect the results of the experiment to be reproducible. A Madagascar module is not included in the official distribution until there is an example of its usage in reproducible documents.
  4. YAGNI (You ain’t gonna need it!). This principle comes from XP (Extreme Programming). Ron Jeffries, one of the founders of XP, states it as

    Always implement things when you actually need them, never when you just foresee that you need them.

    Madagascar is not developed for imaginary users. It is developed by people who use it and who add functionality as they need it. This is also known as “scratching a developer’s personal itch”, a feeling familiar to the creators of Unix. As Dennis Ritchie admits in a recent interview,

    Apart from doing new and cool stuff, what guided us was really kind of selfish—to write tools we could use ourselves to make our lives easier: “I’d like such-and-such to do such-and-such, and that’s hard to do now. What kind of tool can I write to make that easier?”


How do I prepare a Geophysics article for submission?

March 20, 2011 FAQ No comments

The Geophysics instructions to authors state

Preferred formats for production are Microsoft Word and LaTeX, in that order.

Never mind the order. If you use LaTeX, you are not alone. According to SEG staff, half of submitted papers use LaTeX, including papers from many of the SEG editors. The SEGTeX package has been downloaded from SourceForge more than 5,000 times. Here are some useful tips for producing a Geophysics paper with SCons and rsf.tex:

1. If you don’t use Madagascar for your computations but would like to use the SCons setup for papers, you can download and install the madagascar-framework Python package.

2. To prepare a paper called article.tex for submission, put

from rsf.tex import *
Paper('article',options='manuscript')

in your SConstruct, then run scons article.pdf to produce the manuscript or scons article.read to display it on the screen. If your paper is named paper.tex, you can also put the options in

End(options='manuscript')

and use simply scons pdf and scons read. See the wiki documentation for more options and explanations.

3. Submit your paper by logging into ManuscriptCentral.

4. After your paper goes through a round of revisions, you will be asked to prepare and submit both the new version and the revised version with the revisions clearly marked. Use the \old{} and \new{} macros to mark your changes, as shown in the example. You can produce both PDF files from one source using something like

from rsf.tex import *
Paper('article',options='manuscript')

Command('article-revised.tex','article.tex','cp $SOURCE $TARGET') 
Paper('article-revised', options='manuscript,revised')

5. When submitting the final version, you will be asked to submit the LaTeX file that includes bibliography. If you use BibTeX, do the following:

  1. Run scons article.pdf
  2. Open article.ltx in an editor and replace the line \bibliography{} with the contents of article.bbl.
  3. Submit article.ltx.

6. When submitting the final version, you will be asked to submit high-resolution figures in EPS format. Run scons article.figs to generate figures suitable for submission.

7. Geophysics may insist that the labels on the vertical axis in your Madagascar plots should run horizontally, rather than vertically. To comply with this bizarre requirement, you may need to regenerate your plots using parallel2=n option.
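
The manual edit in step 5 (replacing the \bibliography{} line in article.ltx with the contents of article.bbl) can also be scripted. The following is a minimal sketch, not part of SEGTeX or Madagascar; inline_bibliography is a hypothetical helper name, and it assumes the \bibliography command sits on a line of its own.

```python
import re


def inline_bibliography(ltx_text, bbl_text):
    """Replace the bibliography command line with the .bbl contents."""
    pattern = re.compile(r'^[ \t]*\\bibliography\{[^}]*\}[ \t]*$', re.MULTILINE)
    # A function replacement keeps the backslashes of the .bbl text literal
    new_text, count = pattern.subn(
        lambda match: bbl_text.rstrip('\n'), ltx_text, count=1)
    if count == 0:
        raise ValueError('no bibliography command line found')
    return new_text
```

To use it, read article.ltx and article.bbl into strings, call inline_bibliography on them, and write the result to the file you submit.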
