Spectral recomposition

August 9, 2013 Documentation No comments

A new paper is added to the collection of reproducible documents:
Automated spectral recomposition with application in stratigraphic interpretation

Analyzing seismic attributes in the frequency domain is helpful for reservoir characterization. To analyze the reservoir interval of interest in detail, it is important to capture the seismic response at each frequency subset. Spectral recomposition can be used to extract significant components from the seismic spectrum. We propose a separable nonlinear least-squares algorithm for spectral recomposition, which estimates both linear and nonlinear parts automatically in separate steps. Our approach is applied to estimate fundamental signal parameters, peak frequencies and amplitudes, with which the seismic spectrum can be reconstructed. Automated spectral recomposition helps us visualize frequency-dependent geological features on both cross sections and time slices by extracting significant frequency components. Spectral recomposition can also indicate how frequency contents attenuate with time.
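The separable (variable-projection) idea in the abstract can be illustrated with a toy sketch: hold the nonlinear parameters (the peak frequencies) fixed, solve for the amplitudes by ordinary least squares, and let an outer search adjust only the frequencies. The Ricker-shaped spectral basis, the function names, and the optimizer below are illustrative assumptions, not the paper's implementation:

```python
import numpy as np
from scipy.optimize import minimize

def ricker_spectrum(f, fpeak):
    # Amplitude spectrum of a Ricker wavelet, normalized to 1 at f = fpeak
    r = (f / fpeak) ** 2
    return r * np.exp(1.0 - r)

def fit_amplitudes(f, spectrum, fpeaks):
    # Linear step: with the peak frequencies fixed, the amplitudes
    # follow from ordinary least squares
    basis = np.column_stack([ricker_spectrum(f, fp) for fp in fpeaks])
    amps, *_ = np.linalg.lstsq(basis, spectrum, rcond=None)
    return amps, basis

def recompose(f, spectrum, fpeaks0):
    # Nonlinear step: search over the peak frequencies only; the
    # amplitudes are projected out at every evaluation
    def residual(fpeaks):
        amps, basis = fit_amplitudes(f, spectrum, fpeaks)
        return np.sum((basis @ amps - spectrum) ** 2)
    result = minimize(residual, fpeaks0, method="Nelder-Mead")
    fpeaks = result.x
    amps, _ = fit_amplitudes(f, spectrum, fpeaks)
    return fpeaks, amps

# Synthetic spectrum made of two Ricker components
f = np.linspace(1.0, 100.0, 200)
spec = 2.0 * ricker_spectrum(f, 20.0) + 1.0 * ricker_spectrum(f, 45.0)
fpeaks, amps = recompose(f, spec, fpeaks0=[15.0, 50.0])
```

The separation is what makes the fit practical: the optimizer only sees the few nonlinear unknowns, while the linear unknowns are always at their optimum.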

Omnidirectional PWD

August 9, 2013 Documentation No comments

A new paper is added to the collection of reproducible documents:
Omnidirectional plane-wave destruction

Steep structures in seismic data may cause directional aliasing, so the plane-wave destruction (PWD) filter cannot obtain an accurate dip estimation. We propose to interpret the plane-wave construction (PWC) filter as a line-interpolating operator and introduce a novel circle-interpolating model. The circle-interpolating PWC can avoid phase-wrapping problems, and the related circle-interpolating plane-wave destruction can avoid aliasing problems. We design a 2D maxflat fractional-delay filter to implement the circle interpolation and prove that the 2D maxflat filter is separable in each direction. Using the maxflat fractional-delay filter in the circle interpolation, we propose omnidirectional plane-wave destruction (OPWD), which can handle both vertical and horizontal structures. With a synthetic example, we show how to obtain an omnidirectional dip estimation using the proposed method. An application of the omnidirectional PWD to a field dataset improves the results of predictive painting and event picking, as compared to conventional PWD.
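The advantage of an angle parameterization over the conventional slope can be illustrated with a toy angle scan. This is emphatically not the maxflat-filter construction of the paper, just an illustration of why covering the circle helps: the slope p = tan(theta) blows up for vertical structures, while scanning the angle theta itself has no singularity. All names below are hypothetical:

```python
import numpy as np

def angle_scan_dip(u, nangles=181):
    # Toy "omnidirectional" dip estimate on a 2D panel u(t, x):
    # scan a structural angle theta over a half circle and keep the
    # one that best destroys the local plane wave, i.e. minimizes
    # |cos(theta) du/dt + sin(theta) du/dx| over the panel.
    ut, ux = np.gradient(u)
    angles = np.linspace(-np.pi / 2, np.pi / 2, nangles)
    costs = [np.sum((np.cos(a) * ut + np.sin(a) * ux) ** 2)
             for a in angles]
    # The minimizing angle points along the event (perpendicular to
    # the local gradient of u)
    return angles[np.argmin(costs)]
```

For a plane event whose gradient points at angle a0, the residual is proportional to cos(theta - a0), so the scan picks theta = a0 ± π/2, the direction along the event, with no special handling for vertical or horizontal cases.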

Reproducible research and IPython notebooks

August 6, 2013 Systems No comments

Reproducible science was one of the main general themes at the recent SciPy conference in Austin. While different tools for accomplishing reproducible research are being proposed, IPython notebooks are often mentioned as one of the main tools. In his review of the meeting, Eric Jones of Enthought writes:

We can safely say that 2013 is the year of the IPython notebook. It was everywhere. I’d guess 80+% of the talks and tutorials for the conference used it in their presentations.

As illustrated by the screenshots below, it is possible to combine SCons-based processing workflows with IPython notebooks.

The notebook used in this example is in rsf/rsf/test/test.ipynb

How far is Madagascar from being a production system?

August 4, 2013 FAQ 6 comments

Madagascar was set up by researchers for researchers, and this gives it some unique qualities. User questions on the mailing lists and in-person conversations with people involved in the project sometimes probe to what extent madagascar can be used for seismic data processing and imaging in a production environment. This post does not attempt to highlight the qualities that Madagascar already has (the wiki and blog are dedicated to that), nor is it a feature request list for the main developer(s). It is an attempt to clarify the current usability limits of the package when it comes to production. It is also a maximal list: I am not aware of a single production system that has all the features below. So, what is the current delta between Madagascar and an optimal production system? In no particular order:
– Dataset exploration tools. Click on this trace, on that one, plot headers over data in an interactive workflow… be able to understand the data quickly. Be able to try many encodings quickly for headers and for data, like an automated lock picker. Ancient packages that have accrued decades of lessons from encountering messed-up SEG-Y have a strong advantage here. Such tools may also need to handle SEG-D and other arcane formats.
– Geometry. Original data is in a global x-y coordinate system (usually UTM, but there are others too). Most work is done on the inline-crossline grid. Migration occurs on a physical units grid parallel to the inline-crossline grid. Various steps may employ different such local coordinate systems. Each of these can be left-handed, with negative increments, etc. The project history needs to know at every moment what coordinate system the data is in (with a pointer to its unambiguous definition) and to transform between all of them easily (including the rotations).
– Autopicking, manual picking and interactive pick editing tools.
– Dealing with borehole logs, horizons, and other such “foreign objects”
– Centralized job submission and logging: the system remembers the script for each job (including which internal or external client it was run for, for billing purposes). In madagascar the history file helps somewhat, but sometimes it is difficult to reconstitute the flow because a job may involve more than one file, and not all parameters are written to the history file. Having a centralized job database, and writing to the header which job created each file, is much more effective.
– Separation of I/O and algorithms. The algorithms and the I/O are mixed in m8r programs; instead, there should be procedures (subroutines/functions) that are independent of the I/O form used, with drivers that do the I/O. The current mixing prevents reusing the same code between programs, and also prevents the reuse of subroutines in flow-based processing (allocating once, then calling procedures that act on the same ensemble of data). This applies even to the humble Mclip.c on the wiki. The “business logic” should be encapsulated into one or more libraries per user, and these should be installed in $RSFROOT/lib, so they can be called from other user directories without code duplication or symlinks.
– Parameter registration, so each program can be interrogated: “what types of files do you need for input, how many dimensions, etc”, “what are reasonable parameter ranges”, etc, and incompatibilities can be automatically spotted at the start-up check, instead of a week into the workflow.
– Large-scale distributed: (1) data re-sorting based on headers, and (2) multidimensional transpose. Sorting and transposing data to create the input for some types of migration may currently take much longer than the migration itself.
– Some common processing tools (f-x-y decon, etc)
– Widespread OpenMP parallelization of programs, with procedures in place that ensure thread safety, and consistent conventions about the level at which shared-memory parallelization happens.
– A generic parallelization framework with production features (ability to launch in small batches, re-send failed jobs to other nodes, smart distributed collect with checkpointing, adding and subtracting nodes during the job, preparing HTML or PNG reports as to the state of the job upon a query, etc). A full list of features for this item can be quite long.
– Package-wide conventions, i.e. let’s use only full offsets in all CLI parameters for all programs, no half offsets.
– Hooks for interfacing with a queuing system (OpenPBS/etc). Users should not have the permissions to ssh/rsh directly into the nodes of a properly set up cluster.
– A GUI for setting up flows, so that CLI-challenged people can still be productive and newbies can discover system features easily. Of course, the system should still be fully scriptable. Like any software, this GUI needs to be used by a number of people large enough that the bugs get ironed out; otherwise it will simply be bypassed because of bugginess.
– Visualization with zoom and interactive annotations in 2-D, interactive visualization of 3-D data volumes (regularly and irregularly sampled). Or, even better, the ability to output in a format that can use open source third party visualization, using geophysical or medical imaging visualization packages.
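As a small illustration of the geometry bookkeeping described above, here is a hedged sketch of mapping between an inline-crossline grid and world (e.g. UTM) coordinates. The rotation convention, names, and parameters are assumptions for illustration, not an existing madagascar interface:

```python
import numpy as np

def grid_to_world(il, xl, origin, d_il, d_xl, azimuth):
    # Map (inline, crossline) indices to world x-y coordinates:
    # scale by the grid increments, rotate by the grid azimuth
    # (degrees, counterclockwise from the x axis -- a hypothetical
    # convention), then shift by the grid origin.
    theta = np.radians(azimuth)
    x = origin[0] + il * d_il * np.cos(theta) - xl * d_xl * np.sin(theta)
    y = origin[1] + il * d_il * np.sin(theta) + xl * d_xl * np.cos(theta)
    return x, y

def world_to_grid(x, y, origin, d_il, d_xl, azimuth):
    # Inverse mapping: shift back, rotate back, divide by increments
    theta = np.radians(azimuth)
    dx, dy = x - origin[0], y - origin[1]
    il = (dx * np.cos(theta) + dy * np.sin(theta)) / d_il
    xl = (-dx * np.sin(theta) + dy * np.cos(theta)) / d_xl
    return il, xl
```

The point of the list item is that a production system tracks which of these frames every dataset currently lives in, so round trips like the one above never need to be reconstructed by hand.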
I am sure there are many other features that can be added to this list. This is just my brain dump at this moment.
Again, the above enumeration is neither a criticism of madagascar nor a request for work in these directions (the quantity of work needed for some of them may be enormous). These are not flaws: m8r is an excellent research, learning and technology platform.
The reason established companies have not contributed such features to madagascar (or other open-source projects with a living community) is, of course, that they already had somewhat satisfactory software solutions of their own; otherwise they could not have stayed in business. The only hope, then, was startups. Startups, however, even when they do not explicitly sell their code, think of being able to sell the company to someone who might want to, and may view non-GPL software as a more valuable asset. No wonder, then, that there have been so few contributions in the above directions. The correct solution to this quandary is for companies to recognize that, like the operating system, none of this infrastructure provides a competitive advantage (algorithm kernels do), and that sharing costs with other companies is therefore the logical thing to do. The disadvantage of lowering the entry barrier for newcomers is balanced by the improved quality of the platform through collaboration and a larger user base, by new hires arriving already proficient with it (both as users and as developers), and by the savings that can be redirected to the core business.

Results may vary

August 3, 2013 Links No comments

Slide about the emerging reproducible research ecosystem from Carole Goble’s keynote presentation at a computational biology conference last month. The presentation summarizes the issues involved in reproducible research and the recent progress.

ISMB/ECCB 2013 keynote by Carole Goble: “Results may vary: what is reproducible? why do open science and who gets the credit?”

See also a presentation sketch by Jenny Chen.

Program of the month: sfai2refl

August 2, 2013 Programs No comments

sfai2refl converts acoustic impedance to normal-incidence PP reflectivity using the simple equation
$$R(t) = \frac{I(t+\Delta t) - I(t)}{I(t+\Delta t) + I(t)}$$
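A hedged Python sketch of this conversion (an illustration of the equation, not the source of sfai2refl):

```python
import numpy as np

def ai2refl(impedance):
    # Normal-incidence PP reflectivity from an acoustic impedance trace:
    # R(t) = (I(t+dt) - I(t)) / (I(t+dt) + I(t))
    i0, i1 = impedance[:-1], impedance[1:]
    return (i1 - i0) / (i1 + i0)

# Two constant layers: reflectivity is nonzero only at the contrast
imp = np.array([2.0, 2.0, 3.0, 3.0])
print(ai2refl(imp))
```

Note that the output is one sample shorter than the input, since each reflection coefficient sits between two impedance samples.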

The program is useful for convolution modeling. The following example from rsf/rsf/wedge reproduces classic wedge modeling, which is used for analyzing the tuning phenomenon (Kallweit, R., and L. Wood, 1982, The limits of resolution of zero-phase wavelets: Geophysics, 47, 1035-1046).

A more sophisticated example from geo384w/hw5/sigsbee2 computes an effective reflectivity of the Sigsbee model:

The program has no parameters.

10 previous programs of the month

First Working Workshop

July 28, 2013 Celebration 2 comments

An unusual experiment in collaborative reproducible research took place in Austin, Texas, on July 25-27: 25 participants from 9 different organizations gathered at the Bureau of Economic Geology, The University of Texas at Austin for the First Madagascar Working Workshop. The participants divided into 10 teams of 2-3 people by pairing experienced Madagascar developers with novice users. Each team worked on a small project, updating either reproducible papers or entries in the migration gallery. At the conclusion of the meeting, the participants discussed their experience and plans for future workshops.

Butterfly algorithm for generalized Radon

July 26, 2013 Documentation No comments

A new paper is added to the collection of reproducible documents:
A fast butterfly algorithm for generalized Radon transforms

madagascar-1.5 released

July 24, 2013 Celebration No comments

The 1.5 stable release features nine new reproducible papers and multiple other enhancements including the addition of the RVL (Rice Vector Library) and IWAVE++ full waveform inversion packages from Bill Symes and The Rice Inversion Project.

According to the SourceForge statistics, the previous 1.4 stable distribution has been downloaded more than 3,000 times, and the total number of Madagascar downloads is approaching 25,000. According to Ohloh.net, the year before the 1.5 release was a period of record development activity, with 32 contributors (up 33% compared to the previous year) making 1,570 commits to the repository (up 35%). Ohloh.net says that Madagascar “has a well established, mature codebase maintained by a very large development team with increasing year-over-year commits” and estimates the total effort at 167 man-years.

Program of the month: sftime2depth

July 1, 2013 Programs 2 comments

sftime2depth converts the input from vertical time to depth coordinates.

The following example from rsf/su/rsflab9 shows a seismic image converted from time to depth by this transformation:

The example is borrowed from John Stockwell’s lecture notes on Geophysical Image Processing and translated from Seismic Unix to Madagascar. The transformation follows the simple equation
$$t = 2\,\int\limits_{0}^{z} \frac{d\zeta}{v(\zeta)},$$
where $t$ is two-way vertical time, $z$ is depth, and $v(z)$ is vertical velocity.

The sampling of the output depth axis is controlled by nz=, dz=, and z0=. If the velocity (provided in the auxiliary velocity= file) is sampled in time rather than depth, use intime=y. If it is slowness rather than velocity, use slow=y. If the input is in one-way time rather than two-way time, use twoway=n. The interpolation is carried out using B-splines. The spline order is controlled by extend=. By default, cubic splines (extend=4) are used.
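The transformation can be sketched in a few lines, assuming for simplicity a depth axis starting at zero, a velocity sampled in depth, two-way time input, and linear interpolation in place of the B-splines that sftime2depth uses:

```python
import numpy as np

def time2depth(trace, t0, dt, velocity, dz):
    # trace: input sampled in two-way time (origin t0, step dt)
    # velocity[i]: vertical velocity at depth z = i * dz
    slowness = 1.0 / velocity
    # Two-way time to each depth, t(z) = 2 * integral_0^z dzeta / v,
    # accumulated with the trapezoid rule
    segs = 0.5 * (slowness[:-1] + slowness[1:]) * dz
    t_of_z = 2.0 * np.concatenate(([0.0], np.cumsum(segs)))
    # Resample the time trace onto the depth grid
    t_axis = t0 + dt * np.arange(len(trace))
    return np.interp(t_of_z, t_axis, trace)
```

For a constant velocity the mapping reduces to a simple stretch, t = 2z/v; the cumulative integral is what handles the general v(z) case.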

The inverse transformation is given by sfdepth2time.

In the presence of lateral velocity variations, the correct transformation from time to depth is not as simple and needs additional corrections. See Time-to-depth conversion and seismic velocity estimation using time-migration velocity.

10 previous programs of the month