Guide to madagascar API

Revision as of 19:47, 11 October 2008 by Nick (talk | contribs) (→‎Introduction)
This page was created from the LaTeX source in book/rsf/rsf/api.tex using latex2wiki

This guide explains the RSF programming interface.

Introduction

To work with RSF files in your own programs, you may need to use an appropriate programming interface. We will demonstrate the interface in different languages using a simple example. The example is a clipping program. It reads and writes RSF files and accesses parameters both from the input file and the command line. The input is processed trace by trace. This is not necessarily the most efficient approach[1] but it suffices for a simple demonstration.
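The clipping operation itself is independent of the RSF interface. As a minimal sketch in plain Python (no RSF calls; the names clip_trace and clip_traces are made up for this illustration), processing the data trace by trace might look like this:

```python
def clip_trace(trace, clip):
    """Return a copy of trace with every sample limited to [-clip, clip]."""
    return [max(-clip, min(clip, sample)) for sample in trace]


def clip_traces(traces, clip):
    """Process the input one trace at a time, yielding each clipped trace."""
    for trace in traces:
        yield clip_trace(trace, clip)


if __name__ == "__main__":
    # Two traces of three samples each, clipped at 1.5
    data = [[-3.0, 0.5, 2.0], [1.5, -1.5, 4.0]]
    for clipped in clip_traces(data, 1.5):
        print(clipped)
```

Each of the language interfaces below wraps this same loop with calls that read the parameters and move the traces between RSF files.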

Installation

Only the C interface is installed by default. To install other APIs, use the API= parameter in the RSF configuration. For example, to install the C++ and Fortran-90 API bindings in addition to the basic package, run

./configure API=c++,fortran-90
scons install

The configuration parameters are stored in $RSFROOT/lib/rsfconfig.py.

C interface

The C clip function is listed below. <c> /* Clip the data. */

#include <rsf.h>

int main(int argc, char* argv[])
{
   int n1, n2, i1, i2;
   float clip, *trace;
   sf_file in, out; /* Input and output files */

   /* Initialize RSF */
   sf_init(argc,argv);
   /* standard input */
   in = sf_input("in");
   /* standard output */
   out = sf_output("out");

   /* check that the input is float */
   if (SF_FLOAT != sf_gettype(in))
      sf_error("Need float input");

   /* n1 is the fastest dimension (trace length) */
   if (!sf_histint(in,"n1",&n1))
      sf_error("No n1= in input");

   /* leftsize gets n2*n3*n4*... (the number of traces) */
   n2 = sf_leftsize(in,1);

   /* parameter from the command line (i.e. clip=1.5 ) */
   if (!sf_getfloat("clip",&clip)) sf_error("Need clip=");

   /* allocate floating point array */
   trace = sf_floatalloc (n1);

   /* loop over traces */
   for (i2=0; i2 < n2; i2++) {
      /* read a trace */
      sf_floatread(trace,n1,in);

      /* loop over samples */
      for (i1=0; i1 < n1; i1++) {
         if      (trace[i1] >  clip) trace[i1]= clip;
         else if (trace[i1] < -clip) trace[i1]=-clip;
      }

      /* write a trace */
      sf_floatwrite(trace,n1,out);
   }

   exit(0);
}
</c>

Let us examine it in detail. <c>

#include <rsf.h>

</c>

The include preprocessing directive is required to access the RSF interface. <c>

   sf_file in, out; /* Input and output files */

</c>

RSF data files are defined with an abstract sf_file data type. An abstract data type means that its contents are not publicly declared, and all operations on sf_file objects should be performed with library functions. This is analogous to the FILE * data type used in stdio.h and is as close as C gets to an object-oriented style of programming (Roberts, 1998[2]). <c>

   /* Initialize RSF */
   sf_init(argc,argv);

</c>

Before using any of the other functions, you must call sf_init. This function parses the command line and initializes an internally stored table of command-line parameters. <c>

   /* standard input */
   in = sf_input("in");
   /* standard output */
   out = sf_output("out");

</c>

The input and output RSF file objects are created with sf_input and sf_output constructor functions. Both these functions take a string argument. The string may refer to a file name or a file tag. For example, if the command line contains vel=velocity.rsf, then both sf_input("velocity.rsf") and sf_input("vel") are acceptable. Two tags are special: "in" refers to the file in the standard input and "out" refers to the file in the standard output. <c>

   /* check that the input is float */
   if (SF_FLOAT != sf_gettype(in))
      sf_error("Need float input");

</c>

RSF files can store data of different types (character, integer, floating point, complex). We extract the data type of the input file with the library sf_gettype function and check if it represents floating point numbers. If not, the program is aborted with an error message, using the sf_error function. It is generally a good idea to check the input for user errors and, if they cannot be corrected, to take a safe exit. <c>

   /* n1 is the fastest dimension (trace length) */
   if (!sf_histint(in,"n1",&n1))
      sf_error("No n1= in input");

   /* leftsize gets n2*n3*n4*... (the number of traces) */
   n2 = sf_leftsize(in,1);

</c>

Conceptually, the RSF data model is a multidimensional hypercube. By convention, the dimensions of the cube are stored in n1=, n2=, etc. parameters. The n1 parameter refers to the fastest axis. If the input dataset is a collection of traces, n1 refers to the trace length. We extract it using the sf_histint function (integer parameter from history) and abort if no value for n1 is found. We could proceed in a similar fashion, extracting n2, n3, etc. If we are interested in the total number of traces, like in the clip example, a shortcut is to use the sf_leftsize function. Calling sf_leftsize(in,0) returns the total number of elements in the hypercube (the product of n1, n2, etc.), calling sf_leftsize(in,1) returns the number of traces (the product of n2, n3, etc.), calling sf_leftsize(in,2) returns the product of n3, n4, etc. By calling sf_leftsize, we avoid the need to extract additional parameters for the hypercube dimensions that we are not interested in. <c>

   /* parameter from the command line (i.e. clip=1.5 ) */
   if (!sf_getfloat("clip",&clip)) sf_error("Need clip=");

</c>

The clip parameter is read from the command line, where it can be specified, for example, as clip=10. The parameter has the float type, therefore we read it with the sf_getfloat function. If no clip= parameter is found among the command line arguments, the program is aborted with an error message using the sf_error function. <c>

   /* allocate floating point array */
   trace = sf_floatalloc (n1);

</c>

Next, we allocate an array of floating-point numbers to store a trace, using the library sf_floatalloc function. Unlike the standard malloc, the RSF allocation function checks for errors and either terminates the program or returns a valid pointer. <c>

   /* loop over traces */
   for (i2=0; i2 < n2; i2++) {
      /* read a trace */
      sf_floatread(trace,n1,in);

      /* loop over samples */
      for (i1=0; i1 < n1; i1++) {
         if      (trace[i1] >  clip) trace[i1]= clip;
         else if (trace[i1] < -clip) trace[i1]=-clip;
      }

      /* write a trace */
      sf_floatwrite(trace,n1,out);
   }

</c>

The rest of the program is straightforward. We loop over all available traces, read each trace, clip it, and write the output out. The syntax of the sf_floatread and sf_floatwrite functions is similar to the syntax of the C standard fread and fwrite functions, except that the type of the element is specified explicitly in the function name and that the input and output files have the RSF type sf_file.
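To make the fread/fwrite analogy concrete, here is a rough sketch of the same read, clip, write loop over raw single-precision samples, using only the Python standard library. It does not call RSF; read_trace, write_trace and clip_stream are hypothetical helpers, and the data is assumed to be a headerless stream of native floats.

```python
import io
from array import array

FLOAT_SIZE = array("f").itemsize  # bytes per single-precision sample


def read_trace(stream, n1):
    """Read n1 native floats from a binary stream, in the spirit of fread."""
    raw = stream.read(FLOAT_SIZE * n1)
    if len(raw) != FLOAT_SIZE * n1:
        raise EOFError("short read while reading a trace")
    trace = array("f")
    trace.frombytes(raw)
    return trace


def write_trace(stream, trace):
    """Write the samples back as raw bytes, in the spirit of fwrite."""
    stream.write(trace.tobytes())


def clip_stream(src, dst, n1, n2, clip):
    """Loop over n2 traces of n1 samples, clipping each trace in place."""
    for _ in range(n2):
        trace = read_trace(src, n1)
        for i1 in range(n1):
            if trace[i1] > clip:
                trace[i1] = clip
            elif trace[i1] < -clip:
                trace[i1] = -clip
        write_trace(dst, trace)


if __name__ == "__main__":
    # In-memory stand-ins for the input and output files
    src = io.BytesIO(array("f", [-3.0, 0.5, 2.0, 1.0, -1.0, 9.0]).tobytes())
    dst = io.BytesIO()
    clip_stream(src, dst, n1=3, n2=2, clip=1.5)
    result = array("f")
    result.frombytes(dst.getvalue())
    print(list(result))
```

Only one trace is held in memory at a time, which is the point of processing the input trace by trace.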

Compiling

To compile the clip program, run

cc clip.c -I$RSFROOT/include -L$RSFROOT/lib -lrsf -lm

Change cc to the C compiler appropriate for your system and include additional compiler flags if necessary. The flags that RSF typically uses are in $RSFROOT/lib/rsfconfig.py.

C++ interface

The C++ clip function is listed below. <cpp> /* Clip the data. */

#include <valarray>
#include <rsf.hh>

int main(int argc, char* argv[])
{
   sf_init(argc,argv); // Initialize RSF

   iRSF par(0), in; // input parameter, file
   oRSF out;        // output file
   int n1, n2;      // trace length, number of traces
   float clip;

   in.get("n1",n1);
   n2=in.size(1);
   par.get("clip",clip); // parameter from the command line

   std::valarray<float> trace(n1);

   for (int i2=0; i2 < n2; i2++) { // loop over traces
      in >> trace; // read a trace

      for (int i1=0; i1 < n1; i1++) { // loop over samples
         if      (trace[i1] >  clip) trace[i1]=clip;
         else if (trace[i1] < -clip) trace[i1]=-clip;
      }

      out << trace; // write a trace
   }

   exit(0);
}
</cpp>

Let us examine it line by line. <cpp>

#include <rsf.hh>

</cpp>

Including "rsf.hh" is required for accessing the RSF C++ interface. <cpp>

   sf_init(argc,argv); // Initialize RSF

</cpp>

A call to sf_init is required to initialize the internally stored table of command-line arguments. <cpp>

   iRSF par(0), in; // input parameter, file
   oRSF out;        // output file

</cpp>

Two classes, iRSF and oRSF, are used to define input and output files. For simplicity, the command-line parameters are also handled as an iRSF object, initialized with zero. <cpp>

   in.get("n1",n1);
   n2=in.size(1);

</cpp>

Next, we read the data dimensions from the input RSF file object called in: the trace length is a parameter called "n1" and the number of traces is the size of in remaining after excluding the first dimension. It is extracted with the size method. <cpp>

   par.get("clip",clip); // parameter from the command line

</cpp>

The clip parameter should be specified on the command line, for example, as clip=10. It is extracted with the get method of iRSF class from the par object. <cpp>

   std::valarray<float> trace(n1);

</cpp>

The trace object has the single-precision floating-point type and is a 1-D array of length n1. It is declared and allocated using the valarray template class from the standard C++ library. <cpp>

   for (int i2=0; i2 < n2; i2++) { // loop over traces
      in >> trace; // read a trace

      for (int i1=0; i1 < n1; i1++) { // loop over samples
         if      (trace[i1] >  clip) trace[i1]=clip;
         else if (trace[i1] < -clip) trace[i1]=-clip;
      }

      out << trace; // write a trace
   }

</cpp>

Next, we loop through the traces, read each trace from in, clip it and write the output to out.
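The size(1) call used above, like sf_leftsize(in,1) in the C interface, is described as the product of the dimensions that remain after the first axis is excluded. A small sketch of that arithmetic in plain Python (the leftsize function here is an illustration, not the library routine):

```python
from functools import reduce
from operator import mul


def leftsize(dims, axis):
    """Product of the dimensions that remain after skipping `axis` axes.

    dims holds (n1, n2, n3, ...). leftsize(dims, 0) is the total number of
    samples and leftsize(dims, 1) is the number of traces.
    """
    return reduce(mul, dims[axis:], 1)


if __name__ == "__main__":
    dims = (100, 20, 5)  # n1=100, n2=20, n3=5
    for axis in range(3):
        print(axis, leftsize(dims, axis))
```

For a 100 x 20 x 5 cube this gives 10000 samples, 100 traces, and 5 for the product of the remaining axes, so the trace loop does not need to know how many dimensions the input has.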

Compiling

To compile the C++ program, run

c++ clip.cc -I$RSFROOT/include -L$RSFROOT/lib -lrsf++ -lrsf -lm

Change c++ to the C++ compiler appropriate for your system and include additional compiler flags if necessary. The flags that RSF typically uses are in $RSFROOT/lib/rsfconfig.py.

Fortran-77 interface

The Fortran-77 clip function is listed below. <fortran>
      program Clipit
      implicit none
      integer n1, n2, i1, i2, in, out
      integer sf_input, sf_output, sf_leftsize, sf_gettype
      logical sf_getfloat, sf_histint
      real clip, trace(1000)

      call sf_init()
      in = sf_input("in")
      out = sf_output("out")

      if (3 .ne. sf_gettype(in))
     &  call sf_error("Need float input")

      if (.not. sf_histint(in,"n1",n1)) then
         call sf_error("No n1= in input")
      else if (n1 > 1000) then
         call sf_error("n1 is too long")
      end if
      n2 = sf_leftsize(in,1)

      if (.not. sf_getfloat("clip",clip))
     &  call sf_error("Need clip=")

      do 10 i2=1, n2
         call sf_floatread(trace,n1,in)

         do 20 i1=1, n1
            if (trace(i1) > clip) then
               trace(i1)=clip
            else if (trace(i1) < -clip) then
               trace(i1)=-clip
            end if
 20      continue

         call sf_floatwrite(trace,n1,out)
 10   continue

      stop
      end
</fortran>

Let us examine it in detail. <fortran> call sf_init() </fortran>

The program starts with a call to sf_init, which initializes the command-line interface. <fortran>
      in = sf_input("in")
      out = sf_output("out")
</fortran>

The input and output files are created with calls to sf_input and sf_output. Because of the absence of derived types in Fortran-77, we use simple integer pointers to represent RSF files. Both sf_input and sf_output accept a character string, which may refer to a file name or a file tag. For example, if the command line contains vel=velocity.rsf, then both sf_input("velocity.rsf") and sf_input("vel") are acceptable. Two tags are special: "in" refers to the file in the standard input and "out" refers to the file in the standard output. <fortran>
      if (3 .ne. sf_gettype(in))
     &  call sf_error("Need float input")
</fortran>

RSF files can store data of different types (character, integer, floating point, complex). The function sf_gettype checks the type of data stored in the RSF file. We make sure that the type corresponds to floating-point numbers. If not, the program is aborted with an error message, using the sf_error function. It is generally a good idea to check the input for user errors and, if they cannot be corrected, to take a safe exit. <fortran>
      if (.not. sf_histint(in,"n1",n1)) then
         call sf_error("No n1= in input")
      else if (n1 > 1000) then
         call sf_error("n1 is too long")
      end if
      n2 = sf_leftsize(in,1)
</fortran>

Conceptually, the RSF data model is a multidimensional hypercube. By convention, the dimensions of the cube are stored in n1=, n2=, etc. parameters. The n1 parameter refers to the fastest axis. If the input dataset is a collection of traces, n1 refers to the trace length. We extract it using the sf_histint function (integer parameter from history) and abort if no value for n1 is found. Since Fortran-77 cannot easily handle dynamic allocation, we also need to check that n1 is not larger than the size of the statically allocated array. We could proceed in a similar fashion, extracting n2, n3, etc. If we are interested in the total number of traces, like in the clip example, a shortcut is to use the sf_leftsize function. Calling sf_leftsize(in,0) returns the total number of elements in the hypercube (the product of n1, n2, etc.), calling sf_leftsize(in,1) returns the number of traces (the product of n2, n3, etc.), calling sf_leftsize(in,2) returns the product of n3, n4, etc. By calling sf_leftsize, we avoid the need to extract additional parameters for the hypercube dimensions that we are not interested in. <fortran>
      if (.not. sf_getfloat("clip",clip))
     &  call sf_error("Need clip=")
</fortran>

The clip parameter is read from the command line, where it can be specified, for example, as clip=10. The parameter has the float type, therefore we read it with the sf_getfloat function. If no clip= parameter is found among the command line arguments, the program is aborted with an error message using the sf_error function. <fortran>
      do 10 i2=1, n2
         call sf_floatread(trace,n1,in)

         do 20 i1=1, n1
            if (trace(i1) > clip) then
               trace(i1)=clip
            else if (trace(i1) < -clip) then
               trace(i1)=-clip
            end if
 20      continue

         call sf_floatwrite(trace,n1,out)
 10   continue
</fortran>

Finally, we do the actual work: loop over input traces, reading, clipping, and writing out each trace.
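The n1 lookup and the bounds check against the statically allocated array can be sketched in plain Python as follows. This is an illustration under simplifying assumptions, not the library's parser: the header is treated as whitespace-separated key=value tokens, a later assignment is assumed to override an earlier one, and parse_history, histint and trace_length are hypothetical names.

```python
MAX_N1 = 1000  # size of the statically allocated trace array in the listing


def parse_history(text):
    """Collect key=value tokens from header text; later values win."""
    params = {}
    for token in text.split():
        key, sep, value = token.partition("=")
        if sep and key:
            params[key] = value.strip('"')
    return params


def histint(params, key):
    """Return (found, value), mirroring the logical result of sf_histint."""
    try:
        return True, int(params[key])
    except (KeyError, ValueError):
        return False, None


def trace_length(header_text, max_n1=MAX_N1):
    """Extract n1 and refuse traces that would overflow a fixed-size array."""
    found, n1 = histint(parse_history(header_text), "n1")
    if not found:
        raise ValueError("No n1= in input")
    if n1 > max_n1:
        raise ValueError("n1 is too long")
    return n1


if __name__ == "__main__":
    print(trace_length('n1=10 n2=5 data_format="native_float"'))
```

Values that contain spaces inside quotes would need a more careful tokenizer than the split() used here.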

Compiling

To compile the Fortran-77 program, run

f77 clip.f -L$RSFROOT/lib -lrsff -lrsf -lm

Change f77 to the Fortran compiler appropriate for your system and include additional compiler flags if necessary. The flags that RSF typically uses are in $RSFROOT/lib/rsfconfig.py.

Fortran-90 interface

The Fortran-90 clip function is listed below. <fortran> program Clipit

 use rsf
 implicit none
 type (file)                      :: in, out
 integer                          :: n1, n2, i1, i2
 real                             :: clip
 real, dimension (:), allocatable :: trace
 call sf_init()            ! initialize RSF
 in = rsf_input()
 out = rsf_output()
 if (sf_float /= gettype(in)) call sf_error("Need float type")
 call from_par(in,"n1",n1)
 n2 = filesize(in,1)
 call from_par("clip",clip) ! command-line parameter 
 allocate (trace (n1))
 do i2=1, n2                ! loop over traces
    call rsf_read(in,trace)
    
    where (trace >  clip) trace =  clip
    where (trace < -clip) trace = -clip
    call rsf_write(out,trace)
 end do

end program Clipit </fortran>

Let us examine it in detail. <fortran>

 use rsf

</fortran>

The program starts with importing the rsf module. <fortran>

 call sf_init()            ! initialize RSF

</fortran>

A call to sf_init is needed to initialize the command-line interface. <fortran>

 in = rsf_input()
 out = rsf_output()

</fortran>

The standard input and output files are initialized with rsf_input and rsf_output functions. Both functions accept optional arguments. For example, if the command line contains vel=velocity.rsf, then both rsf_input("velocity.rsf") and rsf_input("vel") are acceptable. <fortran>

 if (sf_float /= gettype(in)) call sf_error("Need float type")

</fortran>

RSF files can store data of different types (character, integer, floating point, complex). We check that the input holds floating-point numbers and, if it does not, abort the program with an error message using the sf_error function. <fortran>

 call from_par(in,"n1",n1)
 n2 = filesize(in,1)

</fortran>

A call to from_par extracts the "n1" parameter from the input file. Conceptually, the RSF data model is a multidimensional hypercube. The n1 parameter refers to the fastest axis. If the input dataset is a collection of traces, n1 corresponds to the trace length. We could proceed in a similar fashion, extracting n2, n3, etc. If we are interested in the total number of traces, as in the clip example, a shortcut is to use the filesize function. Calling filesize(in) returns the total number of elements in the hypercube (the product of n1, n2, etc.), calling filesize(in,1) returns the number of traces (the product of n2, n3, etc.), calling filesize(in,2) returns the product of n3, n4, etc. By calling filesize, we avoid the need to extract additional parameters for the hypercube dimensions that we are not interested in. <fortran>

 call from_par("clip",clip) ! command-line parameter

</fortran>

The clip parameter is read from the command line, where it can be specified, for example, as clip=10. If we knew a good default value for clip, we could specify it with an optional argument, e.g. call from_par("clip",clip,default). <fortran>

 allocate (trace (n1))
 do i2=1, n2                ! loop over traces
    call rsf_read(in,trace)

    where (trace >  clip) trace =  clip
    where (trace < -clip) trace = -clip
    call rsf_write(out,trace)
 end do

</fortran>

Finally, we do the actual work: loop over input traces, reading, clipping, and writing out each trace.
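The idea of a command-line parameter with an optional default can be sketched in plain Python as follows. This is an illustration, not the RSF implementation: parse_args and this Python from_par are hypothetical, and raising an error when a required parameter is missing is an assumption about the intended behavior.

```python
def parse_args(argv):
    """Build a table of key=value command-line parameters."""
    table = {}
    for arg in argv:
        key, sep, value = arg.partition("=")
        if sep and key:
            table[key] = value
    return table


_MISSING = object()  # sentinel: distinguishes "no default" from default=None


def from_par(table, key, default=_MISSING):
    """Return the float value of key, or the default when one was supplied."""
    if key in table:
        return float(table[key])
    if default is _MISSING:
        raise LookupError("Need %s=" % key)
    return default


if __name__ == "__main__":
    table = parse_args(["clip=10", "vel=velocity.rsf"])
    print(from_par(table, "clip"))      # value taken from the command line
    print(from_par(table, "eps", 0.5))  # not specified, so the default is used
```

A required parameter fails loudly, while a parameter with a sensible default stays optional for the user.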

Compiling

To compile the Fortran-90 program, run

f90 clip.f90 -I$RSFROOT/include -L$RSFROOT/lib -lrsff90 -lrsf -lm

Change f90 to the Fortran-90 compiler appropriate for your system and include additional compiler flags if necessary. The flags that RSF typically uses are in $RSFROOT/lib/rsfconfig.py.

The complete specification for the F90 API can be found on the Library Reference page.

Python interface

The Python clip script is listed below. <python>
#!/usr/bin/env python

import numpy
import rsf

par = rsf.Par()
input = rsf.Input()
output = rsf.Output()
assert 'float' == input.type

n1 = input.int("n1")
n2 = input.size(1)
assert n1

clip = par.float("clip")
assert clip

trace = numpy.zeros(n1,'f')

for i2 in xrange(n2): # loop over traces
   input.read(trace)
   trace = numpy.clip(trace,-clip,clip)
   output.write(trace)
</python>

Let us examine it in detail. <python>
import numpy
import rsf
</python>

The script starts with importing the numpy and rsf modules. <python>
par = rsf.Par()
input = rsf.Input()
output = rsf.Output()
assert 'float' == input.type
</python>

Next, we initialize the command line interface and the standard input and output files. We also make sure that the input file type is floating point. <python>
n1 = input.int("n1")
n2 = input.size(1)
assert n1
</python>

We extract the "n1" parameter from the input file. Conceptually, the RSF data model is a multidimensional hypercube. The n1 parameter refers to the fastest axis. If the input dataset is a collection of traces, n1 corresponds to the trace length. We could proceed in a similar fashion, extracting n2, n3, etc. If we are interested in the total number of traces, as in the clip example, a shortcut is to use the size method of the Input class. Calling size(0) returns the total number of elements in the hypercube (the product of n1, n2, etc.), calling size(1) returns the number of traces (the product of n2, n3, etc.), calling size(2) returns the product of n3, n4, etc. <python>
clip = par.float("clip")
assert clip
</python>

The clip parameter is read from the command line, where it can be specified, for example, as clip=10. <python>
for i2 in xrange(n2): # loop over traces
   input.read(trace)
   trace = numpy.clip(trace,-clip,clip)
   output.write(trace)
</python>

Finally, we do the actual work: loop over input traces, reading, clipping, and writing out each trace.

Compiling

The Python script does not require compilation. Simply make sure that $RSFROOT/lib is in PYTHONPATH and LD_LIBRARY_PATH.

Interactive mode usage without graphics

Madagascar's Python API can be used interactively too. Create an input dataset with

sfmath n1=10 n2=10 output=x1+x2 > test.rsf

Then, start the Python interpreter and paste the following to its command line:

<python>
import numpy, rsf

input = rsf.Input('test.rsf')
n1 = input.int("n1")
n2 = input.int("n2")

data = numpy.zeros((n2,n1),'f')
input.read(data)
data = data.transpose() # Example of numpy in action

print data
</python>

You will get

[[  0.   1.   2.   3.   4.   5.   6.   7.   8.   9.]
 [  1.   2.   3.   4.   5.   6.   7.   8.   9.  10.]
 [  2.   3.   4.   5.   6.   7.   8.   9.  10.  11.]
 [  3.   4.   5.   6.   7.   8.   9.  10.  11.  12.]
 [  4.   5.   6.   7.   8.   9.  10.  11.  12.  13.]
 [  5.   6.   7.   8.   9.  10.  11.  12.  13.  14.]
 [  6.   7.   8.   9.  10.  11.  12.  13.  14.  15.]
 [  7.   8.   9.  10.  11.  12.  13.  14.  15.  16.]
 [  8.   9.  10.  11.  12.  13.  14.  15.  16.  17.]
 [  9.  10.  11.  12.  13.  14.  15.  16.  17.  18.]]

This code will also work in batch mode in a Python script, not only pasted to the interpreter's command line.
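The array in this example is allocated with shape (n2,n1) because n1 is the fastest axis: consecutive samples in storage belong to the same trace. A small sketch in plain Python (no numpy or rsf; it assumes the axis origins are 0 and the sampling intervals are 1, which matches the printed values, and the helper names are made up) shows the layout and the effect of the transpose:

```python
def fill_x1_plus_x2(n1, n2):
    """Samples in storage order: i1 (the fast axis) varies first."""
    return [float(i1 + i2) for i2 in range(n2) for i1 in range(n1)]


def as_rows(flat, n1, n2):
    """View the flat sequence as n2 rows of n1 samples (one row per trace)."""
    return [flat[i2 * n1:(i2 + 1) * n1] for i2 in range(n2)]


def transpose(rows):
    """Swap the two axes, as a 2-D transpose does."""
    return [list(column) for column in zip(*rows)]


if __name__ == "__main__":
    n1, n2 = 3, 2
    flat = fill_x1_plus_x2(n1, n2)
    rows = as_rows(flat, n1, n2)
    print(rows)             # n2 rows, each of length n1
    print(transpose(rows))  # n1 rows, each of length n2
```

With n1 equal to n2, as in the 10 by 10 example above, x1+x2 is symmetric, so the transposed array prints the same as the original.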

Graphics with Matplotlib

Python can plot arrays directly from memory, without having to write a file to disk first. Matplotlib is one of several packages that accomplish this. To create a figure, execute the code in the previous section, followed by:

<python>
from pylab import *
imshow(data)
xlabel('X (m)')
ylabel('Y (m)')
title('Matplotlib example')
</python>

If you want to pop up a figure in an interactive session, after pasting to a Python command line the code shown before, also paste:

<python> show() </python>

You will get Figure 1. The figure will pop up if you run the code in a script too, and the script will stop until the figure is manually closed. You must press the floppy disk button in order to save it. To have the image written to disk automatically, instead of show() use:

<python> savefig('myfile.png') </python>

Putting it all together, here is a sample script that reads the RSF file test.rsf and saves a figure to disk:

<python>
#!/usr/bin/env python

import rsf, numpy, sys, pylab

input = rsf.Input('test.rsf')
n1 = input.int("n1")
n2 = input.int("n2")

data = numpy.zeros((n2,n1),'f')
input.read(data)

pylab.imshow(data)
pylab.savefig('out.png')
</python>

MATLAB interface

The MATLAB clip function is listed below. <matlab>
function clip(in,out,clip)
%CLIP Clip the data

dims = rsf_dim(in);
n1 = dims(1);           % trace length
n2 = prod(dims(2:end)); % number of traces
trace = 1:n1;           % allocate trace
rsf_create(out,in)      % create an output file

for i2 = 1:n2           % loop over traces
   rsf_read(trace,in,'same');
   trace(trace >   clip) =  clip;
   trace(trace < - clip) = -clip;
   rsf_write(trace,out,'same');
end
</matlab>

Let us examine it in detail. <matlab> dims = rsf_dim(in); </matlab>

We start by figuring out the input file dimensions. <matlab>
n1 = dims(1);           % trace length
n2 = prod(dims(2:end)); % number of traces
</matlab>

The first dimension is the trace length; the product of all other dimensions corresponds to the number of traces. <matlab>
trace = 1:n1;           % allocate trace
rsf_create(out,in)      % create an output file
</matlab>

Next, we allocate the trace array and create an output file. <matlab>
for i2 = 1:n2           % loop over traces
   rsf_read(trace,in,'same');
   trace(trace >   clip) =  clip;
   trace(trace < - clip) = -clip;
   rsf_write(trace,out,'same');
end
</matlab>

Finally, we do the actual work: loop over input traces, reading, clipping, and writing out each trace.

Available functions

Only some of the functions in the rsf library have received a MATLAB interface. These functions are rsf_par, rsf_dim, rsf_read, rsf_write and rsf_create. All these functions except rsf_par have been illustrated in the example above.

Compiling

The MATLAB script does not require compilation. Simply make sure that $RSFROOT/lib is in MATLABPATH and LD_LIBRARY_PATH.

References

  1. Compare with the library clip program.
  2. Roberts, E. S., 1998, Programming abstractions in C: Addison-Wesley.