Parallel Computing

Latest revision as of 13:53, 7 January 2023


Many data processing operations are data-parallel: different traces, shot gathers, frequency slices, etc. can be processed independently. Madagascar provides several mechanisms for handling this type of embarrassingly parallel application on computers with multiple processors.

OpenMP and MPI

OpenMP (internal)

OpenMP is a standard framework for parallel applications on shared-memory systems. It is supported by the latest versions of GCC and by some other compilers.

To use OpenMP in your program, you do not need to add anything to your SConstruct. Just make sure the OMP libraries are installed on your system before you configure Madagascar (or install them and rerun the configuration command). You also need to use the appropriate pragmas in your code. To find Madagascar programs that use OpenMP and that you can take as a model, run the following command:

grep "pragma omp" $RSFSRC/*/*/M*.c |\
awk -F ':' '{ print $1 }' |\
uniq |\
awk -F '/' '{ print $NF }'

On the last check (2014-02-09), 139 standalone programs (approximately 11% of Madagascar programs) were using OMP. Running a similar command in the directory $RSFSRC/api/c will yield a few library functions parallelized with OMP.

OpenMP (external)

To run a data-parallel process that does not contain OpenMP calls on a multi-core shared-memory machine, use sfomp. Thus, a call like

sfradon np=100 p0=0 dp=0.01 < inp.rsf > out.rsf

becomes

sfomp sfradon np=100 p0=0 dp=0.01 < inp.rsf > out.rsf

sfomp splits the input along the slowest axis (presumed to be data-parallel) and runs it through parallel threads. The number of threads is set by the OMP_NUM_THREADS environment variable or (by default) by the number of available CPUs. For example, to use four threads:

export OMP_NUM_THREADS=4
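The thread-count rule described above can be mimicked in a few lines of Python. This is only a sketch of the stated behaviour (use OMP_NUM_THREADS if it is set, otherwise the number of available CPUs). The helper name resolve_num_threads is hypothetical and is not part of Madagascar, and sfomp itself may treat malformed values differently.

```python
import os


def resolve_num_threads(environ=None):
    """Return the thread count implied by OMP_NUM_THREADS, else the CPU count.

    Hypothetical helper for illustration only; not part of Madagascar.
    """
    env = os.environ if environ is None else environ
    value = env.get("OMP_NUM_THREADS", "").strip()
    if value.isdigit() and int(value) > 0:
        return int(value)
    # Fall back to the number of CPUs reported by the OS (at least 1).
    return os.cpu_count() or 1


if __name__ == "__main__":
    print(resolve_num_threads())
```

Passing a dictionary instead of the real environment makes it easy to check both branches without changing the shell.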

MPI (internal)

MPI (Message-Passing Interface) is the dominant standard framework for parallel processing on different computer architectures including distributed-memory systems. Several MPI implementations (such as Open MPI and MPICH2) are available.

An example of compiling a program with mpicc and running it under mpirun can be found in $RSFSRC/book/rsf/bash/mpi/SConstruct. Note that Madagascar requires all internally-executing MPI programs to contain the string 'mpi' in the program name, because SCons uses it to switch to an MPI compiler such as mpicc.

MPI (external)

To parallelize a data-parallel task using MPI but without including MPI calls in your source code, try sfmpi, as follows:

mpirun -np 8 sfmpi sfradon np=100 p0=0 dp=0.01 input=inp.rsf output=out.rsf split=2

where the argument after -np specifies the number of processes to launch. sfmpi uses this number to split the input into pieces and to process them in parallel; by default the split is along the slowest axis (presumed to be data-parallel). The keywords input, output, and split are specific to sfmpi: input and output name the files that serve as the standard input and output streams of your program, and split selects the input axis to split.

Some older MPI implementations do not support the system calls used in sfmpi and therefore may not support this feature.

MPI + OpenMP (both external)

It is possible to combine the advantages of shared-memory and distributed-memory architectures by using OpenMP and MPI together. For example,

mpirun -np 32 sfmpi sfomp sfradon np=100 p0=0 dp=0.01 input=inp.rsf output=out.rsf

distributes the job across 32 nodes and splits it again on each node using shared-memory threads.

pscons

To have SCons cut your inputs into slices, process the slices in parallel on one multi-CPU workstation or on multiple cluster nodes, and then collect the results, use the pscons wrapper to scons. Unlike the OpenMP and MPI utilities, this approach is fault-tolerant: if a node fails, restarting the job allows it to complete.

Simply running pscons with no special environment variable set is equivalent to running scons -j nproc, where nproc is the auto-detected number of threads on your system. To fully use the potential of pscons for running on a distributed-memory computer, you need to set the environment variables RSF_CLUSTER and RSF_THREADS, and to use split and reduce arguments in your SConstruct Flow statements where appropriate.

Setting the environment variables and how to run

The RSF_CLUSTER variable holds, for each node, the name or IP address of that node (in a format that can be used by ssh), followed by the number of threads on the node. For example, to create 26 threads and distribute them across 4 nodes, using 6 CPUs on the first node, 4 CPUs on the second, and 8 CPUs on each of the last two nodes:

export RSF_CLUSTER='140.168.1.236 6 140.168.1.235 4 140.168.1.234 8 140.168.1.233 8'

The RSF_THREADS variable holds the sum of the numbers of threads on all nodes, i.e.:

export RSF_THREADS=26

If RSF_CLUSTER is not defined, RSF_THREADS can be used to override the auto-detected number of threads used on the local host. This can be useful in the case of processes using a large amount of memory.
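The relationship between the two variables can be checked with a short Python sketch. It assumes only the format described above (alternating host and thread-count fields separated by whitespace); the function name parse_cluster is hypothetical and does not come from the Madagascar sources.

```python
def parse_cluster(spec):
    """Split 'host n host n ...' into a list of (host, threads) pairs.

    Hypothetical helper for illustration only; not part of Madagascar.
    """
    fields = spec.split()
    if len(fields) % 2 != 0:
        raise ValueError("expected alternating host and thread-count fields")
    return [(fields[i], int(fields[i + 1])) for i in range(0, len(fields), 2)]


cluster = parse_cluster(
    "140.168.1.236 6 140.168.1.235 4 140.168.1.234 8 140.168.1.233 8"
)
# RSF_THREADS should equal the sum of the per-node thread counts.
total_threads = sum(n for _, n in cluster)
print(total_threads)  # 26
```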

On Beowulf-type clusters, where a processor communicates with its local disk much faster than with shared network storage, it is important to set (in the shell resource file) the temporary file location to a local disk and the DATAPATH variable to a network-visible location for global collection of results, i.e.:

export DATAPATH=/disk1/data/myname/
export TMPDATAPATH=/tmp/

To execute using this method, run pscons, or skip the environment variables altogether and pass the same information on the scons command line:

scons -j 26 CLUSTER='140.168.1.236 6 140.168.1.235 4 140.168.1.234 8 140.168.1.233 8'

Parallel Flow() using split and reduce

The split option specifies the number of the axis to be split and the size of that axis. For example, to split the standard input file along axis 3, which has length 1000, and collect the results by concatenation:

Flow('radon','spike','radon adj=y p0=-4 np=200 dp=0.04',split=[3,1000],reduce='cat')

Concatenation along the axis specified by split= is the default reduction method. Other valid options include reduce='add', reduce='cat axis=1', etc. Examples can be found in $RSFSRC/book/rsf/school/data/SConstruct and $RSFSRC/book/trip/pscons/SConstruct.
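The difference between the two reduction styles can be illustrated on plain Python lists. This is a conceptual sketch only: the helper names are hypothetical, the chunking rule is an assumption, and none of this is how pscons is actually implemented. Concatenation suits operations whose output for each slice is independent, while addition suits operations in which every slice contributes to the same output samples.

```python
def split_chunks(data, nchunks):
    """Cut a list into nchunks contiguous pieces (last piece takes the remainder)."""
    size = len(data) // nchunks
    bounds = [i * size for i in range(nchunks)] + [len(data)]
    return [data[bounds[i]:bounds[i + 1]] for i in range(nchunks)]


def reduce_cat(pieces):
    """Concatenate per-chunk outputs back along the split axis."""
    return [x for piece in pieces for x in piece]


def reduce_add(pieces):
    """Sum per-chunk outputs element by element (pieces must have equal length)."""
    return [sum(values) for values in zip(*pieces)]


data = list(range(10))
chunks = split_chunks(data, 3)

# Sample-by-sample operation: the per-chunk outputs are concatenated.
doubled = reduce_cat([[2 * x for x in c] for c in chunks])

# Every chunk contributes to the same one-sample output: contributions are added.
total = reduce_add([[sum(c)] for c in chunks])
```

In both cases the reduced result equals what the operation would have produced on the unsplit input, which is the property a flow needs before split and reduce can be applied safely.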

If flows that are run by pscons contain both serial and parallel targets, take care not to create bottlenecks in which tasks are distributed to multiple nodes but the nodes sit idle while waiting for other nodes to finish computing dependencies. Tasks that are not explicitly parallelized will still be sped up by pscons if they are independent of each other. For example, compiling Madagascar itself with pscons instead of scons results in a visible speedup on a multithreaded machine.

Computing on the local node only by using the option local=1

By default, with pscons, SCons attempts to run all the commands of the SConstruct file in parallel. The option local=1 forces SCons to compute locally on the head node of the cluster. It can be useful for preventing serial parts of your Python script from being distributed across multiple nodes.

Flow('spike',None,'spike n1=100 n2=300 n3=1000',local=1)

What to expect at runtime

SCons will create intermediate input and output slices in the current directory. For example, for

Flow('out','inp','radon np=100 p0=0 dp=0.01',split=[3,256])

and

RSF_THREADS=8
RSF_CLUSTER='localhost 4 node1.utexas.edu 4'

the SCons output will look like:

< inp.rsf /RSFROOT/bin/sfwindow n3=42 f3=0 squeeze=n > inp__0.rsf

< inp.rsf /RSFROOT/bin/sfwindow n3=42 f3=42 squeeze=n > inp__1.rsf

/usr/bin/ssh node1.utexas.edu "cd /home/test ; /bin/env < inp.rsf /RSFROOT/bin/sfwindow n3=42 f3=84 squeeze=n > inp__2.rsf "

< inp.rsf /RSFROOT/bin/sfwindow n3=42 f3=126 squeeze=n > inp__3.rsf

< inp.rsf /RSFROOT/bin/sfwindow n3=42 f3=168 squeeze=n > inp__4.rsf

/usr/bin/ssh node1.utexas.edu "cd /home/test ; /bin/env < inp.rsf /RSFROOT/bin/sfwindow f3=210 squeeze=n > inp__5.rsf "

< inp__0.rsf /RSFROOT/bin/sfradon p0=0 np=100 dp=0.01 > out__0.rsf

/usr/bin/ssh node1.utexas.edu "cd /home/test ; /bin/env < inp__1.rsf /RSFROOT/bin/sfradon p0=0 np=100 dp=0.01 > out__1.rsf "

< inp__3.rsf /RSFROOT/bin/sfradon p0=0 np=100 dp=0.01 > out__3.rsf

/usr/bin/ssh node1.utexas.edu "cd /home/test ; < inp__4.rsf /RSFROOT/bin/sfradon p0=0 np=100 dp=0.01 > out__4.rsf "

< inp__2.rsf /RSFROOT/bin/sfradon p0=0 np=100 dp=0.01 > out__2.rsf

< inp__5.rsf /RSFROOT/bin/sfradon p0=0 np=100 dp=0.01 > out__5.rsf

< out__0.rsf /RSFROOT/bin/sfcat axis=3 out__1.rsf out__2.rsf out__3.rsf out__4.rsf out__5.rsf > out.rsf

Note that operations were sent for execution in parallel, but the display is necessarily serial.
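The window offsets in the log above are consistent with an even integer split in which the last piece absorbs the remainder. The sketch below reproduces that arithmetic under this assumption. The function name windows is hypothetical, and the number of pieces (6) is read off the log rather than derived from RSF_THREADS.

```python
def windows(axis_length, njobs):
    """Return (first, count) pairs: equal-sized windows, remainder in the last.

    Hypothetical helper for illustration only; not part of Madagascar.
    """
    size = axis_length // njobs
    result = []
    for job in range(njobs):
        first = job * size
        # The final window runs to the end of the axis.
        count = size if job < njobs - 1 else axis_length - first
        result.append((first, count))
    return result


for job, (f3, n3) in enumerate(windows(256, 6)):
    print(f"inp__{job}.rsf: f3={f3} n3={n3}")
```

For an axis of length 256 cut into 6 pieces, the first five windows hold 42 samples each and the last one, starting at f3=210, holds the remaining 46.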

Runtime job monitoring can be achieved with sftop. To kill a distributed job, use sfkill.