{"id":160,"date":"2007-12-27T11:00:41","date_gmt":"2007-12-27T11:00:41","guid":{"rendered":"http:\/\/ahay.org\/blog\/?p=160"},"modified":"2015-08-04T23:51:42","modified_gmt":"2015-08-04T23:51:42","slug":"parallel-processing","status":"publish","type":"post","link":"https:\/\/ahay.org\/blog\/2007\/12\/27\/parallel-processing\/","title":{"rendered":"Parallel processing"},"content":{"rendered":"<p>Many data processing operations are <strong>data-parallel<\/strong>: different traces, shot gathers, frequency slices, etc. can be processed independently. <tt>Madagascar<\/tt> provides several mechanisms for handling this type of <em>embarrassingly parallel<\/em> application on computers with multiple processors.<\/p>\n<ol>\n<li><strong>OpenMP<\/strong><\/li>\n<p><a href=\"http:\/\/en.wikipedia.org\/wiki\/OpenMP\">OpenMP<\/a> is a standard framework for parallel applications on <strong>shared-memory<\/strong> systems. It is supported by the latest versions of <a href=\"http:\/\/gcc.gnu.org\/\">GCC<\/a> and by some other compilers.<br \/>\nTo run a data-parallel processing task like<\/p>\n<div class=\"code-box\"><div class=\"code-title\"><i class=\"fa fa-code\"><\/i> <div class=\"pull-right\"><a href=\"#\" class=\"btn btn-default btn-xs toggle-code\" data-toggle=\"tooltip\" title=\"Toggle code\"><i class=\"fa fa-toggle-up\"><\/i><\/a><\/div><\/div><pre >sfradon np=100 p0=0 dp=0.01 < inp.rsf > out.rsf<\/pre><\/div>\n<p>on a shared-memory computer with multiple processors (such as a multi-core PC), try <strong>sfomp<\/strong>, as follows:<\/p>\n<div class=\"code-box\"><div class=\"code-title\"><i class=\"fa fa-code\"><\/i> <div class=\"pull-right\"><a href=\"#\" class=\"btn btn-default btn-xs toggle-code\" data-toggle=\"tooltip\" title=\"Toggle code\"><i class=\"fa fa-toggle-up\"><\/i><\/a><\/div><\/div><pre >sfomp sfradon np=100 p0=0 dp=0.01 < inp.rsf > out.rsf<\/pre><\/div>\n<p><strong>sfomp<\/strong> splits the input along the slowest axis (presumed to be
data-parallel) and runs it through parallel threads. The number of threads is set by the <strong>OMP_NUM_THREADS<\/strong> environment variable (for example, <tt>export OMP_NUM_THREADS=4<\/tt>) or, by default, by the number of available CPUs.<\/p>\n<li><strong>MPI<\/strong><\/li>\n<p><a href=\"http:\/\/www-unix.mcs.anl.gov\/mpi\/\">MPI<\/a> (Message-Passing Interface) is a standard framework for parallel processing on different computer architectures, including <strong>distributed-memory<\/strong> systems. Several MPI implementations (such as <a href=\"http:\/\/www-unix.mcs.anl.gov\/mpi\/mpich1\/\">MPICH<\/a>) are available.<br \/>\nTo parallelize a task using MPI, try <strong>sfmpi<\/strong>, as follows:<\/p>\n<div class=\"code-box\"><div class=\"code-title\"><i class=\"fa fa-code\"><\/i> <div class=\"pull-right\"><a href=\"#\" class=\"btn btn-default btn-xs toggle-code\" data-toggle=\"tooltip\" title=\"Toggle code\"><i class=\"fa fa-toggle-up\"><\/i><\/a><\/div><\/div><pre >mpirun -np 8 sfmpi sfradon np=100 p0=0 dp=0.01 input=inp.rsf output=out.rsf<\/pre><\/div>\n<p>where the argument after <b>-np<\/b> specifies the number of processors involved.
<strong>sfmpi<\/strong> will use this number to split the input along the slowest axis (presumed to be data-parallel) and to run it through parallel threads.<br \/>\n<em>Note:<\/em> Some MPI implementations do not support the system calls used by <tt>sfmpi<\/tt> and therefore will not support this option.<\/p>\n<li><strong>MPI + OpenMP<\/strong><\/li>\n<p>It is possible to combine the advantages of shared-memory and distributed-memory architectures by using OpenMP and MPI together.<\/p>\n<div class=\"code-box\"><div class=\"code-title\"><i class=\"fa fa-code\"><\/i> <div class=\"pull-right\"><a href=\"#\" class=\"btn btn-default btn-xs toggle-code\" data-toggle=\"tooltip\" title=\"Toggle code\"><i class=\"fa fa-toggle-up\"><\/i><\/a><\/div><\/div><pre >mpirun -np 32 sfmpi sfomp sfradon np=100 p0=0 dp=0.01 input=inp.rsf output=out.rsf<\/pre><\/div>\n<p>will distribute the job over 32 nodes and split it again on each node using shared-memory threads.<\/p>\n<li><strong>SCons<\/strong><\/li>\n<p>If you process data using <a href=\"http:\/\/www.scons.org\/\">SCons<\/a>, another option is available. Change<\/p>\n<div class=\"code-box\"><div class=\"code-title\"><i class=\"fa fa-code\"><\/i> <div class=\"pull-right\"><a href=\"#\" class=\"btn btn-default btn-xs toggle-code\" data-toggle=\"tooltip\" title=\"Toggle code\"><i class=\"fa fa-toggle-up\"><\/i><\/a><\/div><\/div><pre >Flow('out','inp','radon np=100 p0=0 dp=0.01')<\/pre><\/div>\n<p>in your <tt>SConstruct<\/tt> file to<\/p>\n<div class=\"code-box\"><div class=\"code-title\"><i class=\"fa fa-code\"><\/i> <div class=\"pull-right\"><a href=\"#\" class=\"btn btn-default btn-xs toggle-code\" data-toggle=\"tooltip\" title=\"Toggle code\"><i class=\"fa fa-toggle-up\"><\/i><\/a><\/div><\/div><pre >Flow('out','inp','radon np=100 p0=0 dp=0.01',split=[3,256])<\/pre><\/div>\n<p>where the optional <strong>split=<\/strong> parameter specifies the axis to be split and the size of that axis.
Then run something like<\/p>\n<div class=\"code-box\"><div class=\"code-title\"><i class=\"fa fa-code\"><\/i> <div class=\"pull-right\"><a href=\"#\" class=\"btn btn-default btn-xs toggle-code\" data-toggle=\"tooltip\" title=\"Toggle code\"><i class=\"fa fa-toggle-up\"><\/i><\/a><\/div><\/div><pre >scons -j 8 CLUSTER='localhost 4 node1.utexas.edu 2' out.rsf<\/pre><\/div>\n<p>The <strong>-j<\/strong> option instructs SCons to run in parallel, creating 8 threads, while the <strong>CLUSTER=<\/strong> option supplies it with the list of nodes to use and the number of processes to run on each node. The output may look like<\/p>\n<div class=\"code-box\"><div class=\"code-title\"><i class=\"fa fa-code\"><\/i> <div class=\"pull-right\"><a href=\"#\" class=\"btn btn-default btn-xs toggle-code\" data-toggle=\"tooltip\" title=\"Toggle code\"><i class=\"fa fa-toggle-up\"><\/i><\/a><\/div><\/div><pre >\n< inp.rsf \/RSFROOT\/bin\/sfwindow n3=42 f3=0 squeeze=n > inp__0.rsf\n< inp.rsf \/RSFROOT\/bin\/sfwindow n3=42 f3=42 squeeze=n > inp__1.rsf\n\/usr\/bin\/ssh node1.utexas.edu \"cd \/home\/test ; \/bin\/env < inp.rsf \/RSFROOT\/bin\/sfwindow n3=42 f3=84 squeeze=n > inp__2.rsf \"\n< inp.rsf \/RSFROOT\/bin\/sfwindow n3=42 f3=126 squeeze=n > inp__3.rsf\n< inp.rsf \/RSFROOT\/bin\/sfwindow n3=42 f3=168 squeeze=n > inp__4.rsf\n\/usr\/bin\/ssh node1.utexas.edu \"cd \/home\/test ; \/bin\/env < inp.rsf \/RSFROOT\/bin\/sfwindow f3=210 squeeze=n > inp__5.rsf \"\n< inp__0.rsf \/RSFROOT\/bin\/sfradon p0=0 np=100 dp=0.01 > out__0.rsf\n\/usr\/bin\/ssh node1.utexas.edu \"cd \/home\/test ; \/bin\/env < inp__1.rsf \/RSFROOT\/bin\/sfradon p0=0 np=100 dp=0.01 > out__1.rsf \"\n< inp__3.rsf \/RSFROOT\/bin\/sfradon p0=0 np=100 dp=0.01 > out__3.rsf\n\/usr\/bin\/ssh node1.utexas.edu \"cd \/home\/test ; \/bin\/env < inp__4.rsf \/RSFROOT\/bin\/sfradon p0=0 np=100 dp=0.01 > out__4.rsf \"\n< inp__2.rsf \/RSFROOT\/bin\/sfradon p0=0 np=100 dp=0.01 > out__2.rsf\n< inp__5.rsf \/RSFROOT\/bin\/sfradon p0=0 np=100
dp=0.01 > out__5.rsf\n< out__0.rsf \/RSFROOT\/bin\/sfcat axis=3 out__1.rsf out__2.rsf out__3.rsf out__4.rsf out__5.rsf > out.rsf\n<\/pre><\/div>\n<p>Splitting the input with <strong>sfwindow<\/strong> and putting the output back together with <strong>sfcat<\/strong> are immediately apparent. The advantage of the SCons-based approach (in addition to documentation and reproducible experiments) is <em>fault tolerance<\/em>: if one of the nodes dies during processing, the computation can be restarted without recreating the parts that have already been computed.<\/p>\n<\/ol>\n<p>All these options will continue to evolve and improve with further testing. Please report your experiences and suggestions.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Many data processing operations are data-parallel: different traces, shot gathers, frequency slices, etc. can be processed independently. Madagascar provides several mechanisms for handling this type of embarrassingly parallel application on computers with multiple processors. OpenMP OpenMP is a standard framework for parallel applications on shared-memory systems.
It is supported by the latest versions [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_import_markdown_pro_load_document_selector":0,"_import_markdown_pro_submit_text_textarea":"","activitypub_content_warning":"","activitypub_content_visibility":"local","activitypub_max_image_attachments":4,"activitypub_interaction_policy_quote":"","footnotes":""},"categories":[7],"tags":[],"class_list":["post-160","post","type-post","status-publish","format-standard","hentry","category-examples"],"_links":{"self":[{"href":"https:\/\/ahay.org\/blog\/wp-json\/wp\/v2\/posts\/160","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/ahay.org\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/ahay.org\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/ahay.org\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/ahay.org\/blog\/wp-json\/wp\/v2\/comments?post=160"}],"version-history":[{"count":1,"href":"https:\/\/ahay.org\/blog\/wp-json\/wp\/v2\/posts\/160\/revisions"}],"predecessor-version":[{"id":755,"href":"https:\/\/ahay.org\/blog\/wp-json\/wp\/v2\/posts\/160\/revisions\/755"}],"wp:attachment":[{"href":"https:\/\/ahay.org\/blog\/wp-json\/wp\/v2\/media?parent=160"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/ahay.org\/blog\/wp-json\/wp\/v2\/categories?post=160"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/ahay.org\/blog\/wp-json\/wp\/v2\/tags?post=160"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}