An RSF “file” typically consists of two parts: a binary file, where the data values are stored, and a text header file, where the data attributes are kept (see the RSF file format page for more details). To prepare such a file for distribution, it is recommended to 1) make the binary part architecture independent, so that the data can be downloaded onto heterogeneous computer systems, and 2) pack the binary and header parts into a single file. Both steps are accomplished with the sfdd program, which converts the binary part to XDR format and, at the same time, packs the two parts together. See the example below.
Example:
bash$ sfspike n1=100 n2=100 >myfile.rsf
bash$ sfin myfile.rsf
myfile.rsf:
    in="/tmp/myfile.rsf@"
    esize=4 type=float form=native
    n1=100        d1=0.004      o1=0          label1="Time"      unit1="s"
    n2=100        d2=0.1        o2=0          label2="Distance"  unit2="km"
        10000 elements 40000 bytes
The binary part of myfile.rsf lives in the DATAPATH (here /tmp), and its format is native. To make myfile.rsf easy to fetch from your FTP server, run the following command:
bash$ <myfile.rsf sfdd form=xdr out=stdout >xdr_packed_file.rsf
bash$ sfin xdr_packed_file.rsf
xdr_packed_file.rsf:
    in="stdin"
    esize=4 type=float form=xdr
    n1=100        d1=0.004      o1=0          label1="Time"      unit1="s"
    n2=100        d2=0.1        o2=0          label2="Distance"  unit2="km"
        10000 elements 40000 bytes
The file is now packed (in="stdin") and its format is xdr. It is ready to go!
On the receiving side, i.e. the computer that fetches xdr_packed_file.rsf from your FTP server, it is recommended to unpack the file and convert it back to native format before manipulating the data. This operation is also done with the sfdd program.
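A command along the following lines should do it (a sketch: the output name myfile.rsf is a placeholder, and by default the unpacked binary part is written to the local DATAPATH):

bash$ <xdr_packed_file.rsf sfdd form=native >myfile.rsf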
Compression can be a solution for big downloads, but its effectiveness depends on the input data.
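For example, the packed file can be gzipped before it is placed on the server and decompressed after download (a sketch; it assumes gzip is available on both machines and reuses the file name from the example above):

bash$ gzip xdr_packed_file.rsf
bash$ # transfer xdr_packed_file.rsf.gz, then on the receiving side:
bash$ gunzip xdr_packed_file.rsf.gz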
It works fantastically well on a file full of zeros: I got a compression ratio of 1:1000 with gzip!
Gzip is mediocre for time-domain seismic data. I got a 1:2.1 compression ratio on a 45 GB file; compression took 51 minutes, while simply copying the file took 9 minutes (both to the same disk as the input, so no network issues), i.e. 42 extra minutes for gzip. This is not a scientific experiment, since the OS uses the CPU too; I should really have repeated the measurement, built a histogram, and taken the most probable value, etc. But you get the idea. The same 1:2.1 ratio is advertised for other lossless compression methods such as FLAC ( http://en.wikipedia.org/wiki/FLAC ).
Gzip was terrible on a 1.5 GB frequency-domain dataset (exactly what you may care about sending to a cluster for a migration): it only packed it down to 1.4 GB, probably because frequency-domain seismic data is less statistically predictable than time-domain data.
I did not use any flags for the gzip compression level and kept the default (6); studies I saw elsewhere showed that the level does not make much of a difference. Bzip2 takes 10 times longer for an improvement of 10-15% in file size, which would be a catastrophe on a huge dataset.
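If you do want to experiment, the trade-off between level and speed is easy to measure yourself (a sketch; bigfile.rsf is a placeholder name, and gzip -c keeps the original file intact):

bash$ time gzip -6 -c bigfile.rsf >bigfile.rsf.gz   # default level
bash$ time gzip -9 -c bigfile.rsf >bigfile.rsf.gz   # highest level, usually a marginal gain
bash$ ls -l bigfile.rsf.gz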
Lossy compression methods (wavelets, etc.) do much better, but then you start wondering what you have lost, especially if lossy compression was applied several times in the workflow, and usually that's not worth the trouble.