An RSF “file” typically consists of two parts: a binary file, where the data values are stored, and a text header file, where the data attributes are kept (see the RSF file format page for more details). To prepare such a file for distribution, it is recommended to 1) make the binary part architecture independent, so that the data can be downloaded onto heterogeneous computer systems, and 2) pack the binary and header parts into a single file. Both steps are accomplished with the sfdd program, which converts the binary part to XDR format and, at the same time, packs the two parts together. See the example below.
Example:
bash$ sfspike n1=100 n2=100 >myfile.rsf
bash$ sfin myfile.rsf
myfile.rsf:
    in="/tmp/myfile.rsf@"
    esize=4 type=float form=native
    n1=100        d1=0.004      o1=0          label1="Time"      unit1="s"
    n2=100        d2=0.1        o2=0          label2="Distance"  unit2="km"
        10000 elements 40000 bytes
The binary part of myfile.rsf lives in the DATAPATH (here /tmp), and its format is native. To make myfile.rsf easy to fetch from your FTP server, run the following command:
bash$ <myfile.rsf sfdd form=xdr out=stdout >xdr_packed_file.rsf
bash$ sfin xdr_packed_file.rsf
xdr_packed_file.rsf:
    in="stdin"
    esize=4 type=float form=xdr
    n1=100        d1=0.004      o1=0          label1="Time"      unit1="s"
    n2=100        d2=0.1        o2=0          label2="Distance"  unit2="km"
        10000 elements 40000 bytes
The file is now packed (in="stdin") and its format is xdr. It is ready to go!
On the receiving side, i.e. the computer that fetches xdr_packed_file.rsf from your FTP server, it is recommended to unpack the file and convert it back to native format before manipulating the data. This operation is also done with the sfdd program.
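A command along the following lines should do it (a sketch: the output name myfile.rsf is a placeholder, and by default the unpacked binary part is written to the local DATAPATH):

bash$ <xdr_packed_file.rsf sfdd form=native >myfile.rsf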
Compression can be a solution for big downloads, but its effectiveness depends on the input data.
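For example, the packed file can be gzipped before it is placed on the server and decompressed after download (a sketch; it assumes gzip is available on both machines and reuses the file name from the example above):

bash$ gzip xdr_packed_file.rsf
bash$ # transfer xdr_packed_file.rsf.gz, then on the receiving side:
bash$ gunzip xdr_packed_file.rsf.gz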
It works fantastically well on a file full of zeros: I got a compression ratio of 1:1000 with gzip!
Gzip is mediocre for time-domain seismic data. I got a 1:2.1 compression ratio on a 45 GB file; compression took 51 minutes, while simply copying the file took 9 minutes (both to the same disk as the input, so no network issues), i.e. 42 extra minutes for gzip. This is not a scientific experiment, since the OS uses the CPU too; I should really have repeated the measurement, built a histogram, and taken the most probable value, etc. But you get the idea. The same 1:2.1 ratio is advertised for other lossless compression methods such as FLAC ( http://en.wikipedia.org/wiki/FLAC ).
Gzip was terrible on a 1.5 GB frequency-domain dataset (exactly what you may care about sending to a cluster for a migration): it only packed it down to 1.4 GB, probably because frequency-domain seismic data is less statistically predictable than time-domain data.
I did not use any flags for the gzip compression level and kept the default (6); studies I saw elsewhere showed that the level does not make much of a difference. Bzip2 takes 10 times longer for an improvement of 10-15% in file size, which would be a catastrophe on a huge dataset.
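If you do want to experiment, the trade-off between level and speed is easy to measure yourself (a sketch; bigfile.rsf is a placeholder name, and gzip -c keeps the original file intact):

bash$ time gzip -6 -c bigfile.rsf >bigfile.rsf.gz   # default level
bash$ time gzip -9 -c bigfile.rsf >bigfile.rsf.gz   # highest level, usually a marginal gain
bash$ ls -l bigfile.rsf.gz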
Lossy compression methods (wavelets, etc.) do much better, but then you start wondering what you have lost, especially if lossy compression was applied several times in the workflow, and usually that's not worth the trouble.