Systems

Vplot figures and MS Word

March 16, 2016 Systems

Joe Dellinger, the author of Vplot, suggests adjusting parameters for raster figures when including them in Word documents. He writes:

Wow, working on my SEG abstract I had a helluva time getting my vplot raster figures to look decent in word. Then I realized… wait a minute, it’s doing just the bad things plotters back in the 80’s were doing. I fiddled a little with pixc and greyc, and voila! Beautiful raster figures.

From the Vplot documentation:

  • pixc is used only when dithering is being performed, and also should only be used for hardcopy devices. It alters the grey scale to correct for pixel overlap on the device, which (if uncorrected) causes grey raster images to come out much darker on paper than on graphics displays.

  • greyc is used only when dithering is being performed, and really should only be used for hardcopy devices. It alters the grey scale so that grey rasters come out on paper with the same nonlinear appearance that is perceived on display devices.

The default values are pixc=1 greyc=1. The values used by Joe in his Word document were pixc=1.15 greyc=1.25.

To convert Vplot plots to other forms of graphics, you can use vpconvert.

Continuous reproducibility using CircleCI

February 20, 2016 Systems

Continuous Integration (CI) is a powerful discipline of software engineering, which involves a shared code repository, where developers contribute frequently (possibly several times per day), and an automated build system which includes testing scripts.

As previously suggested, CI tools can be easily adapted to perform continuous reproducibility: repeatedly testing whether previously reproducible results remain reproducible after software changes. Continuous reproducibility can ensure that reproducible documents stay “alive” and continue to be usable.

Numerous tools have appeared in recent years to offer CI services in the cloud: Travis CI, Semaphore, Codeship, Shippable, etc. It is hard to choose one. I would pick CircleCI. CircleCI is developed by a startup company from San Francisco. Its product is not fundamentally different from analogous services but provides a solid implementation, which includes:

  • Integration with GitHub
  • SSH access
  • Sleek user interface
  • Simple configuration via circle.yml file
  • Fast parallel execution

Let us test if it can serve as a good platform for Madagascar’s continuous reproducibility.

Julia

September 28, 2015 Systems

Julia is a new open-source programming/scripting language designed for high-performance scientific computing. The goal is to combine the simplicity of Python with performance approaching that of statically compiled languages like C.

Julia has a number of other attractive features including:

  • Dynamic type system
  • Powerful shell-like capabilities for managing other processes
  • Designed for parallelism and distributed computation
  • Automatic generation of efficient, specialized code for different argument types
  • Elegant and extensible conversions and promotions for numeric and other types

A simple interface to Julia has been added to Madagascar. It can be easily extended to include other functions from the Madagascar library. An example test script is shown below:

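#!/usr/bin/env julia

using m8r

# Initialize Madagascar and open the standard input and output files
m8r.init()
inp = m8r.input("in")
out = m8r.output("out")

# n1 is the trace length; n2 is the number of traces
n1 = m8r.histint(inp,"n1")
n2 = m8r.leftsize(inp,1)

# Clip value from the command line (clip=)
clip = m8r.getfloat("clip")

# Trace buffer (Julia 0.4 syntax; Julia 1.x writes Array{Float32}(undef,n1))
trace = Array(Float32,n1)

# Read, clip, and write one trace at a time
for i2 in 1:n2
    m8r.floatread(trace,n1,inp)
    # Julia 1.x needs the broadcast form clamp.(trace,-clip,clip)
    trace = clamp(trace,-clip,clip)
    m8r.floatwrite(trace,n1,out)
end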

Compare it with scripts or programs in other languages.
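For a comparison that runs without Madagascar installed, the per-trace clipping logic can be sketched in plain Python. This is only an illustration: the function names clip_trace and clip_traces are hypothetical and not part of any Madagascar interface, and in-memory lists stand in for the RSF input and output files.

```python
def clip_trace(trace, clip):
    """Clamp every sample of one trace to the interval [-clip, clip]."""
    return [max(-clip, min(clip, sample)) for sample in trace]


def clip_traces(traces, clip):
    """Process traces one at a time, mirroring the loop over i2 above."""
    for trace in traces:
        yield clip_trace(trace, clip)


if __name__ == "__main__":
    # Two short made-up traces standing in for data read from a file
    data = [[-3.0, -0.5, 0.0, 0.5, 3.0], [1.5, -1.5, 0.25, 2.0, -2.0]]
    for clipped in clip_traces(data, clip=1.0):
        print(clipped)
```

The generator keeps only one trace in memory at a time, which is the same design choice the Julia script makes by reusing a single trace buffer.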

More colormaps

July 12, 2015 Systems

The most popular colormap in Madagascar, other than the default greyscale, is color=j, modeled after “jet”, which used to be the default colormap in MATLAB. More than 1,000 Madagascar examples use color=j.

In October 2014, with release R2014b (Version 8.4), MATLAB switched the default colormap to a different one, called “parula”. The “parula” colormap is copyrighted by MathWorks as a result of a creative process (solving an optimization problem). No open-source license is given to use it outside of MATLAB. According to Steve Eddins, “this colormap is MathWorks intellectual property, and it would not be appropriate or acceptable to copy or re-use it in non-MathWorks plotting tools.”

Stéfan van der Walt and Nathaniel Smith from the Berkeley Institute for Data Science have developed several new open-source colormaps with good perceptual properties. One of them (named “viridis”) is proposed as a good replacement for “jet” and as the default colormap in matplotlib 2.0.

Is it a good colormap? We can find out by using tools from Matteo Niccoli’s tutorial on colormaps. This analysis shows that the intensity and lightness distributions of “viridis” are nicely linear. In his presentation at SciPy-2015, Nathaniel Smith explains the rationale for this choice.
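A rough version of such a lightness check can be sketched in a few lines of Python. This is not the analysis from the tutorial, only an illustration under stated assumptions: each colormap is represented by a handful of approximate anchor colors typed in by hand (not taken from matplotlib or MATLAB), and lightness is computed as CIE L* from sRGB values using the standard conversion formulas.

```python
def srgb_to_linear(c):
    """Convert one 8-bit sRGB channel to linear light."""
    c = c / 255.0
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def lightness(rgb):
    """CIE L* (0 = black, 100 = white) of an 8-bit sRGB color."""
    r, g, b = (srgb_to_linear(c) for c in rgb)
    y = 0.2126 * r + 0.7152 * g + 0.0722 * b  # relative luminance
    delta = 6.0 / 29.0
    if y > delta ** 3:
        f = y ** (1.0 / 3.0)
    else:
        f = y / (3.0 * delta ** 2) + 4.0 / 29.0
    return 116.0 * f - 16.0


def is_monotonic(values):
    """True if every value is strictly larger than the previous one."""
    return all(a < b for a, b in zip(values, values[1:]))


# Approximate anchor colors (assumptions for illustration only)
VIRIDIS = [(68, 1, 84), (59, 82, 139), (33, 145, 140), (94, 201, 98), (253, 231, 37)]
JET = [(0, 0, 128), (0, 0, 255), (0, 255, 255), (255, 255, 0), (255, 0, 0), (128, 0, 0)]

if __name__ == "__main__":
    for name, colors in (("viridis", VIRIDIS), ("jet", JET)):
        values = [lightness(c) for c in colors]
        print(name, [round(v, 1) for v in values], "monotonic:", is_monotonic(values))
```

With these samples, the lightness of “viridis” rises steadily from dark purple to yellow, while “jet” peaks in the middle and falls again toward dark red, which is the kind of behavior that creates artificial edges in displayed data.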

Reproducible research and PDF files

June 21, 2015 Systems

Claerbout’s principle of reproducible research, as formulated by Buckheit and Donoho (1995), states:

An article about computational science in a scientific publication is not the scholarship itself, it is merely advertising of the scholarship. The actual scholarship is the complete software development environment and the complete set of instructions which generated the figures.

The geophysics class in the SEGTeX package features a new option: reproduce, which attaches SConstruct files or other appropriate code (Matlab scripts, Python scripts, etc.) directly to the PDF file of the paper, with a button under every reproducible figure for opening the corresponding script. Unfortunately, not every PDF viewer supports this kind of link. The screenshot below shows the Evince viewer on Linux, where clicking the button opens the file in the gedit editor.

Literate programming with IPython notebooks

June 2, 2015 Systems

Literate programming is a concept promoted by Donald Knuth, the famous computer scientist (and the author of The Art of Computer Programming). According to this concept, computer programs should be written in a combination of the programming language (the usual source code) and the natural language, which explains the logic of the program.

When it comes to scientific programming, using comments for natural-language explanations is not always convenient. Moreover, it is limited, because such explanations may require figures, equations, and other common elements of scientific texts. IPython/Jupyter notebooks provide a convenient tool for combining different text elements with code. See the notebook at https://github.com/sfomel/ipython/blob/master/LiterateProgramming.ipynb for an example of how to implement literate programming using an IPython notebook with reproducible SConstruct data-analysis workflows in Madagascar.

Related posts:
* Madagascar in the cloud
* Reproducible research and IPython notebooks

Madagascar Virtual Machine Released

November 26, 2014 Systems

As an alternative to installing Madagascar, you can now run a Crunchbang (Debian) virtual machine (VM) with it pre-installed. Just download, unzip, and run the file with Oracle VirtualBox (free software). Detailed instructions for running the VM for the first time or installing VirtualBox can be found in the readme.
Downloads:
README.txt
MadagascarVM.zip (~3.0 GB)
MadagascarVM.7z (~2.1 GB, but requires 7zip to unpack)

Madagascar in the cloud

July 10, 2014 Systems

SageMathCloud is a free cloud computing platform for computational mathematics created by William Stein, the leader of the Sage project.

SageMathCloud provides a rich environment, which allows one, for example, to easily install Madagascar and to access it interactively through its Python interface. The example above shows Madagascar running interactively in the cloud using an IPython notebook hosted by SageMathCloud. Support for interactive widgets is a new feature in IPython version 2 released earlier this year.

Continuous integration and reproducibility

November 11, 2013 Systems

Continuous Integration (CI) is a technique in software engineering, usually described as one of the common techniques in extreme programming (XP). CI implies maintaining a shared code repository, where developers contribute frequently (possibly several times per day), and an automated build system that includes testing scripts.

Using a CI tool, such as TeamCity, it is easy to implement CI for Madagascar, which includes both compilation tests and reproducibility tests. One of the computers at the University of Texas at Austin has been dedicated to such testing and has helped to detect several bugs and reproducibility problems. You can subscribe to testing reports using the RSS feed. To implement similar testing on your own computer, install TeamCity and configure it as follows:

1. Configure Version Control Settings to connect to the Madagascar Subversion repository.

2. Configure Build Step to use a command-line script such as the following:

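# Build and install Madagascar into a temporary location
./configure --prefix=/tmp/rsfroot
make install
# Set RSFROOT and related environment variables
source env.sh
# Location of the stored figures used for comparison
export RSFFIGS=$RSFROOT/share/madagascar/figs
# Run the tests for the book examples
cd book
scons test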

3. Go out for a cup of coffee and come back to check the results of testing. On a CI server, the run of the “compile & test” script is triggered every time somebody commits a new change to the repository.

4. Fix detected problems and commit your changes back to the repository to continue the integration loop.
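The idea behind a reproducibility test can be illustrated with a minimal Python sketch that compares freshly rebuilt result files against stored reference copies. This is a simplification under stated assumptions, not a description of how scons test compares figures: the function name find_broken_results and the directory layout are hypothetical, and the comparison is byte-for-byte with no tolerance for small numerical differences.

```python
import filecmp
from pathlib import Path


def find_broken_results(reference_dir, rebuilt_dir, pattern="*.vpl"):
    """Return relative paths of reference files that are missing from
    rebuilt_dir or whose contents differ from the rebuilt version."""
    reference_dir = Path(reference_dir)
    rebuilt_dir = Path(rebuilt_dir)
    broken = []
    for ref in sorted(reference_dir.rglob(pattern)):
        rel = ref.relative_to(reference_dir)
        new = rebuilt_dir / rel
        # shallow=False forces a comparison of the file contents
        if not new.is_file() or not filecmp.cmp(ref, new, shallow=False):
            broken.append(rel.as_posix())
    return broken
```

A CI job could call such a function after every commit and fail the build whenever the returned list is not empty, which corresponds to steps 3 and 4 above.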

Adopting the technique of Continuous Integration, in combination with reproducibility testing, provides a robust development environment with well-debugged code and continuously maintained reproducible examples. It should encourage active participation by the Madagascar development community.

Reproducible research and IPython notebooks

August 6, 2013 Systems

Reproducible science was one of the main general themes at the recent SciPy conference in Austin. While different tools for accomplishing reproducible research are being proposed, IPython notebooks are often mentioned as one of the main tools. In his review of the meeting, Eric Jones of Enthought writes:

We can safely say that 2013 is the year of the IPython notebook. It was everywhere. I’d guess 80+% of the talks and tutorials for the conference used it in their presentations.

As illustrated by the screenshots below, it is possible to combine SCons-based processing workflows with IPython notebooks.

The notebook used in this example is in rsf/rsf/test/test.ipynb