
Preconditioning

Jon Claerbout


In Chapter [*], we developed adjoints, and in Chapter [*], we developed inverse operators. Logically, correct solutions come only through inversion. Real life, however, seems nearly the opposite. This situation is puzzling but intriguing. It would seem that an easy path to fame and profit is to go beyond adjoints by introducing some steps of inversion. It is not that easy: images contain a great many unknowns. Mostly, we cannot iterate to completion and must concern ourselves with the rate of convergence. Often, necessity limits us to a handful of iterations where, in principle, millions or billions would be required.

When you fill your car with gasoline, that gasoline derives more from an adjoint than from an inverse. Industrial seismic data processing relates more to adjoints than to inverses, though there is a place for both, of course. It cannot be much different with medical imaging.

First consider cost. For simplicity, consider a data space with $N$ values and a model (or image) space of the same size. The computational cost of applying a dense adjoint operator grows in direct proportion to the number of elements in the matrix, in this case $N^2$. Achieving the minimum discrepancy between modeled data and observed data (inversion) theoretically requires $N$ iterations, raising the cost to $N^3$.

Consider an image of size $m\times m=N$. Continuing, for simplicity, to assume a dense matrix of relations between model and data, the cost for the adjoint is $m^4$, whereas the cost for inversion is $m^6$. We take computational costs for the year 2000 as a baseline, but because costs grow as the sixth power of the mesh size, the overall situation will not change much in the foreseeable future. Suppose you give a stiff workout to a powerful machine: in one hour you invert a 4,096 $\times$ 4,096 matrix. The solution, a vector of 4,096 components, could be laid out as an image of size 64 $\times$ 64 = $2^6\times 2^6$ = 4,096. Here is what we are looking at for costs:

adjoint cost: $(m\times m)^2 = (512\times 512)^2 = (2^9\cdot 2^9)^2 = 2^{36}$
inverse cost: $(m\times m)^3 = (64\times 64)^3 = (2^6\cdot 2^6)^3 = 2^{36}$
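As a quick sanity check on this arithmetic, here is a minimal Python sketch. The $2^{36}$ operation budget and the quadratic versus cubic cost models come from the table above; the helper function and loop are illustrative assumptions, not a benchmark of any real machine.

```python
# Cost model from the table above: for an m-by-m image,
# a dense adjoint costs (m*m)**2 operations and a dense inversion (m*m)**3.
budget = 2**36  # operations one "stiff workout" of the machine buys us

def largest_mesh(power):
    """Largest power-of-two m such that (m*m)**power stays within the budget."""
    m = 1
    while ((2 * m) ** 2) ** power <= budget:
        m *= 2
    return m

print("adjoint:", largest_mesh(2))  # -> 512, i.e. a 512 x 512 image
print("inverse:", largest_mesh(3))  # ->  64, i.e. a  64 x  64 image
```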

These numbers tell us that for applications with dense operators, the biggest images we are likely to see coming from inversion methods are $64\times 64$, whereas those from adjoint methods are $512\times 512$. For comparison, your vision is roughly comparable to your computer screen at 1,000 $\times$ 1,000.

Figure 1.
Jos greets Andrew, ``Welcome back Andrew,'' from the Peace Corps. At a resolution of $512\times 512$, this picture has approximately the same resolution as the paper on which it is printed, or as your viewing screen if you have scaled it to 50% of screen size.

http://sepwww.stanford.edu/sep/jon/family/jos/gifmovie.html holds a movie blinking between Figures 1 and 2.

This cost analysis is oversimplified in that most applications do not require dense operators. With sparse operators, the cost advantage of adjoints is even more pronounced, because the savings from operator sparseness translate directly into a cheaper adjoint. The situation is less favorable, and muddier, for inversion. The reason that Chapter 2 covers iterative methods and neglects exact methods is that, in practice, iterative methods are not run to theoretical completion but until we run out of patience. That leaves hanging the question of what percentage of the theoretically dictated work is actually necessary. If we struggle to accomplish merely one percent of the theoretically required work, can we hope to achieve anything of value?
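To make ``until we run out of patience'' concrete, here is a minimal sketch of conjugate gradients applied to the normal equations of a toy dense fitting problem and stopped after a handful of iterations. The random operator, its size, and the iteration count are assumptions for illustration only; this is not the book's code.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 400
A = rng.standard_normal((N, N))   # stand-in for a dense modeling operator
d = rng.standard_normal(N)        # stand-in for observed data

# Normal equations (A^T A) m = A^T d, solved by plain conjugate gradients.
B, b = A.T @ A, A.T @ d
m = np.zeros(N)
r = b - B @ m                     # residual of the normal equations
p = r.copy()
for it in range(20):              # a handful of iterations, not the theoretical N
    Bp = B @ p
    alpha = (r @ r) / (p @ Bp)
    m = m + alpha * p
    r_new = r - alpha * Bp
    beta = (r_new @ r_new) / (r @ r)
    p = r_new + beta * p
    r = r_new
    print(it, np.linalg.norm(A @ m - d))   # data misfit after each iteration
```

Watching the misfit stall long before it reaches zero is a small-scale version of the convergence problem the rest of this chapter addresses.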

Cost is a big part of the story, but the story has many other parts. Inversion, while being the only logical path to the best answer, is a path littered with pitfalls. The first pitfall is that the data is rarely able to determine a complete solution reliably. Generally, there are aspects of the image that are not learnable from the data.
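A minimal numeric sketch of that last point: whenever the operator has a null space, two models that differ substantially can explain the observed data equally well. The operator, sizes, and random numbers below are purely illustrative assumptions.

```python
import numpy as np

# An underdetermined toy problem: 3 data values, 5 model points.
rng = np.random.default_rng(2)
A = rng.standard_normal((3, 5))
m1 = rng.standard_normal(5)

# Build a second model that differs from m1 only by a null-space vector.
_, _, Vt = np.linalg.svd(A)
null_vec = Vt[-1]                     # a direction the data cannot see
m2 = m1 + 3.0 * null_vec

print(np.allclose(A @ m1, A @ m2))    # True: identical predicted data,
print(np.linalg.norm(m1 - m2))        # yet the two models differ substantially
```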

Figure 2.
Jos greets Andrew, ``Welcome back Andrew,'' again. At a resolution of $64\times 64$, the pixels are clearly visible. From afar the pictures look the same; from up close, examine their glasses.

When I first realized that practical imaging methods in wide industrial use amounted merely to the adjoint of forward modeling, I (and others) thought an easy way to achieve fame and fortune would be to introduce the first steps toward inversion along the lines of Chapter [*]. Although inversion generally requires a prohibitive number of steps, I felt that moving in the gradient direction, the direction of steepest descent, would move us rapidly toward practical improvements. That optimism was soon exposed as misplaced: improvements came too slowly. Then I learned about the conjugate-gradient method, which spectacularly overcomes a well-known speed problem of the method of steepest descent. I came to realize that it, too, was still too slow. I learned this by watching the convergence in Figure 8, which led me to the helix method of Chapter 4. In this chapter, we see how it speeds many applications.

We also come to understand why the gradient is such a poor direction both for steepest descent and conjugate gradients. An indication of our path is found in the contrast between an exact solution and the gradient.

\begin{align}
\bold m &= (\bold A\T\bold A)^{-1}\, \bold A\T\bold d \tag{1} \\
\Delta \bold m &= \bold A\T\bold d \tag{2}
\end{align}

Equations (1) and (2) differ by the factor $ (\bold A\T\bold A)^{-1}$ . This factor is sometimes called a spectrum, and in some situations, it literally is a frequency spectrum. Our updates do not have the spectrum of the thing we are trying to build. No wonder it's slow! Here we find for many applications that ``preconditioning'' with the helix is a better way.
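Here is a minimal numeric sketch of that spectral mismatch. A diagonal operator stands in for $\bold A$, and a diagonal scaling stands in for a genuine preconditioner such as the helix of later chapters; the specific numbers are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 6
m_true = rng.standard_normal(N)
A = np.diag(10.0 ** np.arange(N))   # spectrum far from flat: 1, 10, ..., 1e5
d = A @ m_true

grad = A.T @ d                               # equation (2): the raw gradient
exact = np.linalg.solve(A.T @ A, A.T @ d)    # equation (1): the exact solution

# The gradient is the exact answer distorted by the spectrum of A^T A:
print(grad / exact)           # 1, 1e2, 1e4, ..., 1e10 -- wildly unequal scaling

# Anything approximating (A^T A)^{-1} removes that distortion; here a simple
# diagonal preconditioner stands in for the helix preconditioning to come.
precond_grad = grad / np.diag(A.T @ A)
print(precond_grad / exact)   # all ones: the update now has the right spectrum
```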




