Seismic wave extrapolation using lowrank symbol approximation

Next: Bibliography Up: Fomel, Ying, & Song: Previous: Acknowledgments

# Linear-time algorithm for a lowrank matrix approximation

In this appendix, we outline the lowrank matrix approximation algorithm in more detail.

Let $N$ be the number of samples both in space and wavenumber. Let us denote the samples in the spatial domain by $x = \{x_1, \ldots, x_N\}$ and the ones in the Fourier domain by $k = \{k_1, \ldots, k_N\}$. The elements of the $N \times N$ interaction matrix $\mathbf{W}$ from equation (11) are then defined as

$$W_{ij} = e^{\,i\left[\phi(x_i, k_j, \Delta t) - x_i k_j\right]}, \qquad 1 \le i, j \le N. \tag{17}$$
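As a concrete illustration (hypothetical, not from the paper), consider a one-dimensional medium with the simple isotropic phase function $\phi(x, k, \Delta t) = x k + |k|\,v(x)\,\Delta t$; the matrix of equation (17) can then be assembled directly:

```python
import numpy as np

# Hypothetical 1-D setup: N samples in space and wavenumber, and the
# simple isotropic phase function phi(x, k, dt) = x*k + |k|*v(x)*dt.
N, dt = 64, 0.001
x = np.linspace(0.0, 1.27, N)              # spatial samples
k = 2 * np.pi * np.fft.fftfreq(N, d=x[1] - x[0])  # wavenumber samples
v = 1.5 + 0.5 * x                          # illustrative velocity model v(x)

phi = np.outer(x, k) + dt * np.outer(v, np.abs(k))
W = np.exp(1j * (phi - np.outer(x, k)))    # W_ij of equation (17)
```

Every entry of $\mathbf{W}$ has unit modulus, so the matrix is dense; the point of the algorithm below is that it is nevertheless numerically low-rank.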

Here we describe an algorithm by Engquist and Ying (2009) that generates, in a time linear with respect to $N$, an approximate factorization of $\mathbf{W}$ of rank $r$ in the following form:

$$\mathbf{W} \approx \mathbf{U} \mathbf{M} \mathbf{V}^*, \tag{18}$$

where $\mathbf{U}$ consists of $r$ selected columns from $\mathbf{W}$, $\mathbf{M}$ is a matrix of size $r \times r$, and $\mathbf{V}^*$ consists of $r$ selected rows from $\mathbf{W}$.

The first question is: which columns of $\mathbf{W}$ shall one pick for the matrix $\mathbf{U}$? It has been shown by Goreinov et al. (1997) and Gu and Eisenstat (1996) that the $r$-dimensional volume spanned by these columns should be the maximum, or close to the maximum, among all possible choices of $r$ columns from $\mathbf{W}$. More precisely, suppose $\mathbf{W} = [\mathbf{w}_1, \mathbf{w}_2, \ldots, \mathbf{w}_N]$ is a column partitioning of $\mathbf{W}$. Then one aims to find $\{j_1, \ldots, j_r\}$ such that

$$\{j_1, \ldots, j_r\} = \operatorname{argmax}_{\{j_1, \ldots, j_r\}} \mathrm{vol}_r\!\left(\mathbf{w}_{j_1}, \ldots, \mathbf{w}_{j_r}\right). \tag{19}$$

However, finding a set of $r$ columns with almost the maximum $r$-dimensional volume is a computationally difficult problem, for two reasons. First, the length $N$ of the vectors $\mathbf{w}_j$ is typically very large for three-dimensional problems, hence manipulating these vectors can be costly. Second, the number of the vectors is also large. An exhaustive search over all possible choices of $r$ vectors to find the one with the maximum volume is prohibitively expensive, so one needs a more practical approach.

In order to overcome the problem associated with long vectors, the first idea is to project the vectors to a lower-dimensional space and search for the set of $r$ vectors with maximum volume among the projected vectors. However, one needs to ensure that the volume is roughly preserved after the projection, so that the set of vectors with the maximum projected volume also has a near-maximum volume in the original space. One of the most celebrated theorems in high-dimensional geometry and probability is the following Johnson-Lindenstrauss lemma (Johnson and Lindenstrauss, 1984).

Theorem 1   Let $\{\mathbf{v}_1, \ldots, \mathbf{v}_N\}$ be a set of $N$ vectors in $\mathbb{R}^d$. Let $T$ be a randomly generated subspace of dimension $t = O(\log N / \varepsilon^2)$ and use $P_T$ to denote the orthogonal projection onto $T$. Then with high probability,

$$(1-\varepsilon)\,\|\mathbf{v}_i - \mathbf{v}_j\| \;\le\; \sqrt{\frac{d}{t}}\,\|P_T \mathbf{v}_i - P_T \mathbf{v}_j\| \;\le\; (1+\varepsilon)\,\|\mathbf{v}_i - \mathbf{v}_j\|$$

for $1 \le i, j \le N$.

This theorem essentially says that projecting the $N$ vectors to a subspace of dimension $O(\log N)$ preserves the pairwise distance between arbitrary pairs of vectors. There is an immediate generalization of this theorem due to Magen (2002), formulated slightly differently for our purpose.
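As a quick numerical illustration of this statement (the sizes below are chosen arbitrarily and are not from the paper), one can project random vectors onto a random subspace and check that the scaled pairwise distances are nearly preserved:

```python
import numpy as np

rng = np.random.default_rng(1)
d, t, N = 4096, 64, 10                     # illustrative dimensions
V = rng.standard_normal((d, N))            # N vectors in R^d

# random t-dimensional subspace: orthonormal basis from a Gaussian matrix
Q, _ = np.linalg.qr(rng.standard_normal((d, t)))
P = Q.T @ V                                # coordinates of the projections

scale = np.sqrt(d / t)                     # the sqrt(d/t) factor of Theorem 1
ratios = [scale * np.linalg.norm(P[:, i] - P[:, j])
          / np.linalg.norm(V[:, i] - V[:, j])
          for i in range(N) for j in range(i + 1, N)]
# every scaled projected distance is close to the original distance
```

Here all entries of `ratios` concentrate around 1, with fluctuations on the order of $1/\sqrt{t}$.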

Theorem 2   Let $\{\mathbf{v}_1, \ldots, \mathbf{v}_N\}$ be a set of $N$ vectors in $\mathbb{R}^d$. Let $T$ be a randomly generated subspace of dimension $t = O(r^3 \log N / \varepsilon^2)$ and use $P_T$ to denote the orthogonal projection onto $T$. Then with high probability,

$$(1-\varepsilon)\cdot\mathrm{vol}_r(\mathbf{v}_{j_1}, \ldots, \mathbf{v}_{j_r}) \;\le\; \left(\frac{d}{t}\right)^{r/2} \mathrm{vol}_r(P_T \mathbf{v}_{j_1}, \ldots, P_T \mathbf{v}_{j_r}) \;\le\; (1+\varepsilon)\cdot\mathrm{vol}_r(\mathbf{v}_{j_1}, \ldots, \mathbf{v}_{j_r})$$

for any $\{j_1, \ldots, j_r\} \subset \{1, \ldots, N\}$.

The main step of the proof is to bound the singular values of a random projection matrix between $1-\varepsilon$ and $1+\varepsilon$ (after a uniform scaling), which ensures that the $r$-dimensional volume is preserved within a factor between $(1-\varepsilon)^r$ and $(1+\varepsilon)^r$. In order to obtain this bound on the singular values, we need $t$ to be $O(r^3 \log N / \varepsilon^2)$. However, bounding the singular values is only one way to bound the volume, hence it is possible to improve the dependence of $t$ on $r$. In fact, in practice, we observe that $t$ only needs to scale like $O(r \log N)$.

Given a generic subspace $T$ of dimension $t$, computing the projections $P_T \mathbf{w}_1, \ldots, P_T \mathbf{w}_N$ takes $O(tN^2)$ steps. Recall that our goal is to find an algorithm with linear complexity, hence this is still too costly. In order to reduce the cost of the random projection, the second idea of our approach is to randomly choose $t$ coordinates and then project (or restrict) each vector only to these coordinates. This is a projection with much less randomness, but one that is much more efficient to apply. Computationally, this is equivalent to restricting $\mathbf{W}$ to $t$ randomly selected rows. We do not yet have a theorem regarding the volume for this projection. However, it preserves the $r$-dimensional volume very well for the matrix $\mathbf{W}$, and this is in fact due to the oscillatory nature of the columns of $\mathbf{W}$. We denote the resulting vectors by $\{\widetilde{\mathbf{w}}_1, \ldots, \widetilde{\mathbf{w}}_N\}$.

The next task is to find a set of $r$ columns of $\{\widetilde{\mathbf{w}}_1, \ldots, \widetilde{\mathbf{w}}_N\}$ whose volume is nearly maximum. As we mentioned earlier, an exhaustive search is too costly. To overcome this, the third idea is to use a pivoted QR algorithm (or pivoted Gram-Schmidt process) to find the $r$ columns: at each step, the column with the largest remaining norm is selected, and all other columns are orthogonalized against it.
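This greedy selection can be sketched in a few lines of NumPy (an illustrative sketch, not the authors' code; `Wt` stands for the matrix of restricted columns $[\widetilde{\mathbf{w}}_1, \ldots, \widetilde{\mathbf{w}}_N]$):

```python
import numpy as np

def select_columns(Wt, r):
    """Pivoted Gram-Schmidt: greedily pick r columns of Wt whose
    spanned r-dimensional volume is near-maximal."""
    R = np.array(Wt, dtype=complex)        # residual columns
    chosen = []
    for _ in range(r):
        # pivot: the column with the largest residual norm
        j = int(np.argmax(np.linalg.norm(R, axis=0)))
        chosen.append(j)
        q = R[:, j] / np.linalg.norm(R[:, j])
        # orthogonalize every column against the chosen direction
        R -= np.outer(q, q.conj() @ R)
    return chosen
```

The pivot rule (largest residual norm first) is the standard column-pivoting strategy; each of the $r$ passes over the $t \times N$ matrix costs $O(tN)$ operations.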

Once the column set $\{j_1, \ldots, j_r\}$ is found, we set $\mathbf{U} = [\mathbf{w}_{j_1}, \ldots, \mathbf{w}_{j_r}]$.

In order to identify $\mathbf{V}^*$, one needs to find a set of $r$ rows of $\mathbf{W}$ that has an almost maximum volume. To do that, we repeat the same steps, now applied to $\mathbf{W}^*$. More precisely, let

$$\mathbf{W} = \begin{bmatrix} \mathbf{m}_1 \\ \vdots \\ \mathbf{m}_N \end{bmatrix} \tag{20}$$

be the row partitioning of the matrix $\mathbf{W}$. The algorithm restricts the rows $\mathbf{m}_i$ to $t$ randomly selected columns and applies the same pivoted QR procedure to the restricted rows; the resulting index set $\{i_1, \ldots, i_r\}$ defines

$$\mathbf{V}^* = \begin{bmatrix} \mathbf{m}_{i_1} \\ \vdots \\ \mathbf{m}_{i_r} \end{bmatrix}.$$

Once both $\mathbf{U}$ and $\mathbf{V}^*$ are identified, the last task is to compute the $r \times r$ matrix $\mathbf{M}$ for $\mathbf{W} \approx \mathbf{U} \mathbf{M} \mathbf{V}^*$. Minimizing

$$\min_{\mathbf{M}} \left\| \mathbf{W} - \mathbf{U} \mathbf{M} \mathbf{V}^* \right\|_F \tag{21}$$

yields $\mathbf{M} = (\mathbf{U})^{+} \mathbf{W} (\mathbf{V}^*)^{+}$, where $(\cdot)^{+}$ stands for the pseudo-inverse. However, this formula requires taking a matrix product with $\mathbf{W}$, which takes $O(N^2)$ steps. In order to achieve linear scaling, the fourth idea of our approach is to select randomly a set $S$ of $t$ rows and a set $T$ of $t$ columns of $\mathbf{W}$ and minimize

$$\min_{\mathbf{M}} \left\| \mathbf{W}(S, T) - \mathbf{U}(S, :)\, \mathbf{M}\, \mathbf{V}^*(:, T) \right\|_F. \tag{22}$$

The solution for this problem is

$$\mathbf{M} = \left(\mathbf{U}(S, :)\right)^{+} \mathbf{W}(S, T) \left(\mathbf{V}^*(:, T)\right)^{+}. \tag{23}$$
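The following sketch checks this last step numerically on a synthetic, exactly rank-$r$ stand-in for $\mathbf{W}$ (random index sets replace the pivoted-QR selections purely for brevity):

```python
import numpy as np

rng = np.random.default_rng(0)
N, r, t = 200, 4, 24

# synthetic exactly rank-r matrix standing in for W
W = rng.standard_normal((N, r)) @ rng.standard_normal((r, N))

# stand-ins for the selected columns {j_1..j_r} and rows {i_1..i_r}
U = W[:, rng.choice(N, size=r, replace=False)]      # selected columns
V = W[rng.choice(N, size=r, replace=False), :]      # selected rows (V* in the text)

S = rng.choice(N, size=t, replace=False)            # t random rows
T = rng.choice(N, size=t, replace=False)            # t random columns

# equation (23): M = U(S,:)^+  W(S,T)  V*(:,T)^+
M = np.linalg.pinv(U[S, :]) @ W[np.ix_(S, T)] @ np.linalg.pinv(V[:, T])

rel_err = np.linalg.norm(W - U @ M @ V) / np.linalg.norm(W)
```

With $t > r$ samples, the sampled least-squares fit reproduces the exact middle matrix whenever the selected columns and rows span the column and row spaces of $\mathbf{W}$, so `rel_err` is at roundoff level here.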

Let us now discuss the overall cost of this algorithm. Random sampling of $t$ rows and $t$ columns of the matrix $\mathbf{W}$ clearly takes $O(tN)$ steps. The pivoted QR factorization on the projected columns takes $O(t^2 N)$ steps, and the cost of the pivoted QR factorization on the projected rows is the same. Finally, performing the pseudo-inverses takes $O(t^3)$ steps. Therefore, the overall cost of the algorithm is $O(tN) + O(t^2 N) + O(t^3) = O(t^2 N)$. As we mentioned earlier, in practice $t = O(r \log N)$. Hence, the overall cost is linear in $N$.


2013-08-31