next up previous [pdf]

Next: Internal boundaries to multidimensional Up: Multidimensional autoregression Previous: Seismic field data examples

PEF ESTIMATION WITH MISSING DATA

If we are not careful, our calculation of the PEF could have the pitfall that it would try to use the missing data to find the PEF, and hence it would get the wrong PEF. To avoid this pitfall, imagine a PEF finder that uses weighted least squares where the weighting function vanishes on those fitting equations that involve missing data. The weighting would be unity elsewhere. Instead of weighting bad results by zero, we simply will not compute them. The residual there will be initialized to zero and never changed. Likewise for the adjoint, these components of the residual will never contribute to a gradient. So now we need a convolution program that produces no outputs where missing inputs would spoil it.

Recall there are two ways of writing convolution, equation ([*]) when we are interested in finding the filter inputs, and equation ([*]) when we are interested in finding the filter itself. We have already coded equation ([*]), operator helicon [*]. That operator was useful in missing data applications. Now we want to find a prediction-error filter so we need the other case, equation ([*]), and we need to ignore the outputs that will be broken because of missing inputs. The operator module hconest does the job.

user/gee/hconest.c
    for (ia = 0; ia < na; ia++) {
	for (iy = aa->lag[ia]; iy < ny; iy++) {  
	    if(aa->mis[iy]) continue;
  
            ix = iy - aa->lag[ia];

	    if( adj) a[ia] -=  y[iy] * x[ix];
	    else     y[iy] -=  a[ia] * x[ix];
	}
    }

We are seeking a prediction error filter $ (1,a_1,a_2)$ but some of the data is missing. The data is denoted $ \bold y$ or $ y_i$ above and $ x_i$ below. Because some of the $ x_i$ are missing, some of the regression equations in (29) are worthless. When we figure out which are broken, we will put zero weights on those equations.

$\displaystyle \bold 0 \approx \bold r = \bold W \bold X \bold a = \left[ \begi...
...\end{array} \right] \left[ \begin{array}{c} 1 \ a_1 \ a_2 \end{array} \right]$ (29)

Suppose that $ x_2$ and $ x_3$ were missing or known bad. That would spoil the 2nd, 3rd, 4th, and 5th fitting equations in (29). In principle, we want $ w_2$ , $ w_3$ , $ w_4$ and $ w_5$ to be zero. In practice, we simply want those components of $ \bold r$ to be zero.

What algorithm will enable us to identify the regression equations that have become defective, now that $ x_2$ and $ x_3$ are missing? Take filter coefficients $ (a_0, a_1, a_2,\ldots)$ to be all ones. Let $ \bold d_{\rm free}$ be a vector like $ \bold x$ but containing 1's for the missing (or ``freely adjustable'') data values and 0's for the known data values. Recall our very first definition of filtering showed we can put the filter in a vector and the data in a matrix or vice versa. Thus $ \bold X \bold a$ above gives the same result as $ \bold A \bold x$ below.

$\displaystyle \left[ \begin{array}{c} m_1 \ m_2 \ m_3 \ m_4 \ m_5 \ m_6 \\...
...} 0 \ 1 \ 1 \ 0 \ 0 \ 0 \end{array} \right] \eq \bold A \bold d_{\rm free}$ (30)

The numeric value of each $ m_i$ tells us how many of its inputs are missing. Where none are missing, we want unit weights $ w_i=1$ . Where any are missing, we want zero weights $ w_i=0$ . The desired residual under partially missing inputs is computed by module misinput [*].

user/gee/misinput.c
void find_mask(int n            /* data size */, 
	       const int *known /* mask for known data [n] */, 
	       sf_filter aa     /* helical filter */) 
/*< create a filter mask >*/
{
    int i, ih;
    float *rr, *dfre;

    rr = sf_floatalloc(n);
    dfre = sf_floatalloc(n);

    for (i=0; i < n; i++) {
	dfre[i] = known[i]? 0.:1.;
    }
    
    sf_helicon_init(aa);

    for (ih=0; ih < aa->nh; ih++) {
	aa->flt[ih] = 1.;
    }

    sf_helicon_lop(false,false,n,n,dfre,rr);

    for (ih=0; ih < aa->nh; ih++) {
	aa->flt[ih] = 0.;
    }

    for (i=0; i < n; i++) {
	if ( rr[i] > 0.) aa->mis[i] = true;	
    }

    free (rr);
    free (dfre);
}



Subsections
next up previous [pdf]

Next: Internal boundaries to multidimensional Up: Multidimensional autoregression Previous: Seismic field data examples

2013-07-26