
Null space versus starting solution

The simplest null-space problem has one data point $ d$ emerging from two model points.

$\displaystyle d \quad \approx \quad \left[ \begin{array}{cc} a & b \end{array} \right] \left[ \begin{array}{c} x \\ y \end{array} \right]$ (47)

The null space is any model that produces no data. You can add an arbitrary amount $\beta$ of the null space to one solution and get another solution that fits the data just as well. Here is the full solution:

$\displaystyle \left[ \begin{array}{c} x \\ y \end{array} \right] \quad \approx \quad \frac{d}{a^2+b^2} \left[ \begin{array}{c} a \\ b \end{array} \right] \;+\; \beta \left[ \begin{array}{r} -b \\ a \end{array} \right]$ (48)
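As a quick check, here is a minimal numerical sketch (with made-up values for $a$, $b$, and $d$) confirming that every choice of $\beta$ in (48) fits the data equally well:

    # Toy problem of equation (47): one data point, two model points.
    a, b, d = 3.0, 4.0, 10.0

    # Equation (48): particular solution plus beta times the null-space vector.
    for beta in (0.0, 1.0, -7.5):
        x = d / (a**2 + b**2) * a - beta * b
        y = d / (a**2 + b**2) * b + beta * a
        print(a * x + b * y)  # 10.0 every time: the data cannot see beta

The data is blind to $\beta$; only extra information about the model can choose it.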

Iterative methods can neither subtract nor add any null space to your initial solution. It is obvious in this simple case, because the gradient (here, the adjoint applied to the residual) dotted into the null-space vector vanishes. Now suppose $a$ and $b$ are matrices, while $d$ , $x$ , and $y$ are vectors. Although more complicated, something similar happens. You can test whether an application involves a null space by comparing the results of various starting solutions.
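The vanishing dot product is easy to verify numerically. The following minimal sketch (same made-up values) shows that the gradient is orthogonal to the null-space vector, so plain gradient descent fits the data while leaving the null-space component of the starting solution untouched:

    import numpy as np

    a, b, d = 3.0, 4.0, 10.0
    row = np.array([a, b])            # the 1x2 operator of equation (47)
    null_vec = np.array([-b, a])      # the null-space vector of equation (48)

    m = np.array([5.0, -2.0])         # an arbitrary starting solution
    null_part0 = null_vec @ m         # its null-space component

    grad = row * (row @ m - d)        # gradient: adjoint applied to the residual
    print(grad @ null_vec)            # 0.0: the gradient has no null-space part

    for _ in range(1000):             # plain gradient descent
        m = m - 0.01 * row * (row @ m - d)

    print(row @ m)                    # ~10.0: the data is now fit
    print(null_vec @ m - null_part0)  # ~0.0: null-space component never moved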

Other traps arise in the world of images. Rarely are we able to iterate to full completion, so we might say, ``practically speaking, this application has null spaces.'' For example, if zero frequency is theoretically a null space, the nearby low frequencies converge so slowly that we would say, ``The null space contains low frequencies.'' We cannot avoid such issues.

The textbook way of dealing with null spaces is to require the researcher to set up model styling goals (regularizations). Finding such goals demands assumptions from the researcher, assumptions that are often hard to specify. Luckily, there is another path to consider. Thinking more like a physicist, we could choose the initial solution more carefully.

In regression (47) extended to images, we might hope to escape the null-space problem by beginning the iteration from $(\bold x,\bold y) =(\bold 0,\bold 0)$ , but this hope is false. It is a pitfall that, in an application context, took me some years to recognize. Notice what happens on the first step away from $(\bold x,\bold y) =(\bold 0,\bold 0)$ : your solution becomes a constant $\beta$ times the gradient. For the image extension of (47), that first step is:

$\displaystyle \left[ \begin{array}{c} \bold x \\ \bold y \end{array} \right] \quad \approx \quad \beta \left[ \begin{array}{c} \bold A\T \bold d \\ \bold B\T \bold d \end{array} \right]$ (49)

If the operators $\bold A$ and $\bold B$ resemble filters, then $\bold x$ and $\bold y$ are clearly correlated, because both are filtered versions of the same data $\bold d$ . Physically, that correlation could be nonsense. We might be trying to discover if and how $\bold x$ and $\bold y$ are correlated. Or we might wish to demand that they be uncorrelated.
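Here is a minimal sketch of that effect, with two hypothetical smoothing filters standing in for $\bold A$ and $\bold B$ : because $\bold A\T \bold d$ and $\bold B\T \bold d$ are both filtered versions of the same data, the first-step $\bold x$ and $\bold y$ come out strongly correlated.

    import numpy as np

    rng = np.random.default_rng(0)
    d = rng.standard_normal(1000)     # made-up data

    # Hypothetical smoothers standing in for A and B.
    ha = np.ones(5) / 5.0                             # 5-point box filter
    hb = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0   # binomial smoother

    # First step from (x, y) = (0, 0): x and y are the adjoints applied to d.
    # (These filters are symmetric, so the adjoint is the same convolution.)
    x = np.convolve(d, ha, mode="same")
    y = np.convolve(d, hb, mode="same")

    print(np.corrcoef(x, y)[0, 1])    # close to 1: strongly correlated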

I have no general method for you, but I offer a suggestion that works for one family of applications and may be suggestive for others. Traditionally, $\bold y$ might be ignored, effectively taking $\bold y=\bold 0$ , as happens when the data is better explained by $\bold A$ alone than by $\bold B$ alone. First solve for $\bold x$ without $\bold y$ ; call the result $\bold x_0$ . Now define a new variable $\tilde {\bold x}$ such that $\bold x = \bold x_0 +\tilde {\bold x}$ . Introducing your innovative concept (estimating $\bold y$ ), your regression becomes:

$\displaystyle \bold 0 \quad \approx \quad \bold r \quad = \quad \bold A ( \bold x_0 + \tilde{\bold x}) + \bold B \bold y - \bold d$ (50)

$\displaystyle \bold 0 \quad \approx \quad \bold r \quad = \quad \bold A \tilde{\bold x} + \bold B \bold y - (\bold d - \bold A \bold x_0)$ (51)
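Here is a minimal sketch of the two-stage recipe, with random matrices standing in for $\bold A$ and $\bold B$ and with np.linalg.lstsq standing in for the iterative solver (in giant problems each stage would instead be a truncated iteration):

    import numpy as np

    rng = np.random.default_rng(1)
    n, p, q = 200, 30, 30
    A = rng.standard_normal((n, p))   # hypothetical operator A
    B = rng.standard_normal((n, q))   # hypothetical operator B
    d = rng.standard_normal(n)        # made-up data

    # Stage 1: explain the data with A alone (y = 0); call the result x0.
    x0 = np.linalg.lstsq(A, d, rcond=None)[0]

    # Stage 2: regression (51) for (x_tilde, y), starting from zero.
    r0 = d - A @ x0                   # the shifted data of equation (51)
    AB = np.hstack([A, B])
    xt_y = np.linalg.lstsq(AB, r0, rcond=None)[0]
    x = x0 + xt_y[:p]                 # reassemble x = x0 + x_tilde
    y = xt_y[p:]

    print(np.linalg.norm(A @ x + B @ y - d))  # joint residual after both stages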

Start off from $(\tilde{\bold x},\bold y)=(\bold 0,\bold 0)$ . As in Equation (49), the first step leads to:

$\displaystyle \left[ \begin{array}{c} \tilde{\bold x} \\ \bold y \end{array} \right] \quad \approx \quad \beta \left[ \begin{array}{c} \bold A\T \bold r \\ \bold B\T \bold r \end{array} \right]$ (52)

which is very different from Equation (49), because $ \bold r $ is very different from $ \bold d$ . Although we may still have an annoying or inappropriate correlation between $ \tilde {\bold x}$ and $ \bold y$ , it is a lot less annoying than a correlation between $ \bold x$ and $ \bold y$ .

