Selective inversion

Next: Global vector distributions Up: Parallel sweeping preconditioner Previous: Parallel multifrontal algorithms

Selective inversion

The lackluster scalability of dense triangular solves is well known and a scheme known as selective inversion was introduced in (32) and implemented in (31) specifically to avoid the issue; the approach is characterized by directly inverting every distributed dense triangular matrix which would have been solved against in a normal multifrontal triangular solve. Thus, after performing selective inversion, every parallel dense triangular solve can be translated into a parallel dense triangular matrix-vector multiply.

Suppose that we have paused a multifrontal factorization just before processing a particular front, , which corresponds to some supernode, $\mathcal{S}$ . Then all of the fronts for the descendants of $\mathcal{S}$ have already been handled, and can be partitioned as

$\displaystyle F = \left(\begin{array}{cc} F_{TL} & \star \\ F_{BL} & F_{BR}\end{array}\right),$

(6)

where $F_{TL}$ holds the Schur complement of supernode $\mathcal{S}$ with respect to all of its descendants, $F_{BL}$ represents the coupling of $\mathcal{S}$ and its descendants to $\mathcal{S}$ 's ancestors, and $F_{BR}$ holds the Schur complement updates from the descendants of $\mathcal{S}$ for the ancestors of $\mathcal{S}$ . Using hats to denote input states, e.g., $\hat F_{TL}$ to denote the input state of $F_{TL}$ , the first step in processing the frontal matrix

is to overwrite $F_{TL}$ with its

factorization, which is to say that $\hat F_{TL}$ is overwritten with the strictly lower portion of a unit lower triangular matrix

and a diagonal matrix

such that $\hat F_{TL} = L_F D_F L_F^T$ .

The partial factorization of can then be completed via the following steps:

Solve against to form $F_{BL} := F_{BL} L_F^{-T}$ .
Form the temporary copy $Z_{BL} := F_{BL}$ .
Finalize the coupling matrix as $F_{BL} := F_{BL} D_F^{-1}$ .
Finalize the update matrix as $F_{BR} := F_{BR} - \hat F_{BL} \hat F_{TL}^{-1} \hat F_{BL}^T = F_{BR} - Z_{BL} F_{BL}^T$ .

After adding $F_{BR}$ onto the parent frontal matrix, only $F_{TL}$ and $F_{BL}$ are needed in order to perform a multifrontal solve. For instance, applying $L^{-1}$ to some vector

proceeds up the elimination tree (starting from the leaves) in a manner similar to the factorization; after handling all of the work for the descendants of some supernode $\mathcal{S}$ , only a few dense linear algebra operations with $\mathcal{S}$ 's corresponding frontal matrix, say

, are required. Denoting the portion of

corresponding to the degrees of freedom in supernode $\mathcal{S}$ by $x_{\mathcal{S}}$ , we must perform:

$x_{\mathcal{S}} := L_F^{-1} x_{\mathcal{S}}$
$x_U \equiv -F_{BL} x_{\mathcal{S}}$
Add onto the entries of corresponding to the parent supernode.

The key insight of selective inversion is that, if we invert each distributed dense unit lower triangular matrix in place, all of the parallel dense triangular solves in a multifrontal triangular solve are replaced by parallel dense matrix-vector multiplies. It is also observed in (32) that the work required for the selective inversion is typically only a modest percentage of the work required for the multifrontal factorization, and that the overhead of the selective inversion is easily recouped if there are several right-hand sides to solve against.

Since each application of the sweeping preconditioner requires two multifrontal solves for each of the $m=O(n/\gamma)$ subdomains, which are relatively small and likely distributed over a large number of processes, selective inversion will be shown to yield a very large performance improvement. We also note that, while it is widely believed that direct inversion is numerically unstable, in (11) Druinsky and Toledo provide a review of (apparently obscure) results dating back to Wilkinson (in (42)) which show that $x := \mathrm{inv}(A)\!*\!b$ is as accurate as a backwards stable solve if reasonable assumptions are met on the accuracy of $\mathrm{inv}(A)$ . Since $\mathrm{inv}(A)\!*\!b$ is argued to be more accurate when the columns of $\mathrm{inv}(A)$ have been computed with a backwards-stable solver, and both $\mathrm{inv}(F_{TL})$ and $\mathrm{inv}(F_{TL}^T)$ must be applied after selective inversion, it might be worthwhile to modify selective inversion to compute and store two different inverses of each $F_{TL}$ : one by columns and one by rows.

Next: Global vector distributions Up: Parallel sweeping preconditioner Previous: Parallel multifrontal algorithms

2014-08-20