
Parallel preconditioned GMRES(k)

Since, by hypothesis, only $O(1)$ iterations of GMRES($k$) are required for convergence with the sweeping preconditioner, a cursory inspection of Algorithm 0.0.5 reveals that most of the work in a preconditioned iterative method such as GMRES($k$) is performed in the multifrontal solves during the preconditioner application. A modest portion is also spent in sparse matrix-vector multiplication with the discrete Helmholtz operator, $A$, and with the off-diagonal blocks of the discrete artificially damped Helmholtz operator, $J$. It is thus important to parallelize the sparse matrix-vector multiplies, but it is not crucial that the scheme be optimal, and so we simply distribute $A$ and $J$ in the same manner as vectors, i.e., with the $[\mathcal{G},\star]$ distribution derived from the auxiliary problems' frontal distributions.
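The division of labor described above can be illustrated with a serial sketch of right-preconditioned restarted GMRES($k$). The dense toy operator and the Jacobi preconditioner below are stand-ins of our choosing, not the paper's setup: in the actual solver, `apply_A` would be the distributed sparse matrix-vector product and `apply_Minv` the multifrontal sweeping-preconditioner application.

```python
import numpy as np

def gmres_k(apply_A, apply_Minv, b, k=10, max_restarts=50, tol=1e-10):
    """Right-preconditioned restarted GMRES(k).

    Runs Arnoldi on the preconditioned operator A M^{-1}, solves the small
    least-squares problem, and maps the update back through M^{-1}.
    apply_A(v) and apply_Minv(v) are black boxes, which is where the
    multifrontal solves and sparse mat-vecs would live in a parallel code.
    """
    n = b.size
    x = np.zeros(n)
    bnorm = np.linalg.norm(b)
    for _ in range(max_restarts):
        r = b - apply_A(x)
        beta = np.linalg.norm(r)
        if beta <= tol * bnorm:
            break
        Q = np.zeros((n, k + 1))   # Arnoldi basis
        H = np.zeros((k + 1, k))   # upper Hessenberg matrix
        Q[:, 0] = r / beta
        m = k
        for j in range(k):
            w = apply_A(apply_Minv(Q[:, j]))   # preconditioned operator
            for i in range(j + 1):             # modified Gram-Schmidt
                H[i, j] = Q[:, i] @ w
                w -= H[i, j] * Q[:, i]
            H[j + 1, j] = np.linalg.norm(w)
            if H[j + 1, j] < 1e-14:            # happy breakdown
                m = j + 1
                break
            Q[:, j + 1] = w / H[j + 1, j]
        # Minimize ||beta e1 - H y|| over the current Krylov subspace
        e1 = np.zeros(m + 1)
        e1[0] = beta
        y, *_ = np.linalg.lstsq(H[:m + 1, :m], e1, rcond=None)
        x = x + apply_Minv(Q[:, :m] @ y)
    return x

# Toy diagonally dominant system with a Jacobi preconditioner
# (a hypothetical stand-in for the discretized Helmholtz operator).
rng = np.random.default_rng(0)
n = 200
A = (np.diag(4.0 * np.ones(n))
     + np.diag(-np.ones(n - 1), 1)
     + np.diag(-np.ones(n - 1), -1))
b = rng.standard_normal(n)
d = np.diag(A)
x = gmres_k(lambda v: A @ v, lambda v: v / d, b, k=10)
```

Note that each outer iteration costs one preconditioner application and one mat-vec per Arnoldi step, which is why the multifrontal solves dominate when only $O(1)$ iterations are needed.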

For performance reasons, it is beneficial to solve as many right-hand sides simultaneously as possible: both the communication latency and the cost of loading the local frontal and sparse-matrix data from main memory can then be amortized over all of the right-hand sides. Another idea is to extend the so-called trsm algorithm for triangular solves with many right-hand sides (i.e., more right-hand sides than processes), which is well known in the field of dense linear algebra (30), into the realm of sparse-direct solvers via the dense frontal triangular solves. This approach was not pursued here due to the modest storage available on Lonestar and is left for future work. Another performance improvement might come from block variants of GMRES (36), which can potentially lower the number of required iterations.
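The amortization argument can be shown in a serial toy: factor once, then apply the triangular solves to a block of right-hand sides in one pass rather than one column at a time. The dense LU below is only a stand-in for the multifrontal factorization, and the matrix and sizes are invented for illustration; on a distributed machine the savings come from reusing the loaded frontal data and batching messages across all right-hand sides.

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

rng = np.random.default_rng(1)
n, nrhs = 300, 64

# Hypothetical well-conditioned test matrix; in the paper's setting this
# role is played by the (distributed, multifrontally factored) operator.
A = np.diag(4.0 * np.ones(n)) + rng.standard_normal((n, n)) / n
B = rng.standard_normal((n, nrhs))

lu, piv = lu_factor(A)  # factor once, reuse for every right-hand side

# Blocked solve: the triangular factors are swept a single time for all
# 64 right-hand sides (trsm-style), amortizing the memory traffic.
X_block = lu_solve((lu, piv), B)

# Column-by-column solve: same arithmetic, but the factors are reloaded
# from memory nrhs times.
X_loop = np.column_stack([lu_solve((lu, piv), B[:, j]) for j in range(nrhs)])
```

Both variants perform identical floating-point work; the blocked form wins because each entry of the factors is read from memory once per sweep instead of once per right-hand side.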



2014-08-20