Conjugate gradient (CG) implementation

Gradient-type methods can be summarized by the update

$\displaystyle \textbf{m}_{k+1}=\textbf{m}_k+\alpha_k \textbf{d}_k.$ (80)

The conjugate gradient (CG) algorithm decreases the misfit function along the conjugate gradient direction:

$\displaystyle \textbf{d}_k= \begin{cases}-\nabla E(\textbf{m}_0), & k=0\\ -\nabla E(\textbf{m}_k)+\beta_k \textbf{d}_{k-1}, & k\geq 1 \end{cases}$ (81)

There are many ways to compute $ \beta_k$ ; four classical choices are Hestenes-Stiefel (HS), Fletcher-Reeves (FR), Polak-Ribière-Polyak (PRP), and Dai-Yuan (DY):

\begin{equation*}\left\{
\begin{split}
\beta_k^{HS}&=\frac{\langle\nabla E(\textbf{m}_k),\nabla E(\textbf{m}_k)-\nabla E(\textbf{m}_{k-1})\rangle}{\langle\textbf{d}_{k-1},\nabla E(\textbf{m}_k)-\nabla E(\textbf{m}_{k-1})\rangle}\\
\beta_k^{FR}&=\frac{\langle\nabla E(\textbf{m}_k),\nabla E(\textbf{m}_k)\rangle}{\langle\nabla E(\textbf{m}_{k-1}),\nabla E(\textbf{m}_{k-1})\rangle}\\
\beta_k^{PRP}&=\frac{\langle\nabla E(\textbf{m}_k),\nabla E(\textbf{m}_k)-\nabla E(\textbf{m}_{k-1})\rangle}{\langle\nabla E(\textbf{m}_{k-1}),\nabla E(\textbf{m}_{k-1})\rangle}\\
\beta_k^{DY}&=\frac{\langle\nabla E(\textbf{m}_k),\nabla E(\textbf{m}_k)\rangle}{\langle\textbf{d}_{k-1},\nabla E(\textbf{m}_k)-\nabla E(\textbf{m}_{k-1})\rangle}
\end{split}\right.\end{equation*} (82)

To achieve the best convergence rate in practice, we suggest using a hybrid scheme combining Hestenes-Stiefel and Dai-Yuan:

$\displaystyle \beta_k=\max(0, \min(\beta_k^{HS},\beta_k^{DY})).$ (83)
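As an illustration, the direction update of Eqs. (81)-(83) reduces to a few lines of NumPy. This is a minimal sketch assuming the model and gradients are flattened 1-D arrays; the names grad, grad_prev, and d_prev (for $ \nabla E(\textbf{m}_k)$ , $ \nabla E(\textbf{m}_{k-1})$ , and $ \textbf{d}_{k-1}$ ) are hypothetical placeholders for quantities the surrounding FWI code supplies.

    import numpy as np

    def cg_direction(grad, grad_prev=None, d_prev=None):
        # Eq. (81): steepest descent on the first iteration.
        if grad_prev is None or d_prev is None:
            return -grad
        y = grad - grad_prev                    # gradient difference
        denom = np.dot(d_prev, y)               # shared HS/DY denominator
        if abs(denom) < 1e-30:                  # guard: restart with steepest descent
            return -grad
        beta_hs = np.dot(grad, y) / denom       # Hestenes-Stiefel
        beta_dy = np.dot(grad, grad) / denom    # Dai-Yuan
        beta = max(0.0, min(beta_hs, beta_dy))  # Eq. (83): hybrid scheme
        return -grad + beta * d_prev            # Eq. (81)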

Iterating with Eq. (80) requires an appropriate step length $ \alpha_k$ . Here we provide two approaches to calculate $ \alpha_k$ .

$ \bullet$ Approach 1: Expanding the objective function to second order about $ \textbf{m}_k$ gives

$\displaystyle E(\textbf{m}_{k+1})=E(\textbf{m}_k+\alpha_k \textbf{d}_{k})=E(\textbf{m}_k)+\alpha_k\langle\nabla E(\textbf{m}_k),\textbf{d}_k\rangle+\frac{1}{2}\alpha_k^2\textbf{d}_k^{\dagger}\textbf{H}_k\textbf{d}_k.$ (84)

Setting $ \frac{\partial E(\textbf{m}_{k+1})}{\partial \alpha_k}=0$ gives

$\displaystyle \alpha_k=-\frac{\langle\textbf{d}_k,\nabla E(\textbf{m}_k)\rangle}{\textbf{d}_k^{\dagger}\textbf{H}_k\textbf{d}_k}\approx-\frac{\langle\textbf{d}_k,\nabla E(\textbf{m}_k)\rangle}{\langle\textbf{J}_k\textbf{d}_k,\textbf{J}_k\textbf{d}_k\rangle},$ (85)

where the second expression uses the Gauss-Newton approximation $ \textbf{H}_k\approx\textbf{J}_k^{\dagger}\textbf{J}_k$ .
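Explicitly, differentiating the quadratic model of Eq. (84) with respect to $ \alpha_k$ gives the stationarity condition

$\displaystyle \frac{\partial E(\textbf{m}_{k+1})}{\partial \alpha_k}=\langle\textbf{d}_k,\nabla E(\textbf{m}_k)\rangle+\alpha_k\,\textbf{d}_k^{\dagger}\textbf{H}_k\textbf{d}_k=0,$

whose solution is the first expression in Eq. (85).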

$ \bullet$ Approach 2: Recall that

$\displaystyle \textbf{f}(\textbf{m}_k+\alpha_k \textbf{d}_{k})=\textbf{f}(\textbf{m}_k)+\alpha_k \textbf{J}_k\textbf{d}_{k}+O(\vert\vert\textbf{d}_k\vert\vert^2).$ (86)

Using this first-order approximation, we have

\begin{displaymath}\begin{split}
E(\textbf{m}_{k+1})&=\frac{1}{2}\vert\vert\textbf{p}_{obs}-\textbf{f}(\textbf{m}_k+\alpha_k\textbf{d}_k)\vert\vert^2\\
&\approx\frac{1}{2}\vert\vert\textbf{p}_{obs}-\textbf{f}(\textbf{m}_k)-\alpha_k\textbf{J}_k\textbf{d}_k\vert\vert^2\\
&=E(\textbf{m}_k)-\alpha_k\langle\textbf{J}_k\textbf{d}_k,\textbf{p}_{obs}-\textbf{f}(\textbf{m}_k)\rangle+\frac{1}{2}\alpha_k^2\langle\textbf{J}_k\textbf{d}_k,\textbf{J}_k\textbf{d}_k\rangle.
\end{split}\end{displaymath} (87)

Setting $ \frac{\partial E(\textbf{m}_{k+1})}{\partial \alpha_k}=0$ gives

$\displaystyle \alpha_k=\frac{\langle\textbf{J}_k\textbf{d}_k,\textbf{p}_{obs}-\textbf{f}(\textbf{m}_k)\rangle}{\langle\textbf{J}_k\textbf{d}_k,\textbf{J}_k\textbf{d}_k\rangle}.$ (88)

In fact, Eq. (88) can also be obtained from Eq. (85) by invoking Eq. (72): $ \nabla E_{\textbf{m}}=\textbf{J}^{\dagger}\Delta \textbf{p}$ .
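Indeed, writing the residual as $ \Delta\textbf{p}=\textbf{f}(\textbf{m}_k)-\textbf{p}_{obs}$ (the sign convention consistent with Eqs. (85) and (88)), substituting Eq. (72) into Eq. (85) and applying the adjoint identity $ \langle\textbf{d}_k,\textbf{J}_k^{\dagger}\Delta\textbf{p}\rangle=\langle\textbf{J}_k\textbf{d}_k,\Delta\textbf{p}\rangle$ yields

$\displaystyle \alpha_k\approx-\frac{\langle\textbf{J}_k\textbf{d}_k,\textbf{f}(\textbf{m}_k)-\textbf{p}_{obs}\rangle}{\langle\textbf{J}_k\textbf{d}_k,\textbf{J}_k\textbf{d}_k\rangle}=\frac{\langle\textbf{J}_k\textbf{d}_k,\textbf{p}_{obs}-\textbf{f}(\textbf{m}_k)\rangle}{\langle\textbf{J}_k\textbf{d}_k,\textbf{J}_k\textbf{d}_k\rangle},$

which is exactly Eq. (88).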

Following Eq. (86), the term $ \textbf{J}_k\textbf{d}_k$ is conventionally computed with a first-order-accurate finite-difference approximation of the directional derivative of $ \textbf{f}$ along $ \textbf{d}_k$ :

$\displaystyle \textbf{J}_k\textbf{d}_k=\frac{\textbf{f}(\textbf{m}_k+\epsilon \textbf{d}_k)-\textbf{f}(\textbf{m}_k)}{\epsilon}$ (89)

with a small parameter $ \epsilon$ . In practice, we choose $ \epsilon$ such that

$\displaystyle \max(\epsilon \vert\textbf{d}_k\vert)\leqslant \frac{\max(\vert\textbf{m}_k\vert)}{100}.$ (90)
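Putting the pieces together, here is a minimal NumPy sketch of one step-length computation and model update following Eqs. (80) and (88)-(90). The forward-modeling callable f (mapping a model vector to predicted data) and the observed data p_obs are hypothetical stand-ins for the actual FWI operators.

    import numpy as np

    def step_length(f, m, d, p_obs):
        # Eq. (90): keep the trial perturbation below 1% of the model scale.
        eps = np.max(np.abs(m)) / (100.0 * np.max(np.abs(d)))
        fm = f(m)
        # Eq. (89): first-order finite-difference Jacobian-vector product.
        jd = (f(m + eps * d) - fm) / eps
        # Eq. (88): optimal step for the linearized misfit.
        return np.dot(jd, p_obs - fm) / np.dot(jd, jd)

    # One iteration of Eq. (80):
    # alpha = step_length(f, m, d, p_obs)
    # m = m + alpha * d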

