RTM using effective boundary saving: A staggered grid GPU implementation
To respect GPU block alignment, the thickness of the CPML boundary is set to 32. Most of the CUDA kernels are configured with a block size of 16x16; the exceptions are the kernels that initialize and update the CPML boundary area. The CPML variables are initialized along the x and z axes by the CUDA kernels cuda_init_abcx() and cuda_init_abcz(). When device_alloc() is invoked to allocate memory, the variable phost controls what percentage of the effective boundary is saved in host memory rather than device memory; the host part is allocated with cudaHostAlloc(), and a device pointer to this pinned memory is obtained via cudaHostGetDevicePointer(). The wavelet is generated on the device by cuda_ricker_wavelet() with a dominant frequency fm and a time delay. A shot is injected with a smooth bell-shaped transition using cuda_add_bellwlt(). We implement RTM for orders NJ = 2, 4, 6, 8 and 10 with the forward and backward propagation functions step_forward() and step_backward(), both of which use shared memory for faster computation. The cross-correlation image of each shot is computed by cuda_cross_correlate(). The final image is obtained by stacking the images of many shots using cuda_imaging(). Most of the low-frequency noise can be removed by applying the muting function cuda_mute() and the Laplacian filter cuda_laplace_filter().
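The imaging step can be sketched in the same way. The CPU reference below accumulates a zero-lag cross-correlation of the source and receiver wavefields and then stacks a shot image into the final image. The function names cross_correlate and stack_image and the flat array layout are hypothetical; any normalization that the actual kernels cuda_cross_correlate() and cuda_imaging() may apply is not shown.

```c
#include <stddef.h>

/* Add the contribution of one time step to the image of one shot:
 * img[i] += sp[i] * gp[i], where sp is the source wavefield and gp the
 * back-propagated receiver wavefield at the same time step. */
void cross_correlate(float *img, const float *sp, const float *gp, size_t n)
{
    for (size_t i = 0; i < n; i++)
        img[i] += sp[i] * gp[i];
}

/* Stack the image of one shot into the final image. */
void stack_image(float *stacked, const float *shot_img, size_t n)
{
    for (size_t i = 0; i < n; i++)
        stacked[i] += shot_img[i];
}
```

In such a scheme cross_correlate would be called once per time step during backward propagation, and stack_image once per shot.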
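Laplacian filtering suppresses low-frequency noise because the Laplacian acts as a high-pass filter: in the wavenumber domain it scales each component by the negative squared wavenumber. A minimal CPU sketch with a second-order five-point stencil is given below. The name laplace_filter, the z-fastest memory layout, and the zeroed border are assumptions for this example and need not match cuda_laplace_filter().

```c
/* Apply a 5-point Laplacian to a 2D image stored with z as the fastest
 * axis (index = iz + ix*nz, an assumed layout). Border samples of the
 * output are set to zero. */
void laplace_filter(const float *in, float *out,
                    int nz, int nx, float dz, float dx)
{
    const float idz2 = 1.0f / (dz * dz);
    const float idx2 = 1.0f / (dx * dx);
    for (int ix = 0; ix < nx; ix++) {
        for (int iz = 0; iz < nz; iz++) {
            int id = iz + ix * nz;
            if (iz > 0 && iz < nz - 1 && ix > 0 && ix < nx - 1) {
                float d2z = in[id - 1] - 2.0f * in[id] + in[id + 1];
                float d2x = in[id - nz] - 2.0f * in[id] + in[id + nz];
                out[id] = d2z * idz2 + d2x * idx2;
            } else {
                out[id] = 0.0f;   /* border: no full stencil available */
            }
        }
    }
}
```

A constant or linearly varying image is mapped to zero in the interior, which is the behaviour expected of a filter that removes the smooth background.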