RTM using effective boundary saving: A staggered grid GPU implementation
To respect GPU block alignment, the thickness of the CPML boundary is set to 32 grid points. Most of the CUDA kernels are launched with a block size of 16x16; the exceptions are the kernels that initialize and update the CPML boundary region. The CPML variables are initialized along the z and x axes by the CUDA kernels cuda_init_abcz( ) and cuda_init_abcx( ), respectively.

When device_alloc( ) is invoked to allocate memory, the variable phost controls how the effective boundary is divided, as a percentage, between host and device memory. The host portion is allocated as pinned memory by calling cudaHostAlloc( ), and a device pointer to this pinned memory is obtained via cudaHostGetDevicePointer( ). The source wavelet is generated on the device by cuda_ricker_wavelet( ) with a dominant frequency fm and a time delay. A shot is injected through a smooth bell transition using cuda_add_bellwlt( ).

We implement RTM (of order NJ = 2, 4, 6, 8, 10) with the forward and backward propagation functions step_forward( ) and step_backward( ), both of which use shared memory for faster computation. The cross-correlation image of each shot is computed by cuda_cross_correlate( ). The final image is obtained by stacking the images of many shots using cuda_imaging( ). Most of the low-frequency noise can be removed by applying the muting function cuda_mute( ) and the Laplacian filter cuda_laplace_filter( ).
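The host/device split of the saved boundary can be illustrated with a minimal CUDA sketch. It is not the code of this implementation: the kernel save_boundary( ), the wrapper save_boundary_split( ), the flat 1D boundary layout, and the reading of phost as "percent of samples kept in pinned host memory" are all assumptions made for illustration. Only cudaHostAlloc( ) and cudaHostGetDevicePointer( ) are taken from the text.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Return 1 from the enclosing function on any CUDA error (sketch: no cleanup on failure).
#define CHECK(call) do { cudaError_t e_ = (call); if (e_ != cudaSuccess) { \
    fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(e_)); return 1; } } while (0)

// Hypothetical kernel: store n boundary samples, the first n_dev in device
// memory and the remaining ones in mapped pinned host memory (zero-copy).
__global__ void save_boundary(const float *wave, float *bd_dev,
                              float *bd_host_mapped, int n_dev, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    if (i < n_dev) bd_dev[i] = wave[i];
    else           bd_host_mapped[i - n_dev] = wave[i];
}

// Hypothetical wrapper. Assumes n > 0 and 0 <= phost <= 100, where phost is
// taken to be the percentage of samples saved on the host.
// Writes the reassembled boundary to h_out; returns 0 on success.
int save_boundary_split(const float *h_wave, int n, float phost, float *h_out)
{
    int n_host = (int)(n * phost / 100.0f);
    int n_dev  = n - n_host;
    float *d_wave = NULL, *d_bd = NULL, *h_bd = NULL, *h_bd_devptr = NULL;

    CHECK(cudaMalloc((void **)&d_wave, n * sizeof(float)));
    // Allocate at least one element so both buffers are always valid.
    CHECK(cudaMalloc((void **)&d_bd, (n_dev > 0 ? n_dev : 1) * sizeof(float)));
    // Pinned, mapped host memory. With unified virtual addressing, mapping is
    // enabled by default; older setups may need
    // cudaSetDeviceFlags(cudaDeviceMapHost) before any other CUDA call.
    CHECK(cudaHostAlloc((void **)&h_bd, (n_host > 0 ? n_host : 1) * sizeof(float),
                        cudaHostAllocMapped));
    // Device-side pointer that aliases the pinned host buffer.
    CHECK(cudaHostGetDevicePointer((void **)&h_bd_devptr, h_bd, 0));

    CHECK(cudaMemcpy(d_wave, h_wave, n * sizeof(float), cudaMemcpyHostToDevice));
    int block = 256;
    int grid  = (n + block - 1) / block;
    save_boundary<<<grid, block>>>(d_wave, d_bd, h_bd_devptr, n_dev, n);
    CHECK(cudaGetLastError());
    // Synchronize so the kernel's zero-copy writes are visible to the host.
    CHECK(cudaDeviceSynchronize());

    // Reassemble: device part via memcpy, host part read directly from h_bd.
    CHECK(cudaMemcpy(h_out, d_bd, n_dev * sizeof(float), cudaMemcpyDeviceToHost));
    for (int i = 0; i < n_host; i++) h_out[n_dev + i] = h_bd[i];

    cudaFree(d_wave);
    cudaFree(d_bd);
    cudaFreeHost(h_bd);
    return 0;
}
```

Calling save_boundary_split(h_wave, n, 30.0f, h_out), for instance, would keep about 30% of the samples in pinned host memory and the rest on the device. In a sketch like this, a larger phost trades device memory for slower accesses over the PCIe bus, which is presumably why the split is left tunable.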