PML: avoid stack frames in GPU kernel #3881

Conversation

psychocoderHPC (Member)

fix: one part of #3870

dev branch:

ptxas info    : Compiling entry function '_ZN6alpaka16uniform_cuda_hip6detail20uniformCudaHipKernelINS_12AccGpuCudaRtISt17integral_constantImLm3EEjEES5_jN5cupla16cupla_cuda_async11CuplaKernelIN8picongpu6fields13maxwellSolver4fdtd17KernelUpdateFieldILj256EEEEEJN5pmacc11AreaMappingILj2ENSH_18MappingDescriptionILj3ENSH_4math2CT6VectorIN4mpl_10integral_cIiLi8EEESP_NSO_IiLi4EEEEEEEEENSB_8absorber3pml18UpdateBHalfFunctorINSB_15differentiation4CurlINSX_7ForwardEEEEENSH_7DataBoxINSH_10PitchedBoxINSK_6VectorIfLi3ENSK_16StandardAccessorENSK_17StandardNavigatorENSK_6detail17Vector_componentsIfLi3EEEEELj3EEEEES1C_EEEvNS_3VecIT0_T1_EET2_DpT3_' for 'sm_70'
ptxas info    : Function properties for _ZN6alpaka16uniform_cuda_hip6detail20uniformCudaHipKernelINS_12AccGpuCudaRtISt17integral_constantImLm3EEjEES5_jN5cupla16cupla_cuda_async11CuplaKernelIN8picongpu6fields13maxwellSolver4fdtd17KernelUpdateFieldILj256EEEEEJN5pmacc11AreaMappingILj2ENSH_18MappingDescriptionILj3ENSH_4math2CT6VectorIN4mpl_10integral_cIiLi8EEESP_NSO_IiLi4EEEEEEEEENSB_8absorber3pml18UpdateBHalfFunctorINSB_15differentiation4CurlINSX_7ForwardEEEEENSH_7DataBoxINSH_10PitchedBoxINSK_6VectorIfLi3ENSK_16StandardAccessorENSK_17StandardNavigatorENSK_6detail17Vector_componentsIfLi3EEEEELj3EEEEES1C_EEEvNS_3VecIT0_T1_EET2_DpT3_
    336 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Used 42 registers, 4864 bytes smem, 800 bytes cmem[0]

this PR:

ptxas info    : Compiling entry function '_ZN6alpaka16uniform_cuda_hip6detail20uniformCudaHipKernelINS_12AccGpuCudaRtISt17integral_constantImLm3EEjEES5_jN5cupla16cupla_cuda_async11CuplaKernelIN8picongpu6fields13maxwellSolver4fdtd17KernelUpdateFieldILj256EEEEEJN5pmacc11AreaMappingILj2ENSH_18MappingDescriptionILj3ENSH_4math2CT6VectorIN4mpl_10integral_cIiLi8EEESP_NSO_IiLi4EEEEEEEEENSB_8absorber3pml18UpdateBHalfFunctorINSB_15differentiation4CurlINSX_7ForwardEEEEENSH_7DataBoxINSH_10PitchedBoxINSK_6VectorIfLi3ENSK_16StandardAccessorENSK_17StandardNavigatorENSK_6detail17Vector_componentsIfLi3EEEEELj3EEEEES1C_EEEvNS_3VecIT0_T1_EET2_DpT3_' for 'sm_70'
ptxas info    : Function properties for _ZN6alpaka16uniform_cuda_hip6detail20uniformCudaHipKernelINS_12AccGpuCudaRtISt17integral_constantImLm3EEjEES5_jN5cupla16cupla_cuda_async11CuplaKernelIN8picongpu6fields13maxwellSolver4fdtd17KernelUpdateFieldILj256EEEEEJN5pmacc11AreaMappingILj2ENSH_18MappingDescriptionILj3ENSH_4math2CT6VectorIN4mpl_10integral_cIiLi8EEESP_NSO_IiLi4EEEEEEEEENSB_8absorber3pml18UpdateBHalfFunctorINSB_15differentiation4CurlINSX_7ForwardEEEEENSH_7DataBoxINSH_10PitchedBoxINSK_6VectorIfLi3ENSK_16StandardAccessorENSK_17StandardNavigatorENSK_6detail17Vector_componentsIfLi3EEEEELj3EEEEES1C_EEEvNS_3VecIT0_T1_EET2_DpT3_
    0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads

By using a `break` within a for loop we triggered stack frame usage
in the GPU kernel.
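
For illustration, here is a minimal, hypothetical sketch (not the actual PIConGPU PML code; the function names and coefficient array are made up): a data-dependent `break` inside a small fixed-trip-count loop can keep nvcc from fully unrolling the loop, which may force temporaries into local memory and show up in the ptxas report as a per-thread stack frame. Rewriting the loop without `break` keeps the trip count known at compile time so everything can stay in registers.

```cpp
// Hypothetical sketch only; identifiers are invented for illustration.

// Variant with a data-dependent break: the compiler may not fully unroll
// the loop, and temporaries can be demoted to local memory
// (reported by ptxas as "N bytes stack frame").
__device__ float dampBefore(float const (&coeff)[3], float field)
{
    for(int i = 0; i < 3; ++i)
    {
        if(coeff[i] == 0.0f)
            break; // early exit depends on run-time data
        field *= coeff[i];
    }
    return field;
}

// Variant without break: the loop has a compile-time trip count and a
// predicated body, so it can be fully unrolled and kept in registers
// ("0 bytes stack frame").
__device__ float dampAfter(float const (&coeff)[3], float field)
{
    bool active = true;
    for(int i = 0; i < 3; ++i)
    {
        active = active && (coeff[i] != 0.0f);
        field = active ? field * coeff[i] : field;
    }
    return field;
}
```

Both variants of the sketch compute the same value; only the control flow is restructured so that the compiler no longer needs per-thread stack space for the loop.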

@psychocoderHPC added the `refactoring` label (code change to improve performance or to unify a concept but does not change public API) on Oct 20, 2021
@sbastrakov merged commit f76e403 into ComputationalRadiationPhysics:dev on Oct 21, 2021
@psychocoderHPC deleted the fix-pmlStackFrameUsage branch on November 18, 2021 07:52