PML: avoid stack frames in GPU kernel #3881

Conversation

psychocoderHPC (Member)

fix: one part of #3870

dev branch:

ptxas info    : Compiling entry function '_ZN6alpaka16uniform_cuda_hip6detail20uniformCudaHipKernelINS_12AccGpuCudaRtISt17integral_constantImLm3EEjEES5_jN5cupla16cupla_cuda_async11CuplaKernelIN8picongpu6fields13maxwellSolver4fdtd17KernelUpdateFieldILj256EEEEEJN5pmacc11AreaMappingILj2ENSH_18MappingDescriptionILj3ENSH_4math2CT6VectorIN4mpl_10integral_cIiLi8EEESP_NSO_IiLi4EEEEEEEEENSB_8absorber3pml18UpdateBHalfFunctorINSB_15differentiation4CurlINSX_7ForwardEEEEENSH_7DataBoxINSH_10PitchedBoxINSK_6VectorIfLi3ENSK_16StandardAccessorENSK_17StandardNavigatorENSK_6detail17Vector_componentsIfLi3EEEEELj3EEEEES1C_EEEvNS_3VecIT0_T1_EET2_DpT3_' for 'sm_70'
ptxas info    : Function properties for _ZN6alpaka16uniform_cuda_hip6detail20uniformCudaHipKernelINS_12AccGpuCudaRtISt17integral_constantImLm3EEjEES5_jN5cupla16cupla_cuda_async11CuplaKernelIN8picongpu6fields13maxwellSolver4fdtd17KernelUpdateFieldILj256EEEEEJN5pmacc11AreaMappingILj2ENSH_18MappingDescriptionILj3ENSH_4math2CT6VectorIN4mpl_10integral_cIiLi8EEESP_NSO_IiLi4EEEEEEEEENSB_8absorber3pml18UpdateBHalfFunctorINSB_15differentiation4CurlINSX_7ForwardEEEEENSH_7DataBoxINSH_10PitchedBoxINSK_6VectorIfLi3ENSK_16StandardAccessorENSK_17StandardNavigatorENSK_6detail17Vector_componentsIfLi3EEEEELj3EEEEES1C_EEEvNS_3VecIT0_T1_EET2_DpT3_
    336 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Used 42 registers, 4864 bytes smem, 800 bytes cmem[0]

this PR:

ptxas info    : Compiling entry function '_ZN6alpaka16uniform_cuda_hip6detail20uniformCudaHipKernelINS_12AccGpuCudaRtISt17integral_constantImLm3EEjEES5_jN5cupla16cupla_cuda_async11CuplaKernelIN8picongpu6fields13maxwellSolver4fdtd17KernelUpdateFieldILj256EEEEEJN5pmacc11AreaMappingILj2ENSH_18MappingDescriptionILj3ENSH_4math2CT6VectorIN4mpl_10integral_cIiLi8EEESP_NSO_IiLi4EEEEEEEEENSB_8absorber3pml18UpdateBHalfFunctorINSB_15differentiation4CurlINSX_7ForwardEEEEENSH_7DataBoxINSH_10PitchedBoxINSK_6VectorIfLi3ENSK_16StandardAccessorENSK_17StandardNavigatorENSK_6detail17Vector_componentsIfLi3EEEEELj3EEEEES1C_EEEvNS_3VecIT0_T1_EET2_DpT3_' for 'sm_70'
ptxas info    : Function properties for _ZN6alpaka16uniform_cuda_hip6detail20uniformCudaHipKernelINS_12AccGpuCudaRtISt17integral_constantImLm3EEjEES5_jN5cupla16cupla_cuda_async11CuplaKernelIN8picongpu6fields13maxwellSolver4fdtd17KernelUpdateFieldILj256EEEEEJN5pmacc11AreaMappingILj2ENSH_18MappingDescriptionILj3ENSH_4math2CT6VectorIN4mpl_10integral_cIiLi8EEESP_NSO_IiLi4EEEEEEEEENSB_8absorber3pml18UpdateBHalfFunctorINSB_15differentiation4CurlINSX_7ForwardEEEEENSH_7DataBoxINSH_10PitchedBoxINSK_6VectorIfLi3ENSK_16StandardAccessorENSK_17StandardNavigatorENSK_6detail17Vector_componentsIfLi3EEEEELj3EEEEES1C_EEEvNS_3VecIT0_T1_EET2_DpT3_
    0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads

By using a `break` within a for loop we triggered stack frame usage
in the GPU kernel.
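
For illustration, here is a minimal, hypothetical sketch (not the actual PIConGPU PML code; the function names and coefficient array are made up): a data-dependent `break` inside a small fixed-trip-count loop can keep nvcc from fully unrolling the loop, which may force temporaries into local memory and show up in the ptxas report as a per-thread stack frame. Rewriting the loop without `break` keeps the trip count known at compile time so everything can stay in registers.

```cpp
// Hypothetical sketch only; identifiers are invented for illustration.

// Variant with a data-dependent break: the compiler may not fully unroll
// the loop, and temporaries can be demoted to local memory
// (reported by ptxas as "N bytes stack frame").
__device__ float dampBefore(float const (&coeff)[3], float field)
{
    for(int i = 0; i < 3; ++i)
    {
        if(coeff[i] == 0.0f)
            break; // early exit depends on run-time data
        field *= coeff[i];
    }
    return field;
}

// Variant without break: the loop has a compile-time trip count and a
// predicated body, so it can be fully unrolled and kept in registers
// ("0 bytes stack frame").
__device__ float dampAfter(float const (&coeff)[3], float field)
{
    bool active = true;
    for(int i = 0; i < 3; ++i)
    {
        active = active && (coeff[i] != 0.0f);
        field = active ? field * coeff[i] : field;
    }
    return field;
}
```

Both variants of the sketch compute the same value; only the control flow is restructured so that the compiler no longer needs per-thread stack space for the loop.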

@psychocoderHPC added the `refactoring` label (code change to improve performance or to unify a concept but does not change public API) on Oct 20, 2021
@sbastrakov merged commit f76e403 into ComputationalRadiationPhysics:dev on Oct 21, 2021
@psychocoderHPC deleted the fix-pmlStackFrameUsage branch on November 18, 2021 07:52