Description
Hi @oliviermattelaer and @roiser,
in debugging #885 (in WIP PR #882) I realised that the code I generate out of the box has WARP_SIZE equal to VECSIZE_MEMMAX, with NB_WARP=1 hardcoded. In particular, WARP_SIZE and VECSIZE_MEMMAX both seem to be controlled by vector_size in the runcards.
I am very surprised by this. I thought that on a GPU one would for instance use VECSIZE_MEMMAX=16384, while keeping WARP_SIZE=32. Actually, I thought that WARP_SIZE=32 would need to be hardcoded (this is the typical spec on an Nvidia GPU).
As for NB_WARP, I do not understand what this means.
Can you please explain which values WARP_SIZE and NB_WARP should have, and how this functionality can be tested?
(On top of this, note that the VECSIZE_USED actually in use can be lower than VECSIZE_MEMMAX. The crash in #885 comes from the fact that this does not seem to be handled correctly now.)
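To make the expectation above concrete, here is a minimal sketch, not taken from the generated code. It assumes buffers are sized for VECSIZE_MEMMAX events while only the first VECSIZE_USED entries are valid, and that WARP_SIZE stays fixed at 32. The helpers `warpsNeeded` and `sumUsedEvents` are hypothetical names introduced only for illustration, and the warp count shown is just the arithmetic I would expect, not necessarily what NB_WARP means in master_june24.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical constants mirroring the values discussed above (assumptions, not the generated code).
constexpr std::size_t VECSIZE_MEMMAX = 16384; // maximum number of events the buffers are sized for
constexpr std::size_t WARP_SIZE = 32;         // typical warp size on an Nvidia GPU

// Hypothetical helper: number of warps needed to cover nEvents, rounded up.
constexpr std::size_t warpsNeeded( std::size_t nEvents )
{
  return ( nEvents + WARP_SIZE - 1 ) / WARP_SIZE;
}

// Hypothetical helper: sum only the first vecsizeUsed entries of a buffer sized for VECSIZE_MEMMAX.
// Entries at index vecsizeUsed and beyond are treated as padding and are never read.
double sumUsedEvents( const std::vector<double>& buffer, std::size_t vecsizeUsed )
{
  assert( buffer.size() == VECSIZE_MEMMAX );
  assert( vecsizeUsed <= VECSIZE_MEMMAX );
  double sum = 0;
  for( std::size_t i = 0; i < vecsizeUsed; ++i ) sum += buffer[i];
  return sum;
}
```

With these assumptions, a full buffer of 16384 events would correspond to 512 warps of 32 events each, and a loop bounded by VECSIZE_USED rather than VECSIZE_MEMMAX never touches the padding.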
Thanks
Andrea
PS This is related to, and largely overlaps with, #765. But it is a question specifically about what exists now in master_june24. I would like to understand how this is supposed to work and be tested.