Add option to disable automatic barriers #484
Comments
To clarify the intended usage: are you looking for a way to disable all automatic barriers for a kernel (e.g., by passing kernel properties), or specific automatic barriers within a kernel (e.g., using an attribute like |
Currently a |
Correct, I was curious which type of mechanism would be most useful in your case for stopping this. Decorating the outermost Passing a flag through the kernel properties would be convenient if the programmer wanted to disable barriers for a kernel with several inner blocks. However, it wouldn't be obvious to anyone reading the untranslated kernel source that this was happening. |
A small clarification: a barrier is added automatically after an inner block only if that block used shmem at all and it's not the last inner block. We used to have an option to disable automatic barrier insertion, but I agree with Kris that this probably isn't ideal, since it's a pretty heavy toggle to set for entire OKL file(s). These days I just fuse inner blocks when I don't want the barrier. |
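For anyone skimming the thread, the rule above can be sketched as a small host-side C++ function. This is only an illustration of the stated rule, not OCCA's actual implementation: `InnerBlock`, `usesShared`, and `barrierAfter` are hypothetical names made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical summary of one inner block in a kernel; not an OCCA type.
struct InnerBlock {
  bool usesShared;  // true if the block touches shared memory at all
};

// For each inner block, report whether an automatic barrier would follow it
// under the rule described in the comment above: the block used shared
// memory, and it is not the last inner block.
std::vector<bool> barrierAfter(const std::vector<InnerBlock>& blocks) {
  std::vector<bool> result(blocks.size(), false);
  // Stop one short of the end: the last inner block never gets a barrier.
  for (std::size_t i = 0; i + 1 < blocks.size(); ++i) {
    result[i] = blocks[i].usesShared;
  }
  return result;
}
```

Under this model, fusing two inner blocks into one removes the boundary between them, so there is no position left for a barrier to be inserted. That matches the workaround mentioned above.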
A related issue: if the inner size <= warpSize, a warp-wide barrier should be added. Currently no |
Can you give an example of this? Do you have individual lanes of the warp trying to communicate through global memory? I haven't seen any use for __syncwarp aside from that scenario. The normal __syncthreads is identical to __syncwarp when inner size <= warp size. |
My bigger concern is that at the moment no barriers are added at all. Maybe I recall incorrectly? |
There are no syncwarps added currently, that's correct. But a barrier like that should only be added when it is actually needed. I'm curious where specifically you think the barrier is needed. Right now you could obviously just rely on splitting inner blocks and getting coherency through the usual syncthreads. Is syncwarp faster than syncthreads? It depends on the usage. They're likely comparable in time if you have to wait on global mem fences. If the threadblock is truly made of warps that don't share data with one another (so syncthreads isn't needed), but do share data between the lanes of each warp, then yes, there's probably an opportunity to let some warps progress while others wait at a barrier. I don't think that's common, however. Is that what you need to happen? |
This is also relevant for OpenCL and SYCL/DPC++ since the innermost |