Add option to disable automatic barriers #484

stgeke · 2021-02-24T07:11:06Z

No description provided.

kris-rowe · 2021-09-13T17:34:29Z

To clarify the intended usage: are you looking for a way to disable all automatic barriers in a kernel (e.g., by passing kernel properties), or specific automatic barriers within a kernel (e.g., using an attribute like @nobarrier, similar to the nowait clause in OpenMP)?

stgeke · 2021-09-14T08:16:02Z

Currently, a @barrier is added automatically after an @inner block. It would be nice to have an option to disable this.

kris-rowe · 2021-09-14T14:57:31Z

Correct. I was curious which mechanism would be most useful in your case for suppressing it.

Decorating the outermost @inner loop with an attribute like @nobarrier (either at the loop definition or immediately after the loop) would provide the most fine-grained control, since other inner blocks within the same kernel would still have barriers inserted. It would also satisfy the principle of least surprise for anyone else reading the kernel.

Passing a flag through the kernel properties would be convenient if the programmer wanted to disable barriers for a kernel with several inner blocks; however, it wouldn't be obvious to anyone reading the untranslated kernel source that this was happening.

noelchalmers · 2021-09-14T15:08:39Z

A small clarification: a barrier is added automatically after an inner block only if that block uses shared memory (shmem) at all and is not the last inner block.

We used to have an option to disable automatic barrier insertion, but I agree with Kris that this probably isn't ideal, since it's a pretty heavy toggle to set for entire OKL file(s). These days I just fuse inner blocks when I don't want the barrier.

stgeke · 2021-09-14T15:15:38Z

A related issue: if the inner size <= warpSize, a warp-wide barrier should be added. Currently no @barrier is added at all. That's tricky, at least on NVIDIA's Volta and later architectures, where you can no longer assume that the threads in a warp run in lockstep.

noelchalmers · 2021-09-14T15:27:37Z

Can you give an example of this? Do you have individual lanes of the warp trying to communicate through global memory? I haven't seen any use for __syncwarp aside from that scenario.

The normal __syncthreads is identical to __syncwarp when inner size <= warp size.

stgeke · 2021-09-14T15:31:05Z

My bigger concern is that at the moment no barriers are added at all. Maybe I recall incorrectly?
Isn't __syncwarp faster than __syncthreads?

noelchalmers · 2021-09-14T15:42:57Z

That's correct: no __syncwarp calls are added currently. But a barrier like that should only be added when it is actually needed, and I'm curious where specifically you think it is needed. For now, you can simply split inner blocks and get coherency through the usual __syncthreads.

Is __syncwarp faster than __syncthreads? It depends on the usage. They're likely comparable in time if you have to wait on global memory fences. If the thread block is truly made of warps that don't share data with one another (so __syncthreads isn't needed) but do share data between the lanes of each warp, then yes, there's probably an opportunity to let some warps progress while others wait at the barrier. I don't think that's common, however. Is that what you need to happen?

kris-rowe · 2021-09-14T16:46:19Z

> A related issue: if the inner size <= warpSize, a warp-wide barrier should be added. Currently no @barrier is added at all. That's tricky, at least on NVIDIA's Volta and later architectures, where you can no longer assume that the threads in a warp run in lockstep.

This is also relevant for OpenCL and SYCL/DPC++, since the innermost @inner loop will be mapped to a sub-group. The newer versions of these standards support sub-group barriers. I have opened a separate issue (#516) for this.

kris-rowe mentioned this issue Sep 14, 2021

Warp/sub-group barriers #516

Open

kris-rowe added the feature label Sep 14, 2021

kris-rowe mentioned this issue Dec 6, 2021

Add the @nobarrier attribute to stop barriers from being added automatically to @inner loop blocks. #544

Merged

kris-rowe closed this as completed in #544 Dec 7, 2021
