Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AMD][Backend, Atomics] Remove old comment regarding buffer atomics #5523

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

SamGinzburg
Copy link
Contributor

@SamGinzburg SamGinzburg commented Jan 3, 2025

Overview

I spent a day investigating an old comment in the AMD backend regarding buffer atomics (specifically buffer atomic RMW) which suggested that buffer atomics perform better on MI-* series GPUs. The branch I worked on is available here for reference.

I think that (today, maybe this was different in the past) as far as I can tell the only benefit that buffer atomics (on their own) offer is the same hardware masking benefits that other buffer ops have (i.e., atomics out of bounds are dropped). This depends on how the buffer atomics are synchronized though, and this is where I'm less sure of whats correct:

With full barriers (identical performance to "normal" atomics):

buffer_wbl2 sc1
s_waitcnt lgkmcnt(0)
atomicRMWop()
s_waitcnt vmcnt(0) 
buffer_inv sc1
buffer_wbl2 sc1
s_waitcnt lgkmcnt(0)
atomicRMWop()
s_waitcnt vmcnt(0) 
buffer_inv sc1

With some assumptions specific to gfx942 (This can improve latency in bw-bound Split-K for some but not all shapes, it seems this is what CK does):

buffer_wbl2 sc1
s_waitcnt lgkmcnt(0)
atomicRMWop()
s_waitcnt vmcnt(0) # AMDGCN specific cross-CU synchronization primitive
atomicRMWop()
s_waitcnt vmcnt(0) 
buffer_inv sc1

If there is interest, I can file a PR to add experimental support for buffer atomic RMW ops. The only major difference with my implementation is that I added support for CTA-scope (atomics in the AMD backend today only support agent scope as far as I can tell I think).

==================================================================

New contributor declaration

  • I am not making a trivial change, such as fixing a typo in a comment.

  • I have written a PR description following these
    rules.

  • [ x] I have run pre-commit run --from-ref origin/main --to-ref HEAD.

  • Select one of the following.

    • I have added tests.
      • /test for lit tests
      • /unittest for C++ tests
      • /python/test for end-to-end tests
    • [ x] This PR does not need a test because just removing a comment.
  • Select one of the following.

    • [ x] I have not added any lit tests.
    • The lit tests I have added follow these best practices,
      including the "tests should be minimal" section. (Usually running Python code
      and using the instructions it generates is not minimal.)

@SamGinzburg
Copy link
Contributor Author

@giuseros @makslevental

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant