[AMDGPU] Handle direct loads to LDS in memory model #142018

Draft
kerbowa wants to merge 1 commit into base: users/kerbowa/direct-lds-load-memory-model-waits

kerbowa (Member) commented May 29, 2025

Add waitcnt insertion to ensure proper ordering between LDS operations
and direct loads from global memory to LDS on pre-GFX10 hardware.

A direct LDS load performs both a global memory load and an LDS store.
Without explicit synchronization, that LDS store can be reordered with
respect to other LDS operations, which can cause ordering violations
even within a single thread.
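
The hazard can be pictured with a toy model of the two counters involved. This is only an illustrative sketch: it assumes the direct load is tracked by `vmcnt` while ordinary LDS (DS) operations are tracked by `lgkmcnt`, and the `Counters` type and its methods are hypothetical names, not LLVM or hardware APIs.

```cpp
#include <cassert>

// Toy model (hypothetical, not an LLVM API): outstanding-operation counters
// for one wave. Assumption: direct loads to LDS count against vmcnt, while
// ordinary LDS (DS) reads and writes count against lgkmcnt.
struct Counters {
  int vmcnt = 0;
  int lgkmcnt = 0;

  void issueDirectLoadToLDS() { ++vmcnt; }  // global load plus LDS store
  void issueDsOp() { ++lgkmcnt; }           // ordinary LDS access
  void waitLgkmcntZero() { lgkmcnt = 0; }   // models a wait on lgkmcnt(0)
  void waitVmcntZero() { vmcnt = 0; }       // models a wait on vmcnt(0)

  // True while the LDS store of some direct load may still be in flight.
  bool directLoadPending() const { return vmcnt > 0; }
};

// Issues a direct load and a DS operation, then waits only on lgkmcnt.
// Returns whether the direct load is still outstanding afterwards.
inline bool lgkmWaitLeavesDirectLoadPending() {
  Counters c;
  c.issueDirectLoadToLDS();
  c.issueDsOp();
  c.waitLgkmcntZero();
  return c.directLoadPending();
}
```

In this model, a wait on `lgkmcnt` alone leaves the direct load outstanding, so a later LDS access in the same thread is not ordered against the direct load's LDS store until `vmcnt` is also waited on.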

The change conservatively inserts vmcnt(0) waits for all sync scopes
when the LDS address space is involved. Future optimizations in
SIInsertWaitcnts can relax this to only wait for outstanding direct LDS
loads rather than all vmcnt events.
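
The conservative rule described above might be sketched as a single predicate. This is a minimal illustration under stated assumptions, not the code in this PR: `AddrSpaceMask`, `SyncScope`, and `needsExtraVmcntZero` are hypothetical stand-ins, and the restriction to pre-GFX10 targets is taken from the description.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for the address-space mask and sync scopes; the
// real pass uses its own types.
enum AddrSpaceMask : std::uint32_t {
  AS_NONE = 0,
  AS_GLOBAL = 1u << 0,
  AS_LDS = 1u << 1,
};

enum class SyncScope { SingleThread, Wavefront, Workgroup, Agent, System };

// Conservative rule from the description: on pre-GFX10 targets, any
// synchronization that involves the LDS address space waits for vmcnt(0),
// whatever the sync scope. Only this additional requirement is modeled;
// waits that existed before the change are not represented here.
constexpr bool needsExtraVmcntZero(std::uint32_t addrSpaces,
                                   SyncScope /*scope*/, bool isPreGFX10) {
  return isPreGFX10 && (addrSpaces & AS_LDS) != 0;
}
```

The scope parameter is deliberately ignored, which mirrors "for all sync scopes". A later refinement that waits only for outstanding direct LDS loads would replace the blanket `vmcnt(0)` requirement rather than this predicate's inputs.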

This change only affects LDS address space synchronization and preserves
existing cross-address space ordering behavior.

kerbowa (Member, Author) commented May 29, 2025

Warning: This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.

This stack of pull requests is managed by Graphite.

github-actions bot commented May 29, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

kerbowa force-pushed the users/kerbowa/direct-lds-load-memory-legalizer branch from c5c5225 to fdaccc9 on June 4, 2025 15:43