[AMDGPU] Handle direct loads to LDS in memory model #142018

kerbowa · 2025-05-29T19:28:42Z

Insert additional waitcnts to ensure proper ordering between LDS
operations and direct loads from global memory to LDS on pre-GFX10
hardware.

Direct LDS loads perform both a global memory load and an LDS store.
Without explicit synchronization, that LDS store can be reordered with
respect to other LDS operations, causing ordering violations even within
a single thread.
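To make the single-thread hazard concrete, here is a toy C++ model. It is not LLVM code: every name is hypothetical, and it assumes the deferred LDS store of a direct load lands only when the corresponding vmcnt event retires, with events retiring in issue order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

// Toy model (hypothetical, not LLVM code). A direct LDS load is queued as an
// outstanding vector-memory (vmcnt) event; its LDS store only lands when that
// event retires. Ordinary LDS reads observe LDS immediately.
struct PendingLdsStore {
  unsigned Addr;
  uint32_t Value;
};

struct ToyWave {
  std::vector<uint32_t> Lds = std::vector<uint32_t>(16, 0);
  std::deque<PendingLdsStore> Outstanding; // oldest first

  // Global -> LDS load: the LDS store is deferred until the load returns.
  void directLdsLoad(unsigned Addr, uint32_t GlobalValue) {
    assert(Addr < Lds.size());
    Outstanding.push_back({Addr, GlobalValue});
  }

  // Ordinary LDS read: sees whatever LDS holds right now.
  uint32_t dsRead(unsigned Addr) const { return Lds.at(Addr); }

  // s_waitcnt vmcnt(N): retire the oldest events until at most N remain
  // (assumes in-order retirement).
  void waitVmcnt(std::size_t N) {
    while (Outstanding.size() > N) {
      Lds[Outstanding.front().Addr] = Outstanding.front().Value;
      Outstanding.pop_front();
    }
  }
};

// Value a ds_read observes right after a direct LDS load to the same address.
uint32_t readAfterDirectLoad(bool InsertVmcntZero) {
  ToyWave W;
  W.directLdsLoad(/*Addr=*/3, /*GlobalValue=*/42);
  if (InsertVmcntZero)
    W.waitVmcnt(0); // models the conservative vmcnt(0) this PR inserts
  return W.dsRead(3);
}
```

In this model `readAfterDirectLoad(false)` returns the stale value 0, while `readAfterDirectLoad(true)` returns 42.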

The change conservatively inserts vmcnt(0) waits for all sync scopes
when the LDS address space is involved. Future optimizations in
SIInsertWaitcnts can relax this to only wait for outstanding direct LDS
loads rather than all vmcnt events.
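The relaxation described in the previous paragraph could look roughly like this. The sketch assumes vmcnt events retire in issue order, so `s_waitcnt vmcnt(N)` guarantees that everything except the N most recently issued VMEM operations has completed. Under that assumption it suffices to wait until only the operations issued after the youngest outstanding direct LDS load remain. `VmemKind`, `conservativeVmcnt`, and `relaxedVmcnt` are hypothetical names, not SIInsertWaitcnts APIs.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical classification of outstanding VMEM operations.
enum class VmemKind { GlobalLoad, DirectLdsLoad };

// Current behavior in this PR: drain every outstanding vmcnt event.
unsigned conservativeVmcnt(const std::vector<VmemKind> &) { return 0; }

// Possible future behavior: given outstanding VMEM ops in issue order
// (oldest first), return N for s_waitcnt vmcnt(N) such that every direct
// LDS load has retired, assuming in-order retirement.
unsigned relaxedVmcnt(const std::vector<VmemKind> &Outstanding) {
  // Scan from youngest to oldest for the most recent direct LDS load.
  for (std::size_t I = Outstanding.size(); I-- > 0;) {
    if (Outstanding[I] == VmemKind::DirectLdsLoad)
      // Only the ops issued after it may stay outstanding.
      return static_cast<unsigned>(Outstanding.size() - 1 - I);
  }
  // No direct LDS load outstanding: waiting for the full count is a no-op.
  return static_cast<unsigned>(Outstanding.size());
}
```

For example, with outstanding ops `{DirectLdsLoad, GlobalLoad, GlobalLoad}` the relaxed policy waits for `vmcnt(2)` instead of `vmcnt(0)`, leaving the two younger global loads in flight.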

This change only affects LDS address space synchronization and preserves
existing cross-address space ordering behavior.
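The scoping of the new rule can be summarized as a small decision function. This is a hedged sketch: the address-space bit values, the `SyncScope` enum, and `needsVmcntZero` are hypothetical, and `ExistingVmcntZero` stands in for whatever the pre-existing memory-legalizer logic already decided, since this PR does not change that logic.

```cpp
// Hypothetical address-space bits; real LLVM code uses its own enums.
enum AddrSpaceBits : unsigned {
  AS_GLOBAL = 1u << 0,
  AS_LDS = 1u << 1,
  AS_SCRATCH = 1u << 2,
};

enum class SyncScope { SingleThread, Wavefront, Workgroup, Agent, System };

// Decide whether a synchronization point needs s_waitcnt vmcnt(0).
bool needsVmcntZero(bool IsPreGfx10, unsigned OrderedAddrSpaces,
                    [[maybe_unused]] SyncScope Scope, bool ExistingVmcntZero) {
  // New rule: on pre-GFX10 targets, any synchronization that orders LDS must
  // drain outstanding direct LDS loads. Scope is deliberately ignored because
  // the hazard exists even within a single thread.
  bool DirectLdsRule = IsPreGfx10 && (OrderedAddrSpaces & AS_LDS) != 0;
  // The new rule only adds a wait; it never removes one the existing
  // cross-address-space logic requested.
  return ExistingVmcntZero || DirectLdsRule;
}
```

Modeling the change as a logical OR mirrors the claim that existing cross-address-space ordering is preserved: a fence that did not order LDS gets exactly the waits it had before.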

kerbowa · 2025-05-29T19:28:57Z

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.

[AMDGPU] Handle direct loads to LDS in memory model #142018 👈
[AMDGPU] Optimize LDS DMA soft waitcnt #138802
main

This stack of pull requests is managed by Graphite.

github-actions · 2025-05-29T19:31:15Z

✅ With the latest revision this PR passed the C/C++ code formatter.


kerbowa mentioned this pull request May 29, 2025

[AMDGPU] Optimize LDS DMA soft waitcnt #138802

Open

kerbowa force-pushed the users/kerbowa/direct-lds-load-memory-legalizer branch from c5c5225 to fdaccc9 on June 4, 2025 15:43