Add scoped atomic_thread_fence #1644

vchuravy · 2022-10-23T00:32:12Z

As noticed in EnzymeAD/Enzyme.jl#511, CUDA C++ supports a wider selection of memory orders
and emits different assembly for SM_70 and above: https://godbolt.org/z/Y7Pj5G7sK

For now I just added the memory fences necessary to implement the rest.
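
To make the order/scope distinction concrete, here is a rough host-side sketch of the kind of mapping involved. It is not the code from this PR: `fence_ptx` and the two lookup tables are hypothetical names, and the instruction strings reflect my understanding of the PTX ISA (`fence.{sc,acq_rel}.{cta,gpu,sys}` on SM_70 and above, `membar.{cta,gl,sys}` before that), so treat them as assumptions to check against the Godbolt output above.

```julia
# Hypothetical helper, not part of CUDA.jl: choose a PTX fence instruction
# for a Julia-style memory order and a CUDA-style scope.
const FENCE_SCOPES  = Dict(:block => "cta", :device => "gpu", :system => "sys")
const MEMBAR_SCOPES = Dict(:block => "cta", :device => "gl",  :system => "sys")
const ORDERS = (:monotonic, :acquire, :release, :acquire_release,
                :sequentially_consistent)

function fence_ptx(order::Symbol, scope::Symbol; sm70::Bool = true)
    order in ORDERS || throw(ArgumentError("unknown order: $order"))
    haskey(FENCE_SCOPES, scope) || throw(ArgumentError("unknown scope: $scope"))

    # A monotonic (relaxed) fence imposes no ordering, so nothing is emitted.
    order === :monotonic && return ""

    # Assumption: before SM_70 only the stronger membar instructions exist.
    sm70 || return "membar.$(MEMBAR_SCOPES[scope]);"

    # Assumption: SM_70+ separates sequentially consistent fences from
    # acquire/release ones.
    sem = order === :sequentially_consistent ? "sc" : "acq_rel"
    return "fence.$sem.$(FENCE_SCOPES[scope]);"
end
```

For example, `fence_ptx(:acquire_release, :device)` yields `"fence.acq_rel.gpu;"`, while `fence_ptx(:sequentially_consistent, :block; sm70 = false)` yields `"membar.cta;"`. In a real kernel the chosen string would have to be emitted as inline assembly, which this sketch deliberately leaves out.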

@tkf @maleadt Over the long term I would be in favor of moving to Atomix.jl instead of CUDA.@atomic.
Is there any shared infrastructure we can use? As you can see, I am defining scope and order here again.

src/device/intrinsics/atomics.jl

codecov · 2022-11-05T20:26:17Z

Codecov Report

Base: 61.68% // Head: 60.08% // Decreases project coverage by -1.60% ⚠️

Coverage data is based on head (654870d) compared to base (dcb175e).
Patch has no changes to coverable lines.

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #1644      +/-   ##
==========================================
- Coverage   61.68%   60.08%   -1.61%     
==========================================
  Files         151      151              
  Lines       11349    10833     -516     
==========================================
- Hits         7001     6509     -492     
+ Misses       4348     4324      -24

Impacted Files	Coverage Δ
lib/cusolver/linalg.jl	`49.71% <0.00%> (-36.73%)`	⬇️
lib/cublas/CUBLAS.jl	`52.25% <0.00%> (-24.07%)`	⬇️
lib/cusparse/conversions.jl	`79.77% <0.00%> (-14.31%)`	⬇️
lib/cusparse/interfaces.jl	`58.55% <0.00%> (-13.77%)`	⬇️
src/compiler/gpucompiler.jl	`82.14% <0.00%> (-11.20%)`	⬇️
lib/cusparse/level3.jl	`63.85% <0.00%> (-10.85%)`	⬇️
lib/cusparse/broadcast.jl	`25.92% <0.00%> (-10.35%)`	⬇️
lib/cusparse/types.jl	`41.89% <0.00%> (-8.11%)`	⬇️
lib/cusparse/generic.jl	`89.30% <0.00%> (-7.87%)`	⬇️
src/utilities.jl	`68.57% <0.00%> (-7.75%)`	⬇️
... and 71 more


jgreener64 · 2023-01-27T14:24:05Z

Does this need something else before it can be merged as an intermediate solution? I am finding it useful to get Atomix.@atomic :monotonic working with Enzyme on GPU (JuliaConcurrent/Atomix.jl#33 and EnzymeAD/Enzyme.jl#511).

vchuravy · 2023-01-27T16:03:59Z

This shouldn't work with Enzyme, since Enzyme won't understand the inserted assembly. That's one of the reasons I haven't pushed on this further.

jgreener64 · 2023-01-27T17:56:55Z

It seems to work with EnzymeAD/Enzyme.jl#511 and in another context I tried, unless I am getting something mixed up, or you mean it will only work in specific cases.

Add scoped atomic_thread_fence

be8f95a

maleadt added the enhancement (New feature or request) and cuda kernels (Stuff about writing CUDA kernels.) labels Oct 24, 2022

maleadt reviewed Oct 26, 2022

src/device/intrinsics/atomics.jl (outdated)

vchuravy added 3 commits November 5, 2022 14:51

fixup! Add scoped atomic_thread_fence

5c2fee1

fixup! fixup! Add scoped atomic_thread_fence

adbd7d6

Mock out load and store

62d0977

vchuravy added 5 commits November 5, 2022 16:27

Finish load and store

7d80632

Add todos

74278ea

stop the copy-pasta

2da2386

exch and cas

e1d259b

fixes

654870d

vchuravy mentioned this pull request Jan 11, 2023

Instruction selection error from loads and stores on CUDA JuliaConcurrent/Atomix.jl#33

Open

jgreener64 mentioned this pull request Jan 13, 2023

CUDA.@atomic error in GPU kernel EnzymeAD/Enzyme.jl#511

Open

vchuravy mentioned this pull request Mar 10, 2023

Use Atomix #1790

Draft

4 tasks

maleadt force-pushed the master branch from 476979e to d53a63e Compare March 16, 2023 12:34

maleadt force-pushed the master branch from c97bc77 to d57e020 Compare September 8, 2023 20:12

maleadt force-pushed the master branch from 1cb1f53 to 1a1d127 Compare September 18, 2023 16:28

maleadt force-pushed the master branch from aef3298 to 4b017c6 Compare January 18, 2024 12:09

maleadt force-pushed the master branch 7 times, most recently from ea2a305 to 2274085 Compare December 19, 2024 17:45

maleadt force-pushed the master branch 8 times, most recently from 5d585c4 to c850163 Compare December 20, 2024 08:18

vchuravy mentioned this pull request Jan 13, 2025

Atomics: configurable scope (for multi-device unified memory) #2619

Open

vchuravy mentioned this pull request May 13, 2025

Generalized GPU support JuliaConcurrent/Atomix.jl#57

Open
