- 
                Notifications
    
You must be signed in to change notification settings  - Fork 255
 
Add scoped atomic_thread_fence #1644
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
          Codecov ReportBase: 61.68% // Head: 60.08% // Decreases project coverage by  
 Additional details and impacted files@@            Coverage Diff             @@
##           master    #1644      +/-   ##
==========================================
- Coverage   61.68%   60.08%   -1.61%     
==========================================
  Files         151      151              
  Lines       11349    10833     -516     
==========================================
- Hits         7001     6509     -492     
+ Misses       4348     4324      -24     
 Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. ☔ View full report at Codecov.  | 
    
| 
           Does this need something else before it can be merged as an intermediate solution? I am finding it useful to get   | 
    
| 
           This shouldn't work with Enzyme, since Enzyme won't understand the assembly inserted, that's one of the reasons I haven't pushed on this further.  | 
    
| 
           It seems to work with EnzymeAD/Enzyme.jl#511 and in another context I tried, unless I am getting something mixed up or you mean it will only work in specific cases.  | 
    
ea2a305    to
    2274085      
    Compare
  
    5d585c4    to
    c850163      
    Compare
  
    
As noticed in EnzymeAD/Enzyme.jl#511 CUDA C++ support a wider selection of memory orders
and emits different assembly for SM_70 and above: https://godbolt.org/z/Y7Pj5G7sK
For now I just added the memory fences necessary to implement the rest.
@tkf @maleadt over the long-term I would be in favor of moving to Atomix.jl instead of
CUDA.@atomicis there any shared infrastructure we can use? As you see I am defining scope and order here again.