[AsyncAlloc][SYCL][CUDA][Exp] Initial device side implementation for the sycl_ext_oneapi_async_memory_alloc extension #16900

Merged (45 commits) on Mar 27, 2025

@Seanst98 Seanst98 commented Feb 6, 2025

Implement the sycl_ext_oneapi_async_memory_alloc extension for asynchronous memory allocation and freeing on CUDA, for device-allocated pools only.

SYCL entry points that specify host-side or shared-side pools, or pools created from pre-existing allocations, will throw.

co-authored-by: Sean Stirling <sean.stirling@codeplay.com>
co-authored-by: Hugh Delaney <hugh.delaney@codeplay.com>

@Seanst98 Seanst98 force-pushed the sean/async-alloc branch 2 times, most recently from 30ed8bb to 82a106a Compare February 21, 2025 13:18

@AerialMantis AerialMantis left a comment

Thanks for the changes, this LGTM.

@npmiller

@intel/llvm-gatekeepers I believe this is ready to merge

@martygrant martygrant merged commit faa2365 into intel:sycl Mar 27, 2025
31 checks passed
@uditagarwal97

Hi @Seanst98 ,
This PR broke nightly: https://github.com/intel/llvm/actions/runs/14121213165/job/39561695855

/__w/llvm/llvm/src/unified-runtime/source/adapters/cuda/usm.cpp: In constructor 'ur_usm_pool_handle_t_::ur_usm_pool_handle_t_(ur_context_handle_t, ur_device_handle_t, ur_usm_pool_desc_t*)':
/__w/llvm/llvm/src/unified-runtime/source/adapters/cuda/usm.cpp:434:20: error: 'CUmemPoolProps' {aka 'struct CUmemPoolProps_st'} has no member named 'maxSize'
  434 |       MemPoolProps.maxSize =
      |                    ^~~~~~~

This could be due to a difference in CUDA versions between pre-commit and Nightly: the Ubuntu 22 build job in Nightly uses CUDA 12.1, while the Ubuntu 24 build job uses CUDA 12.6.3.
Could you please look into this?
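
A common way to handle this kind of toolkit mismatch is to guard the newer struct member behind the `CUDA_VERSION` macro. The sketch below is illustrative only and is not taken from the follow-up fix: `MockMemPoolProps`, `setPoolMaxSize`, and `HYPOTHETICAL_MAXSIZE_MIN_VERSION` are hypothetical names, the struct is a mock so the snippet compiles without the CUDA toolkit, and the 12.2 threshold is an assumption rather than something stated in this thread.

```cpp
#include <cassert>
#include <cstddef>

// Stand-in for the macro normally provided by <cuda.h>.
// 12010 encodes CUDA 12.1, the toolkit used by the failing Ubuntu 22 job.
#ifndef CUDA_VERSION
#define CUDA_VERSION 12010
#endif

// ASSUMPTION: first toolkit version whose pool properties expose maxSize.
#define HYPOTHETICAL_MAXSIZE_MIN_VERSION 12020

// Mock of the pool properties struct; the real one is CUmemPoolProps.
struct MockMemPoolProps {
  int allocType = 0;
#if CUDA_VERSION >= HYPOTHETICAL_MAXSIZE_MIN_VERSION
  std::size_t maxSize = 0; // only present on newer toolkits
#endif
};

// Applies the requested limit when the toolkit supports it.
// Returns true if the limit was written, false if it was skipped.
inline bool setPoolMaxSize(MockMemPoolProps &Props, std::size_t MaxSize) {
  assert(MaxSize != 0 && "this sketch expects an explicit, non-zero limit");
#if CUDA_VERSION >= HYPOTHETICAL_MAXSIZE_MIN_VERSION
  Props.maxSize = MaxSize;
  return true;
#else
  (void)Props;
  (void)MaxSize;
  return false;
#endif
}
```

With this shape the older toolchain compiles cleanly and the caller can decide whether a skipped limit should be reported as an unsupported feature.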

@Seanst98

> Hi @Seanst98, This PR broke nightly: https://github.com/intel/llvm/actions/runs/14121213165/job/39561695855

Thanks for bringing this to my attention. I've pushed a PR that addresses this: #17733

sommerlukas pushed a commit that referenced this pull request Mar 31, 2025
This patch fixes a couple of static analysis issues with the recent [async
alloc patch](#16900):

* Use `std::move` for shared pointer in `CGAsyncAlloc` constructor. It
is already passed by-value to the constructor so we can just move it
when assigning it to the member.
* Assert that the queue is available in `AsyncFree`
* Catch any exceptions from the memory pool destructor
* Initialize AsyncAlloc fields in handler
* Add `[[maybe_unused]]` for parameter only used in assert
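
Three of the patterns listed above can be sketched in isolation. This is a minimal illustration under stated assumptions, not the code from the patch: `HypotheticalPool`, `HypotheticalCommand`, and `checkedValue` are invented names standing in for the real pool, `CGAsyncAlloc`, and the function with the assert-only parameter.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <utility>

// Hypothetical pool whose release step may throw.
struct HypotheticalPool {
  void release() { throw std::runtime_error("release failed"); }
};

// Hypothetical command object that keeps a shared_ptr member.
class HypotheticalCommand {
  std::shared_ptr<HypotheticalPool> MPool;

public:
  // Pool is already a by-value copy, so moving it into the member
  // avoids a second reference-count increment.
  explicit HypotheticalCommand(std::shared_ptr<HypotheticalPool> Pool)
      : MPool(std::move(Pool)) {}

  // Destructors are implicitly noexcept, so an escaping exception would
  // call std::terminate; catch everything and continue.
  ~HypotheticalCommand() {
    try {
      if (MPool)
        MPool->release();
    } catch (...) {
      // Swallowed on purpose in this sketch.
    }
  }

  long useCount() const { return MPool.use_count(); }
};

// Limit is only read inside assert, which compiles away under NDEBUG;
// [[maybe_unused]] silences the resulting unused-parameter warning.
inline int checkedValue(int Value, [[maybe_unused]] int Limit) {
  assert(Value <= Limit);
  return Value;
}
```

Constructing a `HypotheticalCommand` from an existing `std::shared_ptr` leaves exactly two owners (the caller and the member), and destroying it does not propagate the exception thrown by `release()`.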
@Seanst98 Seanst98 deleted the sean/async-alloc branch April 4, 2025 16:39
KornevNikita pushed a commit that referenced this pull request May 27, 2025
…the sycl_ext_oneapi_async_memory_alloc extension (#16900)

Implement the
[sycl_ext_oneapi_async_memory_alloc](#14800)
extension for asynchronous memory allocation and freeing in CUDA, for
device allocated pools only.

SYCL entrypoints which specify host or shared side pools, or pools
created by pre-existing allocations will throw.

co-authored-by: Sean Stirling <sean.stirling@codeplay.com>
co-authored-by: Hugh Delaney <hugh.delaney@codeplay.com>

---------

Co-authored-by: Hugh Delaney <hugh.delaney@codeplay.com>
Co-authored-by: Nicolas Miller <nicolas.miller@codeplay.com>