Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Explicit buffer allocation #24

Open
TApplencourt opened this issue Jul 23, 2019 · 9 comments
Open

Explicit buffer allocation #24

TApplencourt opened this issue Jul 23, 2019 · 9 comments
Labels
enhancement New feature or request question Further information is requested

Comments

@TApplencourt
Copy link
Contributor

TApplencourt commented Jul 23, 2019

Hi,

The handler class have copy and fill member functions. Can we have an allocate function? As the name implies, this function will allocate buffers [*]

The justification for this demand is the following:
Allocations may include synchronization between kernels. Hence users may want first to allocate buffers, then run their kernels.
This makes reasoning about concurrency easier (as far as I know explicit preemptive allocations is considered good practice in OpenMP world).

[*] It's possible to mimic the features right now by either using directly an
"OpenCL buffer" or by creating and launching a dummy "kernel". Both methods are kinda cumbersome.

Thanks

@keryell keryell added enhancement New feature or request question Further information is requested labels Jul 25, 2019
@keryell
Copy link
Member

keryell commented Jul 25, 2019

You mean having an allocate() function member on a cl::sycl::handler as a hint to do some pre-allocation on some device behind a cl::sycl:queue?

@TApplencourt
Copy link
Contributor Author

TApplencourt commented Jul 25, 2019

Exactly!
Something like:

 cl::sycl::buffer<cl::sycl::cl_int, 1> buffer(global_range);
 myQueue.submit([&](cl::sycl::handler& cgh) {
    cgh.allocate(buffer);
  });

We have cl::handle::copy to do explicit copy, it may seem natural to some people to have cl::handle::allocate to do explicit allocation of buffers.

@Ruyk
Copy link
Contributor

Ruyk commented Jul 25, 2019

This is covered, on a different way, on a set of proposals from Codeplay to complete the asynchronous data-flow model in SYCL, the particular about allocation is in
https://github.com/codeplaysoftware/standards-proposals/blob/master/asynchronous-data-flow/sycl-2.2/04_update_on_specific_context.md

but we didn't have enough implementation experience for 1.2.1. The proposal above is synchronous w.r.t the user thread, but allowing it on the command group makes sense as well.

Rather than an explicit allocate functionality, I would go for a slightly high level approach:

cl::sycl::buffer<cl::sycl::cl_int, 1> buffer(global_range);
 myQueue.submit([&](cl::sycl::handler& cgh) {
    cgh.require(buffer, /* Optionally, also the access mode */);
  });

or, for completeness

cl::sycl::buffer<cl::sycl::cl_int, 1> buffer(global_range);
 myQueue.submit([&](cl::sycl::handler& cgh) {
    auto accA = buffer.get_access<access::mode::read>(cgh);
    cgh.require(accA);
  });

An explicit allocate function would force the SYCL runtime to actually allocate something on the context. However, the SYCL runtime may already have a mean to access the data in the context, for example, if two devices share the same global address space:

context c1(d1, d2);
queue q1{c1, d1};
queue q2{c1, d2};

cl::sycl::buffer<cl::sycl::cl_int, 1> buffer(global_range);
// Allocate object on device1, but same context so visible for d2 as well
 q1.submit([&](cl::sycl::handler& cgh) {
    cgh.allocate(buffer);
  });
// What is the meaning now of this allocate?
// Also, which allocate executes first?
 q2.submit([&](cl::sycl::handler& cgh) {
    cgh.allocate(buffer);
  });

Using a less explicit require will fit with the requirements/actions model we defined in the architecture section:

context c1(d1, d2);
queue q1{c1, d1};
queue q2{c1, d2};

cl::sycl::buffer<cl::sycl::cl_int, 1> buffer(global_range);
// Requires buffer to be available on d1, action to perform is, e.g. a clCreateBuffer(MEM_READ_WRITE)
 q1.submit([&](cl::sycl::handler& cgh) {
    cgh.require(buffer);
  });
// Requires buffer to be available on d2, requirement is satisfied, so no action to perform.
 q2.submit([&](cl::sycl::handler& cgh) {
    cgh.require(buffer);
  });

@TApplencourt
Copy link
Contributor Author

TApplencourt commented Jul 25, 2019

Thanks for your detailed answers! The proposal seems interesting.

An explicit allocate function would force the SYCL runtime to actually allocate something on the context.
[...]
Using a less explicit require will fit with the requirements/actions model we defined in the architecture section:

I don't fully understand this point.
If the buffer is already allocated, cannot we just say that we don't allocate it again? cl::handle::allocate will be more a "hint" (as @keryell suggest) than an "order". I was thinking, but maybe I'm wrong, that cl::handle::copy work this way. If the data was already be copied, we don't copy it again.

But I don't have any strong opinion about the naming convention. I prefer the name allocate because it's more explicit, but I understand your point and the confusion that can potentially occur, and I trust you to know it fits better the SYCL spirit!

If a user can have more control on when the allocation will occur, I'm happy :)

@Ruyk
Copy link
Contributor

Ruyk commented Sep 18, 2019

We are having some discussions internally about this changes.
Just a clarification on the text below:

I don't fully understand this point.
If the buffer is already allocated, cannot we just say that we don't allocate it again? cl::handle::allocate will be more a "hint" (as @keryell suggest) than an "order". I was thinking, but maybe I'm wrong, that cl::handle::copy work this way. If the data was already be copied, we don't copy it again.

The copy functions take an external pointer as an argument, so the runtime cannot know if the data has been copied or not, and must copy every time.
The only exception is update_host, which hints to the runtime that the host pointer of the buffer must be updated, but has a number of hidden side-effects that are unclear (like, for example, it can trigger a host allocation if there was no host copy of the data before).

@keryell keryell added the Agenda To be discussed during a SYCL committee meeting label Dec 5, 2019
@keryell keryell removed the Agenda To be discussed during a SYCL committee meeting label Jan 21, 2020
@illuhad
Copy link
Contributor

illuhad commented Jun 15, 2021

See also https://github.com/illuhad/hipSYCL/blob/develop/doc/extensions.md#hipsycl_ext_update_device for an extension in hipSYCL that provides similar functionality.

@fraggamuffin
Copy link

need time to evaluate update_device proposal
also useful for buffer usm interop and sycl SC

@fraggamuffin
Copy link

relax multiple commands so no command group with multiple copies
add prefetch buffer and treat that like one of the copies
might need a new ticket

@fraggamuffin
Copy link

close to converging

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request question Further information is requested
Projects
None yet
Development

No branches or pull requests

5 participants