-
Notifications
You must be signed in to change notification settings - Fork 793
[SYCL] USM shared memory allocator for L0 plugin #2366
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
return Sizes; | ||
} | ||
|
||
static const std::vector<size_t> BucketSizes = generateBucketSizes(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems values of this vector can be computed at compile time. Can we replace it with std::array then?
Non trivial global objects can cause problems during destruction.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, unfortunately I had to hardcode all the sizes instead of writing the loop due to some limitations of std::array(e.g. lack of constexpr operator[] for non-const std::array in C++14 , etc).
49a6201
to
ea836e3
Compare
ea836e3
to
d31de69
Compare
d31de69
to
bf4ab17
Compare
In L0 each allocation results in at least a memory page no matter how small the size was requested. This adds significant overhead when an app does many small allocations. This patch adds a memory allocator on top of L0 USM API in order to solve the problem. High level description: There is a predefine list of bucket sizes. When allocation function is called, the best fitted bucket is found for the requested size. If there is a free chunk in the bucket then it's returned, otherwise a new slab of size 64k requested from the system via L0 API. This slab is split into chunks of the size of corresponding bucket. Now there is a free chunk in the slab which is returned. When a chunk is returned it's marked as non-free until the user call free with a corresponding pointer. In addition to that add an environment variable SYCL_PI_LEVEL0_DISABLE_USM_ALLOCATOR which disables the allocator and returns back to the original behavior. Signed-off-by: Ilya Stepykin <ilya.stepykin@intel.com>
bf4ab17
to
769d3b9
Compare
@ilyastepykin , because of you "force-pushed" the update changes the incremental review is not possible, please avoid doing so in future. |
@@ -174,6 +228,12 @@ struct _pi_context : _pi_object { | |||
// and destroy the pool if there are no alive events. | |||
ze_result_t decrementAliveEventsInPool(ze_event_pool_handle_t pool); | |||
|
|||
// Store USM allocator context(internal allocator structures) | |||
// for USM shared/host and device allocations. There is 1 allocator context | |||
// per each pair of (context, device) per each memory type. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why there should be separate allocators for device and shared USM?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They have different allocation function: zeMemAllocShared vs zeMemAllocDevice.
Technically, it's possible to have a single allocator for both of them, but it requires to differentiate and keep track of allocation type. It basically makes the core logic more complex without any advantages of the current approach where we keep allocator per allocation type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, it make sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, keeping track of them separately is good.
@dm-vodopyanov : the #2343 does not add the support for contexts with multiple devices to Level-Zero plugin, so no, it is not possible to test that with Level-Zero yet. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There are issues with running these tests on OpenCL on windows. The primary goal of these tests is to test USM allocator which is currently emabled only for Level zero, so disable them on OpenCL for now.
Due to incorrect pointer alignment in case when allcation alignment > size we returned to the user and incorrect chunk(i.e. next to the one which was marked as non-free) which is later detected as "double free" error. Also corrected the existing testcases to test this scenario
LGTM, but there are a couple of places @romanovvlad wanted initialization to zero that haven't been changed yet. |
Ok for docs. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please fix typo.
Co-authored-by: Pavel Chupin <45979248+pvchupin@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doc part OK
Disable CFI checks when ASAN is specified
In L0 each allocation results in at least a memory page
no matter how small the size was requested. This adds
significant overhead when an app does many small allocations.
This patch adds a memory allocator on top of L0 USM API in
order to solve the problem.
High level description:
There is a predefine list of bucket sizes. When allocation function
is called, the best fitted bucket is found for the requested size.
If there is a free chunk in the bucket then it's returned, otherwise
a new slab of size 64k requested from the system via L0 API.
This slab is split into chunks of the size of corresponding bucket.
Now there is a free chunk in the slab which is returned.
When a chunk is returned it's marked as non-free until the user
call free with a corresponding pointer.
In addition to that add an environment variable SYCL_PI_LEVEL0_DISABLE_USM_ALLOCATOR
which disables the allocator and returns back to the original behavior.
Signed-off-by: Ilya Stepykin ilya.stepykin@intel.com