Skip to content

[SYCL] USM shared memory allocator for L0 plugin #2366

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Sep 9, 2020

Conversation

ilyastepykin
Copy link
Contributor

@ilyastepykin ilyastepykin commented Aug 25, 2020

In L0 each allocation results in at least a memory page
no matter how small the size was requested. This adds
significant overhead when an app does many small allocations.

This patch adds a memory allocator on top of L0 USM API in
order to solve the problem.

High level description:
There is a predefine list of bucket sizes. When allocation function
is called, the best fitted bucket is found for the requested size.
If there is a free chunk in the bucket then it's returned, otherwise
a new slab of size 64k requested from the system via L0 API.
This slab is split into chunks of the size of corresponding bucket.
Now there is a free chunk in the slab which is returned.
When a chunk is returned it's marked as non-free until the user
call free with a corresponding pointer.

In addition to that add an environment variable SYCL_PI_LEVEL0_DISABLE_USM_ALLOCATOR
which disables the allocator and returns back to the original behavior.

Signed-off-by: Ilya Stepykin ilya.stepykin@intel.com

@ilyastepykin ilyastepykin requested review from smaslov-intel and a team as code owners August 25, 2020 09:39
@pvchupin pvchupin requested a review from jbrodman August 25, 2020 14:48
jbrodman
jbrodman previously approved these changes Aug 25, 2020
return Sizes;
}

static const std::vector<size_t> BucketSizes = generateBucketSizes();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems values of this vector can be computed at compile time. Can we replace it with std::array then?
Non trivial global objects can cause problems during destruction.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, unfortunately I had to hardcode all the sizes instead of writing the loop due to some limitations of std::array(e.g. lack of constexpr operator[] for non-const std::array in C++14 , etc).

@ilyastepykin ilyastepykin force-pushed the istepyki/usm-allocator branch from 49a6201 to ea836e3 Compare September 3, 2020 19:33
@ilyastepykin ilyastepykin force-pushed the istepyki/usm-allocator branch from ea836e3 to d31de69 Compare September 3, 2020 20:00
@ilyastepykin ilyastepykin changed the title [WIP][SYCL] USM shared memory allocator for L0 plugin [SYCL] USM shared memory allocator for L0 plugin Sep 3, 2020
@ilyastepykin ilyastepykin force-pushed the istepyki/usm-allocator branch from d31de69 to bf4ab17 Compare September 3, 2020 23:16
In L0 each allocation results in at least a memory page
no matter how small the size was requested. This adds
significant overhead when an app does many small allocations.

This patch adds a memory allocator on top of L0 USM API in
order to solve the problem.

High level description:
There is a predefine list of bucket sizes. When allocation function
is called, the best fitted bucket is found for the requested size.
If there is a free chunk in the bucket then it's returned, otherwise
a new slab of size 64k requested from the system via L0 API.
This slab is split into chunks of the size of corresponding bucket.
Now there is a free chunk in the slab which is returned.
When a chunk is returned it's marked as non-free until the user
call free with a corresponding pointer.

In addition to that add an environment variable
SYCL_PI_LEVEL0_DISABLE_USM_ALLOCATOR which disables the allocator
and returns back to the original behavior.

Signed-off-by: Ilya Stepykin <ilya.stepykin@intel.com>
@ilyastepykin ilyastepykin force-pushed the istepyki/usm-allocator branch from bf4ab17 to 769d3b9 Compare September 4, 2020 21:52
@smaslov-intel
Copy link
Contributor

@ilyastepykin , because of you "force-pushed" the update changes the incremental review is not possible, please avoid doing so in future.

@@ -174,6 +228,12 @@ struct _pi_context : _pi_object {
// and destroy the pool if there are no alive events.
ze_result_t decrementAliveEventsInPool(ze_event_pool_handle_t pool);

// Store USM allocator context(internal allocator structures)
// for USM shared/host and device allocations. There is 1 allocator context
// per each pair of (context, device) per each memory type.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why there should be separate allocators for device and shared USM?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They have different allocation function: zeMemAllocShared vs zeMemAllocDevice.
Technically, it's possible to have a single allocator for both of them, but it requires to differentiate and keep track of allocation type. It basically makes the core logic more complex without any advantages of the current approach where we keep allocator per allocation type.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, it make sense.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, keeping track of them separately is good.

@smaslov-intel
Copy link
Contributor

The support of multiple devices was added: https://github.com/intel/llvm/pull/2343/files, though, it was tested on OpenCL only. @smaslov-intel, for the purposes of LIT tests, can Level Zero emulate multiple devices?

@dm-vodopyanov : the #2343 does not add the support for contexts with multiple devices to Level-Zero plugin, so no, it is not possible to test that with Level-Zero yet.

smaslov-intel
smaslov-intel previously approved these changes Sep 7, 2020
Copy link
Contributor

@smaslov-intel smaslov-intel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

There are issues with running these tests on OpenCL
on windows.
The primary goal of these tests is to test USM allocator
which is currently emabled only for Level zero, so disable
them on OpenCL for now.
Due to incorrect pointer alignment in case when
allcation alignment > size we returned to the user
and incorrect chunk(i.e. next to the one which was
marked as non-free) which is later detected as "double free"
error.

Also corrected the existing testcases to test this scenario
@jbrodman
Copy link
Contributor

jbrodman commented Sep 8, 2020

LGTM, but there are a couple of places @romanovvlad wanted initialization to zero that haven't been changed yet.

@pvchupin
Copy link
Contributor

pvchupin commented Sep 8, 2020

Ok for docs.

pvchupin
pvchupin previously approved these changes Sep 8, 2020
Copy link
Contributor

@pvchupin pvchupin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please fix typo.

Co-authored-by: Pavel Chupin <45979248+pvchupin@users.noreply.github.com>
Copy link
Contributor

@kbobrovs kbobrovs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doc part OK

@pvchupin pvchupin merged commit db5037c into intel:sycl Sep 9, 2020
Chenyang-L pushed a commit that referenced this pull request Feb 18, 2025
Disable CFI checks when ASAN is specified
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants