Skip to content

[SYCL] Guard access to memory objects with a mutex #5231

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 19 commits into from
Mar 9, 2022

Conversation

againull
Copy link
Contributor

No description provided.

@againull againull marked this pull request as ready for review January 4, 2022 20:45
@againull againull requested a review from a team as a code owner January 4, 2022 20:45
@@ -839,6 +839,10 @@ struct _pi_mem : _pi_object {
size_t Size;
};

// Protects accesses to all the non-const member variables. Exclusive access
// is required to modify any of these members.
std::shared_mutex Mutex;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIT: I think eventually we may want to move the Mutex to base _pi_object as I expect every PI object needs this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved to pi_object.

@@ -403,7 +403,6 @@ static bool PiDriverModuleProgramExtensionFound = false;
// write (as it is OK for multiple threads to read the map without locking it).

pi_result _pi_mem::addMapping(void *MappedTo, size_t Offset, size_t Size) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update comments in the header that the "this->Mutex" should be locked before calling this.
Maybe, in fact, inline these functions and remove these methods

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inlined these functions.

@@ -6001,6 +6008,10 @@ pi_result piEnqueueMemBufferCopy(pi_queue Queue, pi_mem SrcBuffer,
pi_event *Event) {
PI_ASSERT(SrcBuffer && DstBuffer, PI_INVALID_MEM_OBJECT);
PI_ASSERT(Queue, PI_INVALID_QUEUE);

std::scoped_lock Lock(Queue->PiQueueMutex, SrcBuffer->Mutex,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The "std::scoped_lock" will lock all mutex for exclusive use, which looks like unnecessary many times.
Can we adopt a more fine-grain approach, like this:

    std::shared_lock L1(SrcBuffer->Mutex, std::defer_lock);
    std::unique_lock L2(DstBuffer->Mutex, std::defer_lock);
    std::unique_lock L3(Queue->Mutex, std::defer_lock);
    std::scoped_lock lockall(L1, L2, L3);

?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call, thanks, fixed.

@@ -3595,13 +3593,15 @@ pi_result piMemGetInfo(pi_mem Mem,
(void)ParamValueSize;
(void)ParamValue;
(void)ParamValueSizeRet;
std::shared_lock Guard(Mem->Mutex);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we even need this lock?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably you are right, removed for now. Can be added if necessary.

@@ -5762,6 +5770,7 @@ pi_result piEnqueueMemBufferRead(pi_queue Queue, pi_mem Src,
PI_ASSERT(Src, PI_INVALID_MEM_OBJECT);
PI_ASSERT(Queue, PI_INVALID_QUEUE);

std::scoped_lock Lock(Queue->PiQueueMutex, Src->Mutex);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can only lock "Src" for shared use, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call, fixed.

Comment on lines 402 to 403
// TODO:: In the following 2 methods we may want to distinguish read access vs.
// write (as it is OK for multiple threads to read the map without locking it).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove comment

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed.

@@ -5805,8 +5792,6 @@ static pi_result enqueueMemCopyHelper(pi_command_type CommandType,
pi_event *Event, bool PreferCopyEngine) {
PI_ASSERT(Queue, PI_INVALID_QUEUE);
PI_ASSERT(Event, PI_INVALID_EVENT);
// Lock automatically releases when this goes out of scope.
std::lock_guard<std::mutex> lock(Queue->PiQueueMutex);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adjust the comment at line 5798

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adjusted.

@@ -5868,9 +5853,6 @@ static pi_result enqueueMemCopyRectHelper(
PI_ASSERT(Region && SrcOrigin && DstOrigin && Queue, PI_INVALID_VALUE);
PI_ASSERT(Event, PI_INVALID_EVENT);

// Lock automatically releases when this goes out of scope.
std::lock_guard<std::mutex> lock(Queue->PiQueueMutex);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

adjust the comment at 5859

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adjusted.

Comment on lines 5989 to 5992
std::shared_lock SrcLock(SrcBuffer->Mutex, std::defer_lock);
std::unique_lock DstLock(DstBuffer->Mutex, std::defer_lock);
std::unique_lock QueueLock(Queue->Mutex, std::defer_lock);
std::scoped_lock LockAll(SrcLock, DstLock, QueueLock);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is shorter, although I am not sure if it reads better:

Suggested change
std::shared_lock SrcLock(SrcBuffer->Mutex, std::defer_lock);
std::unique_lock DstLock(DstBuffer->Mutex, std::defer_lock);
std::unique_lock QueueLock(Queue->Mutex, std::defer_lock);
std::scoped_lock LockAll(SrcLock, DstLock, QueueLock);
std::shared_lock SrcLock(SrcBuffer->Mutex, std::defer_lock);
std::scoped_lock LockAll(SrcLock, DstBuffer->Mutex, Queue->Mutex);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, used shorter version.

@@ -6317,7 +6337,7 @@ pi_result piEnqueueMemUnmap(pi_queue Queue, pi_mem MemObj, void *MappedPtr,

{
// Lock automatically releases when this goes out of scope.
std::lock_guard<std::mutex> lock(Queue->PiQueueMutex);
std::scoped_lock lock(Queue->Mutex);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
std::scoped_lock lock(Queue->Mutex);
std::shared_lock lock(Queue->Mutex);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, fixed.

@@ -6326,6 +6346,7 @@ pi_result piEnqueueMemUnmap(pi_queue Queue, pi_mem MemObj, void *MappedPtr,
}
}

std::shared_lock Guard(MemObj->Mutex);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The "memcpy" below potentially modifies the data of the MemObj, right?

Suggested change
std::shared_lock Guard(MemObj->Mutex);
std::scoped_lock Guard(MemObj->Mutex);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, fixed.

@@ -206,6 +206,10 @@ struct _pi_object {
// Level Zero doesn't do the reference counting, so we have to do.
// Must be atomic to prevent data race when incrementing/decrementing.
std::atomic<pi_uint32> RefCount;

// Protects accesses to all the non-const member variables. Exclusive access
// is required to modify any of these members.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Please add examples here how to properly lock things (including multiple objects, and read/write), so that people use the same syntax more or less.

  2. I think we should add a mode where single-threaded app doesn't pay the overhead of working with many mutexes. Please see how this can get organized. Maybe wrap the "std::shared_mutex" into a class that would redefine mutex operations to no-op under some env var.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added examples and single-threaded mode.

Copy link
Contributor

@smaslov-intel smaslov-intel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Few more comments

@againull againull requested a review from a team as a code owner February 23, 2022 23:32
// A single-threaded app has an opportunity to enable this mode to avoid
// overhead from mutex locking.
static const bool SingleThreadMode = [] {
return std::getenv("SYCL_PI_LEVEL_ZERO_SINGLE_THREAD_MODE") != nullptr;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest that we distinguish "0" as disabling ST mode (it it convenient to switch between 0 and 1 when debugging)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

Copy link
Contributor

@smaslov-intel smaslov-intel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Just few comments about new env var

@againull
Copy link
Contributor Author

againull commented Feb 24, 2022

Failed Tests (1): SYCL :: Assert/assert_in_simultaneous_kernels.cpp

Where can I find the logs?

I was able to reproduce this failure by running make check-sycl-all multiple times, it can't be reproduced by running a single test or by single make check run.
Log is here: https://github.com/intel/llvm/runs/5312116716?check_suite_focus=true

smaslov-intel
smaslov-intel previously approved these changes Feb 24, 2022
smaslov-intel
smaslov-intel previously approved these changes Feb 24, 2022
@againull
Copy link
Contributor Author

Please don't merge this PR yet, I am double checking SYCL :: Assert/assert_in_simultaneous_kernels.cpp failure.

@againull againull changed the title [SYCL] Guard access to memory objects with a mutex [DO NOT MERGE][SYCL] Guard access to memory objects with a mutex Feb 25, 2022
@againull
Copy link
Contributor Author

againull commented Feb 25, 2022

Actually, it looks like I can reproduce Assert/assert_in_simultaneous_kernels.cpp failure only with this PR. But it requires running make check-sycl-all many times. I have no idea how to analyze such type of failure. Please let me know if you have any ideas.

It doesn't look to me that these changes themselves can cause a failure. More likely these changes change timing and this makes some existing problem to appear more often. I don't have an idea how to prove that.

@smaslov-intel
Copy link
Contributor

Actually, it looks like I can reproduce Assert/assert_in_simultaneous_kernels.cpp failure only with this PR. But it requires running make check-sycl-all many times. I have no idea how to analyze such type of failure. Please let me know if you have any ideas.

It doesn't look to me that these changes themselves can cause a failure. More likely these changes change timing and this makes some existing problem to appear more often. I don't have an idea how to prove that.

I suggest you try https://www.intel.com/content/www/us/en/developer/tools/oneapi/inspector.html to pinpoint the issue,

@bader bader marked this pull request as draft March 3, 2022 11:43
@bader bader changed the title [DO NOT MERGE][SYCL] Guard access to memory objects with a mutex [SYCL] Guard access to memory objects with a mutex Mar 3, 2022
@bader
Copy link
Contributor

bader commented Mar 3, 2022

Moved to a draft - drafts can't be merged. Artur, please, convert to a pull request when it's ready for merge.

@againull againull marked this pull request as ready for review March 8, 2022 21:33
@againull
Copy link
Contributor Author

againull commented Mar 8, 2022

Fixed Assert/assert_in_simultaneous_kernels.cpp regression by this commit: 786dc4b

@againull
Copy link
Contributor Author

againull commented Mar 9, 2022

@intel/dpcpp-doc-reviewers Could you please review doc changes.

Copy link
Contributor

@pvchupin pvchupin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doc ok

@againull againull merged commit 3321141 into intel:sycl Mar 9, 2022
againull added a commit to againull/llvm that referenced this pull request Mar 11, 2022
@againull againull deleted the mem_mutex branch December 3, 2022 00:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants