-
Notifications
You must be signed in to change notification settings - Fork 5.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Core] release GIL when running parallel_memcopy()
/ memcpy()
during serializations
#22492
Conversation
parallel_memcpy()
/ memcpy()
during serializations
parallel_memcpy()
/ memcpy()
during serializationsparallel_memcopy()
/ memcpy()
during serializations
Test failures appeared on |
MEMCOPY_THREADS) | ||
else: | ||
memcpy(&buffer[0], self.value_ptr, self._total_bytes) | ||
with nogil: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So you confirmed that nogil annotation doesn't do its job (that's surprising to me)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, nogil
annotation on function def tells cython
it is ok to release gil while calling the function. with nogil:
actually releases gil.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Learned something new!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same! There seems to be only one sentence about this in cython
doc.
…ing serializations (ray-project#22492) While investigating ray-project#22161, it is observed GIL is held for an extended amount of time (up to 1000s) with stack trace [1]. It is possible either there are many iterations within `Pickle5Writer.write_to()` calling `ray::parallel_memcopy()`, or a few `ray::parallel_memcopy()` taking a long time (less likely). Either way, `ray::parallel_memcopy()` or `std::memcpy()` should not hold GIL.
Why are these changes needed?
While investigating #22161, it is observed GIL is held for an extended amount of time (up to 1000s) with stack trace [1]. It is possible either there are many iterations within
Pickle5Writer.write_to()
callingray::parallel_memcopy()
, or a fewray::parallel_memcopy()
taking a long time (less likely). Either way,ray::parallel_memcopy()
orstd::memcpy()
should not hold GIL.The
MessagePackSerializedObject
/Pickle5SerializedObject
/Pickle5Writer
write_to()
functions have been annotated withnogil
, which allows calling the functions without holding GIL (the annotation itself does not release GIL). It might be possible to release GIL while callingwrite_to()
instore_task_output()
, but it seems riskier for cherrypicking and requires more testing.[1]
Python stack:
Python + native stack:
Related issue number
Checks
scripts/format.sh
to lint the changes in this PR.