
[Core] release GIL when running parallel_memcopy() / memcpy() during serializations #22492

Merged
1 commit merged into master from fix-gil on Feb 18, 2022

Conversation

@mwtian (Member) commented on Feb 18, 2022

Why are these changes needed?

While investigating #22161, it was observed that the GIL is held for an extended amount of time (up to 1000s), with the stack trace shown in [1]. Either `Pickle5Writer.write_to()` performs many iterations that call `ray::parallel_memcopy()`, or a few `ray::parallel_memcopy()` calls take a long time (less likely). Either way, `ray::parallel_memcopy()` / `std::memcpy()` should not hold the GIL.

The `MessagePackSerializedObject` / `Pickle5SerializedObject` / `Pickle5Writer` `write_to()` functions were already annotated with `nogil`, which allows calling them without holding the GIL (the annotation itself does not release the GIL). An alternative would be to release the GIL around the `write_to()` call in `store_task_output()`, but that seems riskier to cherry-pick and would require more testing.
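For illustration, here is a rough Cython sketch of the pattern this change applies (the file name, class name, and signature are simplified placeholders, not the exact Ray source): the nogil annotation on write_to() only permits calling it without the GIL, while the inner with nogil: block is what actually releases the GIL around the copy.

    # sketch.pyx -- illustrative only; simplified from the real write_to() methods.
    # cython: boundscheck=False
    from libc.stdint cimport uint8_t
    from libc.string cimport memcpy

    cdef class SerializedValueSketch:
        cdef uint8_t *value_ptr
        cdef size_t _total_bytes

        # The nogil annotation only declares that the body is safe to run
        # without the GIL; calling this method does not release the GIL by itself.
        cdef void write_to(self, uint8_t[:] buffer) nogil:
            # `with nogil:` is what actually drops the GIL, so other Python
            # threads can make progress while a potentially large copy runs.
            with nogil:
                memcpy(&buffer[0], self.value_ptr, self._total_bytes)

As described above, the actual change keeps the existing nogil-annotated signatures and only wraps the parallel_memcopy() / memcpy() calls in with nogil: blocks.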

[1]
Python stack:

Process 532: ray::IDLE
Python v3.7.7 (/home/ray/anaconda3/bin/python3.7)

Thread 0x7F0A5571F740 (active+gil): "MainThread"
    main_loop (ray/worker.py:453)
    <module> (ray/workers/default_worker.py:225)

Python + native stack:

Thread 532 (idle): "MainThread"
    __pthread_timedjoin_ex (libpthread-2.27.so)
    __gthread_join (gthr-default.h:682)
    std::thread::join (thread.cc:110)
    ray::parallel_memcopy (ray/_raylet.so)
    Pickle5Writer_write_to (ray/_raylet.so)
    Pickle5SerializedObject_write_to (ray/_raylet.so)
    MessagePackSerializedObject_write_to (ray/_raylet.so)
    CoreWorker_store_task_output (ray/_raylet.so)
    CoreWorker_store_task_outputs (ray/_raylet.so)
    _raylet_task_execution_handler (ray/_raylet.so)
    std::_Function_handler<ray::Status(ray::rpc::TaskType, std::string, ray::core::RayFunction const&, std::unordered_map<std::string, double, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, double> > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > > const&, std::vector<ray::rpc::ObjectReference, std::allocator<ray::rpc::ObjectReference> > const&, std::vector<ray::ObjectID, std::allocator<ray::ObjectID> > const&, std::string const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, std::shared_ptr<ray::LocalMemoryBuffer>&, bool*, std::vector<ray::ConcurrencyGroup, std::allocator<ray::ConcurrencyGroup> > const&, std::string), ray::Status (*)(ray::rpc::TaskType, std::string, ray::core::RayFunction const&, std::unordered_map<std::string, double, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, double> > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > > const&, std::vector<ray::rpc::ObjectReference, std::allocator<ray::rpc::ObjectReference> > const&, std::vector<ray::ObjectID, std::allocator<ray::ObjectID> > const&, std::string, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, std::shared_ptr<ray::LocalMemoryBuffer>&, bool*, std::vector<ray::ConcurrencyGroup, std::allocator<ray::ConcurrencyGroup> > const&, std::string)>::_M_invoke (ray/_raylet.so)
    ray::core::CoreWorker::ExecuteTask (ray/_raylet.so)
    std::_Function_handler<ray::Status(ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > >, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*, bool*), std::_Bind<ray::Status (ray::core::CoreWorker(ray::core::CoreWorker*, std::_Placeholder<1>, std::_Placeholder<2>, std::_Placeholder<3>, std::_Placeholder<4>, std::_Placeholder<5>)::*)(ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*, bool*)> >::_M_invoke (ray/_raylet.so)
    ray::core::CoreWorkerDirectTaskReceiver::HandleTask(ray::rpc::PushTaskRequest const&, ray::rpc::PushTaskReply*, std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>)::{lambda(std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>)#1}::operator() const (ray/_raylet.so)
    std::_Function_handler<void (std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>), ray::core::CoreWorkerDirectTaskReceiver::HandleTask(ray::rpc::PushTaskRequest const&, ray::rpc::PushTaskReply*, std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>)::{lambda(std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>)#1}>::_M_invoke (ray/_raylet.so)
    ray::core::InboundRequest::Accept (ray/_raylet.so)
    ray::core::NormalSchedulingQueue::ScheduleRequests (ray/_raylet.so)
    EventTracker::RecordExecution (ray/_raylet.so)
    std::_Function_handler<void (), instrumented_io_context::post(std::function<void ()>, std::string)::{lambda()#1}>::_M_invoke (ray/_raylet.so)
    boost::asio::detail::completion_handler<std::function<void ()>, boost::asio::io_context::basic_executor_type<std::allocator<void>, (unsigned int)0> >::do_complete (ray/_raylet.so)
    boost::asio::detail::scheduler::do_run_one (ray/_raylet.so)
    boost::asio::detail::scheduler::run (ray/_raylet.so)
    boost::asio::io_context::run (ray/_raylet.so)
    ray::core::CoreWorker::RunTaskExecutionLoop (ray/_raylet.so)
    ray::core::CoreWorkerProcessImpl::RunWorkerTaskExecutionLoop (ray/_raylet.so)
    ray::core::CoreWorkerProcess::RunTaskExecutionLoop (ray/_raylet.so)
    run_task_loop (ray/_raylet.so)
    main_loop (ray/worker.py:453)
    <module> (ray/workers/default_worker.py:225)

Related issue number

Checks

  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@mwtian changed the title from "[WIP] release" to "[Core] release GIL when running parallel_memcpy() / memcpy() during serializations" on Feb 18, 2022
@mwtian changed the title from "[Core] release GIL when running parallel_memcpy() / memcpy() during serializations" to "[Core] release GIL when running parallel_memcopy() / memcpy() during serializations" on Feb 18, 2022
@mwtian added the tests-ok label (The tagger certifies test failures are unrelated and assumes personal liability.) on Feb 18, 2022
@mwtian (Member, Author) commented on Feb 18, 2022

Test failures appeared on master as well.

@mwtian marked this pull request as ready for review on February 18, 2022 at 16:04
MEMCOPY_THREADS)
else:
memcpy(&buffer[0], self.value_ptr, self._total_bytes)
with nogil:
Collaborator:

So you confirmed that the nogil annotation doesn't do its job (that's surprising to me)?

@mwtian (Member, Author), Feb 18, 2022:

Yes. The nogil annotation on a function def tells Cython that it is OK to release the GIL while calling the function; `with nogil:` is what actually releases the GIL.
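To make this concrete, here is a tiny standalone Cython example (hypothetical names, not code from this PR): the nogil annotation on heavy_sum() only makes it callable without the GIL; the with nogil: block in the caller is what actually releases it.

    # toy.pyx -- hypothetical example, not part of this change.
    cdef double heavy_sum(double *data, Py_ssize_t n) nogil:
        # Safe to run without the GIL, but calling it does not release the GIL.
        cdef double total = 0
        cdef Py_ssize_t i
        for i in range(n):
            total += data[i]
        return total

    def compute(double[:] values):
        cdef double result
        # This block is what actually releases the GIL around the C-level work.
        with nogil:
            result = heavy_sum(&values[0], values.shape[0])
        return result

With only the nogil annotation and no with nogil: block (or a caller-side release), the loop would still run while holding the GIL, which is the behavior observed in the stack trace above.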

Collaborator:

Learned something new!

@mwtian (Member, Author):

Same! There seems to be only one sentence about this in the Cython docs.

@ericl merged commit 5a4c6d2 into master on Feb 18, 2022
@ericl deleted the fix-gil branch on February 18, 2022 at 22:11
simonsays1980 pushed a commit to simonsays1980/ray that referenced this pull request Feb 27, 2022
…ing serializations (ray-project#22492)

While investigating ray-project#22161, it is observed GIL is held for an extended amount of time (up to 1000s) with stack trace [1]. It is possible either there are many iterations within `Pickle5Writer.write_to()` calling `ray::parallel_memcopy()`, or a few `ray::parallel_memcopy()` taking a long time (less likely). Either way, `ray::parallel_memcopy()` or `std::memcpy()` should not hold GIL.