-
Notifications
You must be signed in to change notification settings - Fork 6.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[xray] Fix bug in updating actor execution dependencies #2064
[xray] Fix bug in updating actor execution dependencies #2064
Conversation
Test FAILed. |
@@ -662,7 +665,7 @@ void NodeManager::FinishAssignedTask(Worker &worker) { | |||
auto actor_notification = std::make_shared<ActorTableDataT>(); | |||
actor_notification->actor_id = actor_id.binary(); | |||
actor_notification->actor_creation_dummy_object_id = | |||
task.GetTaskSpecification().ActorCreationDummyObjectId().binary(); | |||
task.GetTaskSpecification().ActorDummyObject().binary(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wasn't this correct before?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ActorCreationDummyObjectId
is only set for actor methods, not for the actor creation task. :(
src/ray/raylet/node_manager.cc
Outdated
mutable_spec.SetExecutionDependencies( | ||
{actor_entry->second.GetExecutionDependency()}); | ||
auto execution_dependency = actor_entry->second.GetExecutionDependency(); | ||
if (!execution_dependency.is_nil()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
execution_dependency.is_nil()
is true only for the very first actor task that is assigned, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me, left a couple questions.
jenkins retest this please |
src/ray/raylet/node_manager.cc
Outdated
if (!execution_dependency.is_nil()) { | ||
TaskExecutionSpecification &mutable_spec = task.GetTaskExecutionSpec(); | ||
mutable_spec.SetExecutionDependencies( | ||
{actor_entry->second.GetExecutionDependency()}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
{execution_dependency}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
Test PASSed. |
Test PASSed. |
d9e2789
to
b15c8ef
Compare
Test PASSed. |
* master: (22 commits) [xray] Fix bug in updating actor execution dependencies (ray-project#2064) [DataFrame] Refactor __delitem__ (ray-project#2080) [xray] Better error messaging when pulling from self. (ray-project#2068) Use source code in hash where possible (fix ray-project#2089) (ray-project#2090) Functions for flushing done tasks and evicted objects. (ray-project#2033) Fix compilation error for RAY_USE_NEW_GCS with latest clang. (ray-project#2086) [xray] Corrects Error Handling During Push and Pull. (ray-project#2059) [xray] Sophisticated task dependency management (ray-project#2035) Support calling positional arguments by keyword (fix ray-project#998) (ray-project#2081) [DataFrame] Improve performance of iteration methods (ray-project#2026) [DataFrame] Implement to_csv (ray-project#2014) [xray] Lineage cache only requests notifications about remote parent tasks (ray-project#2066) [rllib] Add magic methods for rollouts (ray-project#2024) [DataFrame] Allows DataFrame constructor to take in another DataFrame (ray-project#2072) Pin Pandas version for Travis to 0.22 (ray-project#2075) Fix python linting (ray-project#2076) [xray] Fix GCS table prefixes (ray-project#2065) Some tests for _submit API. (ray-project#2062) [rllib] Queue lib for python 2.7 (ray-project#2057) [autoscaler] Remove faulty assert that breaks during downscaling, pull configs from env (ray-project#2006) ...
* master: (24 commits) Performance fix (ray-project#2110) Use flake8-comprehensions (ray-project#1976) Improve error message printing and suppression. (ray-project#2104) [rllib] [doc] Broken link in ddpg doc YAPF, take 3 (ray-project#2098) [rllib] rename async -> _async (ray-project#2097) fix unused lambda capture (ray-project#2102) [xray] Use pubsub instead of timeout for ObjectManager Pull. (ray-project#2079) [DataFrame] Update _inherit_docstrings (ray-project#2085) [JavaWorker] Changes to the build system for support java worker (ray-project#2092) [xray] Fix bug in updating actor execution dependencies (ray-project#2064) [DataFrame] Refactor __delitem__ (ray-project#2080) [xray] Better error messaging when pulling from self. (ray-project#2068) Use source code in hash where possible (fix ray-project#2089) (ray-project#2090) Functions for flushing done tasks and evicted objects. (ray-project#2033) Fix compilation error for RAY_USE_NEW_GCS with latest clang. (ray-project#2086) [xray] Corrects Error Handling During Push and Pull. (ray-project#2059) [xray] Sophisticated task dependency management (ray-project#2035) Support calling positional arguments by keyword (fix ray-project#998) (ray-project#2081) [DataFrame] Improve performance of iteration methods (ray-project#2026) ...
* fix-a3c-torch: (37 commits) Add missing channel major Use correct filter size Add TODO Fix shape errors fmt Performance fix (ray-project#2110) Use flake8-comprehensions (ray-project#1976) Improve error message printing and suppression. (ray-project#2104) [rllib] [doc] Broken link in ddpg doc YAPF, take 3 (ray-project#2098) [rllib] rename async -> _async (ray-project#2097) fix unused lambda capture (ray-project#2102) [xray] Use pubsub instead of timeout for ObjectManager Pull. (ray-project#2079) [DataFrame] Update _inherit_docstrings (ray-project#2085) [JavaWorker] Changes to the build system for support java worker (ray-project#2092) [xray] Fix bug in updating actor execution dependencies (ray-project#2064) [DataFrame] Refactor __delitem__ (ray-project#2080) [xray] Better error messaging when pulling from self. (ray-project#2068) Use source code in hash where possible (fix ray-project#2089) (ray-project#2090) Functions for flushing done tasks and evicted objects. (ray-project#2033) ...
What do these changes do?
Actor execution dependencies are used to schedule actor methods in the correct order. They are updated each time a method is assigned to reflect execution order. This fixes two bugs in the code to update the execution dependencies. The first updated the actor information before updating the execution dependency for a method, causing the method to become dependent on itself. The second did not clear previous execution dependencies when resetting them.