Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[rllib] Move some inline defs to avoid deserialization errors #5228

Merged
merged 2 commits into from
Jul 19, 2019

Conversation

ericl
Copy link
Contributor

@ericl ericl commented Jul 19, 2019

What do these changes do?

I'm still not sure why the definition order of these classes matter, or why the protobuf refactoring PR causes it to matter. However, this PR does empirically fix the IMPALA crashes when remote_worker_envs=True and the number of workers is overloaded on the machine.

Related issue number

Closes #5125

Linter

  • I've run scripts/format.sh to lint the changes in this PR.

@ericl ericl requested a review from richardliaw July 19, 2019 00:18
@ericl
Copy link
Contributor Author

ericl commented Jul 19, 2019

The only thing I can think of is perhaps the import thread timing compared to task execution is different after the GCS pr (e.g., tasks get executed before the import finishes).

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/15490/
Test PASSed.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-Perf-Integration-PRB/1717/
Test FAILed.

@ericl ericl merged commit 28e5c55 into ray-project:master Jul 19, 2019
edoakes pushed a commit to edoakes/ray that referenced this pull request Jul 25, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Import thread error when new workers are started (regression from 0.7.1 -> 0.7.2)
3 participants