Different approach to removing RayGetError #3471
Example output:
|
This is really cool! Here is a self-contained example:
|
The |
@robertnishihara To address the concern about suppressing worker exceptions, I added a short delay before they are printed. If the driver does not raise a task error before the delay expires, the worker errors are printed. Otherwise, they are suppressed. This should make the right thing happen in the common case:
|
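For readers following the delay idea described in the comment above, here is a minimal sketch of how such a hold-then-print scheme could work. It is not the code from this PR: the class `DelayedWorkerErrorLog`, its methods, the `delay_s` parameter, and the task IDs are all hypothetical names chosen for illustration. Worker errors are held for a short delay; if the driver raises the corresponding task error first, the held copy is dropped, otherwise it is released for printing.

```python
import time


class DelayedWorkerErrorLog:
    """Hold worker errors briefly so the driver can surface them first.

    Hypothetical helper for illustration only; not Ray's implementation.
    """

    def __init__(self, delay_s=1.0, clock=time.monotonic):
        self.delay_s = delay_s
        self.clock = clock  # injectable so the behavior can be tested
        self.pending = {}   # task_id -> (deadline, message)

    def worker_error(self, task_id, message):
        # Record the error, but do not print it yet.
        self.pending[task_id] = (self.clock() + self.delay_s, message)

    def driver_raised(self, task_id):
        # The driver already raised this task's error: suppress the worker copy.
        self.pending.pop(task_id, None)

    def flush(self):
        # Return (and forget) every held error whose delay has expired.
        now = self.clock()
        expired = [tid for tid, (deadline, _) in self.pending.items()
                   if deadline <= now]
        return [self.pending.pop(tid)[1] for tid in expired]


if __name__ == "__main__":
    fake_now = [0.0]
    log = DelayedWorkerErrorLog(delay_s=1.0, clock=lambda: fake_now[0])
    log.worker_error("task-a", "error in task-a")
    log.worker_error("task-b", "error in task-b")
    log.driver_raised("task-a")   # driver surfaced task-a itself
    fake_now[0] = 2.0             # the delay has now expired
    for message in log.flush():
        print(message)
```

In this sketch only the error the driver never raised is eventually printed, which matches the "common case" behavior described in the comment. A real implementation would also need a background thread or timer to call something like `flush` periodically.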
Test FAILed. |
Test FAILed. |
Test FAILed. |
Test FAILed. |
@ericl: Can you update test_actor_creation_node_failure? See https://travis-ci.com/ray-project/ray/jobs/163807495
|
Done |
This seems to be working; it can be merged once the test is fixed.
Test FAILed. |
Test FAILed. |
Somehow this is causing the Raylet to abort, but only in Python 3 builds.
python/ray/tune/test/trial_runner_test.py::TrialRunnerTest::testFailureRecoveryMaxFailures
/Users/travis/.travis/job_stages: line 104: 8753 Abort trap: 6 python -m pytest -v python/ray/tune/test/trial_runner_test.py
travis_time:end:0d09cbc8:start=1544353726673758000,finish=1544353983522119000,duration=256848361000 |
Test FAILed. |
Test PASSed. |
Possibly fixes #1885. |
What do these changes do?
Revise #3224