Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[core] handle unserializable user exception #44878

Merged
merged 11 commits into from
May 2, 2024

Conversation

hongchaodeng
Copy link
Member

@hongchaodeng hongchaodeng commented Apr 19, 2024

If I run the following script, core worker would fail to serialize the RayError and crash

(credit to @alexeykudinkin )

from time import sleep

import ray

ray.init()

@ray.remote(num_cpus=1)
class Callee:

    def bar(self):
        from tenacity import retry, stop_after_attempt

        @retry(stop=stop_after_attempt(1))
        def failing_method():
            raise ValueError("failed")

        failing_method()


@ray.remote(num_cpus=1)
class Caller:

    def __init__(self, h):
        self.callee = h

    def foo(self):
        ref = self.callee.bar.remote()
        ray.wait([ref])


callee = Callee.remote()
caller = Caller.remote(callee)

ray.wait([caller.foo.remote()])

sleep(1800)

This PR will fix this.

python/ray/exceptions.py Outdated Show resolved Hide resolved
@hongchaodeng hongchaodeng force-pushed the fix-rayerror branch 2 times, most recently from c5548c1 to 7ed9740 Compare April 27, 2024 02:27
@hongchaodeng hongchaodeng changed the title [core] handle unserializable RayError [core] handle unserializable user exception Apr 27, 2024
@hongchaodeng hongchaodeng force-pushed the fix-rayerror branch 3 times, most recently from d2dcfd3 to ffdf7ee Compare April 27, 2024 03:34
Copy link
Collaborator

@jjyao jjyao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add some tests?

try:
pickle.dumps(self.cause)
except Exception as e:
err_msg = f"Unable to serialize exception ({self.cause.__class__}): {e}"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The original cause of RayTaskError ({self.cause.__class__}) is non-serializable: {e}. Overwriting the cause to be RayError ?

python/ray/exceptions.py Show resolved Hide resolved
@hongchaodeng
Copy link
Member Author

@jjyao Is there any existing tests on testing exceptions?

Copy link
Collaborator

@jjyao jjyao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code lgtm

try:
pickle.dumps(self.cause)
except Exception as e:
err_msg = f"The original cause of RayTaskError ({self.cause.__class__}) is not serializable: {e}. Overwriting the cause to RayError." # noqa
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: this can be split into multiple lines to avoid # noqa

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@hongchaodeng hongchaodeng force-pushed the fix-rayerror branch 3 times, most recently from dab4719 to f5f6d2e Compare April 30, 2024 21:38
@hongchaodeng hongchaodeng requested a review from a team as a code owner April 30, 2024 21:56
@hongchaodeng
Copy link
Member Author

@jjyao Test and doc added. PTAL!

@@ -27,6 +27,8 @@ field of the ``RayTaskError``.
:start-after: __task_exceptions_begin__
:end-before: __task_exceptions_end__

If the user's exception cannot be serialized, it would be converted to a RayError:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure why it ends with colon :.

and also maybe change to active tense, like:

If Ray cannot serialize the user's exception, it will convert the exception to a ``RayError``

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Missed some sample code before. Fixed

Signed-off-by: hongchaodeng <hongchaodeng1@gmail.com>
Copy link
Contributor

@angelinalg angelinalg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made some attempts at some style corrections. Please fix if not correct.

doc/source/ray-core/fault_tolerance/tasks.rst Outdated Show resolved Hide resolved
python/ray/exceptions.py Outdated Show resolved Hide resolved
python/ray/exceptions.py Outdated Show resolved Hide resolved
python/ray/exceptions.py Outdated Show resolved Hide resolved
python/ray/tests/test_failure.py Outdated Show resolved Hide resolved
python/ray/tests/test_failure.py Outdated Show resolved Hide resolved
python/ray/tests/test_failure.py Outdated Show resolved Hide resolved
print(type(e.cause))
# <class 'ray.exceptions.RayError'>
print(e.cause)
# The original cause of RayTaskError (<class '__main__.UnserializableException'>) is not serializable: cannot pickle '_thread.lock' object. Overwriting the cause to RayError.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# The original cause of RayTaskError (<class '__main__.UnserializableException'>) is not serializable: cannot pickle '_thread.lock' object. Overwriting the cause to RayError.
# The original cause of RayTaskError (<class '__main__.UnserializableException'>) isn't serializable: can't pickle '_thread.lock' object. Overwriting the cause to a RayError.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah this is not my writings.
It's output from the program runs.
So let's not change it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I just see you suggest changes on the code as well. I will update that and rerun the program to reflect it.

hongchaodeng and others added 8 commits May 1, 2024 10:24
Co-authored-by: angelinalg <122562471+angelinalg@users.noreply.github.com>
Signed-off-by: Hongchao Deng <hongchaodeng1@gmail.com>
Co-authored-by: angelinalg <122562471+angelinalg@users.noreply.github.com>
Signed-off-by: Hongchao Deng <hongchaodeng1@gmail.com>
Co-authored-by: angelinalg <122562471+angelinalg@users.noreply.github.com>
Signed-off-by: Hongchao Deng <hongchaodeng1@gmail.com>
Co-authored-by: angelinalg <122562471+angelinalg@users.noreply.github.com>
Signed-off-by: Hongchao Deng <hongchaodeng1@gmail.com>
Co-authored-by: angelinalg <122562471+angelinalg@users.noreply.github.com>
Signed-off-by: Hongchao Deng <hongchaodeng1@gmail.com>
Co-authored-by: angelinalg <122562471+angelinalg@users.noreply.github.com>
Signed-off-by: Hongchao Deng <hongchaodeng1@gmail.com>
Co-authored-by: angelinalg <122562471+angelinalg@users.noreply.github.com>
Signed-off-by: Hongchao Deng <hongchaodeng1@gmail.com>
Signed-off-by: hongchaodeng <hongchaodeng1@gmail.com>
@hongchaodeng
Copy link
Member Author

hongchaodeng commented May 1, 2024

@jjyao Ready for review. PTAL

Comment on lines 30 to 36
If Ray can't serialize the user's exception, it converts the exception to a ``RayError``.

.. literalinclude:: ../doc_code/task_exceptions.py
:language: python
:start-after: __unserializable_exceptions_begin__
:end-before: __unserializable_exceptions_end__

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move it down to after line 49. Currently its in the middle of subclass section.

)
logger.warning(err_msg)
self.cause = RayError(err_msg)
self.args = (function_name, traceback_str, self.cause, proctitle, pid, ip)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: we can just set self.args in the end of the constructor so we don't need to set it twice. basically move line 122 down to the end.

Signed-off-by: hongchaodeng <hongchaodeng1@gmail.com>
Copy link
Collaborator

@jjyao jjyao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One last comment

logger.warning(err_msg)
self.cause = RayError(err_msg)

self.args = (function_name, traceback_str, self.cause, proctitle, pid, ip)
Copy link
Collaborator

@jjyao jjyao May 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you move the comments (line 119-121) to here as well since they are comments for self.args.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

Signed-off-by: hongchaodeng <hongchaodeng1@gmail.com>
@jjyao jjyao merged commit dcbf195 into ray-project:master May 2, 2024
5 checks passed
@hongchaodeng hongchaodeng deleted the fix-rayerror branch May 2, 2024 03:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants