-
Notifications
You must be signed in to change notification settings - Fork 5.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Serve] Fix ServeHandle serialization #13695
[Serve] Fix ServeHandle serialization #13695
Conversation
Tests pass on my laptop and on Mac CI, but on Linux CI |
@@ -226,7 +226,7 @@ def batcher(requests): | |||
query_param = make_request_param() | |||
my_batch_sizes = await asyncio.gather(*[( | |||
await router.assign_request.remote(query_param)) for _ in range(3)]) | |||
assert my_batch_sizes == [2, 2, 1] | |||
assert sorted(my_batch_sizes) == [1, 2, 2] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's make sure to revert this and merge master again
Hey @edoakes @architkulkarni this failed on Windows
https://github.com/ray-project/ray/runs/1779615703 So I'm reverting it now. Can you open a re-revert either fixing (seems unlikely, access violation could be port conflict but this seems hard to debug) or skip? |
So weird, thanks for catching this, and my fault for not checking the Windows CI. It looks like this PR definitely caused this issue. I'm a little confused though, why does the run say that all 6/6 test_handle.py tests pass? And why does it say "FAIL: //python/ray/serve:test_router" but there's no such line for test_handle? Googling "Windows fatal exception: access violation" turns up a handful of pytest-related results, but also several that are not pytest-related. The only idea I have for how to try to fix this is to use |
In the log https://github.com/ray-project/ray/runs/1779615703 I see a line above the handle tests saying "TIMEOUT: //python/ray/serve:test_handle", but I can't figure out which test is timing out since all 6/6 tests pass in each of the three runs. The "Windows fatal exception" seems to be printed 1-2 times after each of the six tests, based on the endpoint names that are being printed. |
)" (ray-project#13753)" This reverts commit 405c8d7.
This reverts commit b522b1a.
Why are these changes needed?
Adds
__reduce__
methods to ServeHandle andThreadProxiedRouter
to fix serialization of ServeHandles. Upon deserialization, the constructor is simply called again, and a new router is started.Related issue number
Closes #13180
Checks
scripts/format.sh
to lint the changes in this PR.