Release Futures by sending del msgs when fetched #23172
Conversation
If we do this, is there any remaining reason to have both Futures and RemoteChannels? If we want to keep both types, it should probably be an error to …
The issue is internal (distributed gc) and not the interface. Broadly, they address different use cases. I don't subscribe to the idea of disallowing it. Early release of the remote value on a fetch is a good optimization and I would be loath to see it go. I think we can try another approach: release the remote value only on a del_msg, which we now send early, as soon as the value is fetched, instead of waiting for the finalizer. Will update the PR.
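A minimal sketch of the approach described here, with entirely hypothetical names (`fetch_future`, `call_on_owner`, `send_del_client`, and the `v` field stand in for the real distributed internals): on the first fetch the value is cached locally and the del_msg is queued immediately, rather than from the Future's finalizer.

```julia
# Illustrative sketch only; `call_on_owner` and `send_del_client` are
# placeholders for the actual distributed-gc machinery.
function fetch_future(fut)
    if fut.v === nothing               # value not yet cached locally
        fut.v = call_on_owner(fut)     # network fetch (a yieldpoint)
        send_del_client(fut)           # early release: queue the del_msg
    end                                # now, not in the finalizer
    return fut.v
end
```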
Force-pushed from 2001f51 to 102b1c9, then from 102b1c9 to 5851f5f.
Updated.
I don't see how this fixes either scenario. As long as del_client can happen before a Future is GC'd, it will be possible to send the Future to another worker after the RemoteValue is deleted. It seems to me something here will have to give an error.
Nope. Consider these statements executed in the same order.
To note: add_msgs are sent before del_msgs in flush_gc_msgs - the order is important. |
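A toy illustration (hypothetical data structures, not flush_gc_msgs itself) of why add_msgs must precede del_msgs: if a worker serializes a Future to a third worker and then releases its own claim, the owner must register the new client before removing the old one, or the client set momentarily drops to empty and the value is freed too early.

```julia
# Toy model: the owner tracks which workers hold a reference to a value.
owner_clients = Set([1])                 # worker 1 currently holds the Future

function process(msg)
    kind, wid = msg
    if kind === :add
        push!(owner_clients, wid)        # a new worker now holds a reference
    else
        delete!(owner_clients, wid)      # a worker released its reference
        isempty(owner_clients) && error("value freed while still referenced")
    end
end

# add_msg delivered before del_msg (the order flush_gc_msgs sends): safe.
foreach(process, [(:add, 3), (:del, 1)])
@assert owner_clients == Set([3])

# The reversed order would empty the set and free the value before
# worker 3's claim is registered.
```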
The existing bug is due to this flow and logic.
I see, the fix is that …
This will be inefficient but will not result in an error condition. What will happen is that both tasks will send a fetch request, which will be processed by the remote node; multiple del_msgs follow later, and only the first will be acted upon. The assumption, of course, is that both tasks are on the same OS thread and that isnull(r.v) is not a yieldpoint (which it is not). This situation can be optimized in a separate PR.
There is a possible schedule where Task A completes its whole fetch + send_del_client after Task B tests …
Just to clarify: the reason it cannot happen is that there is no yield between Task B testing and sending a fetch request; the fetch call is the yieldpoint. On the wire, the messages will be in the following order (left to right).

Send: Task A fetch <-- Task B fetch <-- Task A send_del_client <-- Task B send_del_client

In the other case:

Task A fetch <-- Task A send_del_client <-- Task B finds !isnull and hence no request is sent
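The cooperative-scheduling argument can be illustrated with a runnable single-process sketch (the names `cached` and `sim_fetch` are made up; this is not the real fetch path): because the test itself does not yield, each task's test-then-fetch decision is atomic with respect to the other task, and the only possible interleavings are the two message orders listed in the comment above.

```julia
# Single-process simulation of two tasks racing on a cached value.
cached  = Ref(false)
fetches = Ref(0)

function sim_fetch()
    if !cached[]        # not a yieldpoint: the test cannot be interleaved
        fetches[] += 1  # count the simulated network fetch
        yield()         # the real fetch is the first yieldpoint
        cached[] = true
    end
end

t1 = @async sim_fetch()
t2 = @async sim_fetch()
wait(t1); wait(t2)
# If both tasks start before either finishes, both issue a fetch
# (fetches[] == 2); the duplicate del_msgs are ignored remotely.
```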
On an unrelated note, all-green CI; haven't seen this in quite a while!
Wow, congrats on having the only pull request that's actually safe to merge! :) |
One other dependency is the order in which messages are processed on the remote node. Each message pulled from the socket is wrapped in a new task and scheduled. As the current task scheduler just adds newly scheduled tasks to the end of the work queue, this is not currently an issue.
Yes I agree; it probably works now but depends on things being ordered in the expected way elsewhere. Seems ok for now since it at least works better than before. |
0.5 introduced an optimization for distributed gc which releases the remote value of a Future as soon as it has been fetched locally.
This does not cover the following two scenarios (reported in #23109): multiple concurrent fetch calls on the same Future, and a Future that has been serialized to another node and is fetched from there as well. In both cases the first async fetch results in the remote value being freed, and hence the second async fetch hangs.
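The original snippets from #23109 are not reproduced here; the following is a hedged reconstruction of the two patterns based on the descriptions in this PR, written against the modern Distributed stdlib API (worker counts and values are illustrative).

```julia
using Distributed
addprocs(2)
pid_a, pid_b = workers()

# Scenario 1: two concurrent fetches of the same Future.
f = remotecall(() -> 1, pid_a)
@sync begin
    @async fetch(f)     # frees the remote value under the old optimization
    @async fetch(f)     # previously hung waiting on the freed value
end

# Scenario 2: the Future is serialized to another worker and fetched there.
f2 = remotecall(() -> 2, pid_a)
fetch(f2)                              # frees the remote value (pre-PR)
remotecall_fetch(fetch, pid_b, f2)     # previously hung
```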
While the first scenario can be addressed by tracking multiple fetch calls on the same Future locally, the second scenario (which has the Future serialized to other nodes) cannot be addressed in the same manner.
This PR reverts the optimization and falls back to collection/finalization of Futures for distributed gc. Distributed gc of Futures now relies on an explicit del msg. However, this is now sent when the Future is fetched and is not dependent on local gc.
Closes #23109