-
Notifications
You must be signed in to change notification settings - Fork 5.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix actor garbage collection by breaking cyclic references #1064
Fix actor garbage collection by breaking cyclic references #1064
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. Could we add the example in the issue as a test (or anything similar)?
@@ -118,6 +118,7 @@ def _pid_alive(pid): | |||
""" | |||
try: | |||
os.kill(pid, 0) | |||
return True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The lack of this return statement was a bug meaning that any calls to wait_for_pid_to_exit
always succeeded. I'm fixing it here so it can be used in the actor test.
Merged build finished. Test FAILed. |
Test FAILed. |
Ok, there was a kind of subtle bug. Basically, when a worker/actor dies, that potentially frees up CPU resources, so we need to attempt to dispatch some more tasks. So I added a call to |
Merged build finished. Test FAILed. |
Test FAILed. |
Merged build finished. Test FAILed. |
Test FAILed. |
Merged build finished. Test FAILed. |
Test FAILed. |
Merged build finished. Test FAILed. |
Test FAILed. |
5ed46cd
to
f2c02d8
Compare
Just rebased, hopefully that will fix the jenkins tests. |
Merged build finished. Test FAILed. |
Test FAILed. |
Merged build finished. Test FAILed. |
Test FAILed. |
Somehow this is causing the jenkins |
Merged build finished. Test PASSed. |
Test PASSed. |
@@ -44,7 +44,7 @@ def driver(redis_address, driver_index): | |||
for i in range(driver_index - max_concurrent_drivers + 1): | |||
_wait_for_event("DRIVER_{}_DONE".format(i), redis_address) | |||
|
|||
def try_to_create_actor(actor_class, timeout=100): | |||
def try_to_create_actor(actor_class, timeout=500): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For some reason, this test seems to fail unless we increase the timeout here. I still don't see why this PR affects this test.
In a follow up PR, I'll try releasing GPU resources as soon as an actor exits and see if that speeds up this test.
Another possibility is that the monitor is the bottleneck because it has to process so many drivers that are exiting.
#902 introduced a cyclic reference between an
ActorHandle
instance and its dictionary ofActorMethod
instances. This replaces the backreference to the actor handle with a weak reference, so that the Python garbage collector can safely collect anActorHandle
.Fixes #1060.