-
Notifications
You must be signed in to change notification settings - Fork 5.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix actor garbage collection by breaking cyclic references #1064
Changes from all commits
79132f5
3beabb7
3b54170
a685f0e
fabe9eb
f2c02d8
16fb51d
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -118,6 +118,7 @@ def _pid_alive(pid): | |
""" | ||
try: | ||
os.kill(pid, 0) | ||
return True | ||
except OSError: | ||
return False | ||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -30,10 +30,10 @@ def check_ids(self): | |
|
||
|
||
def driver(redis_address, driver_index): | ||
"""The script for driver 0. | ||
"""The script for all drivers. | ||
|
||
This driver should create five actors that each use one GPU and some actors | ||
that use no GPUs. After a while, it should exit. | ||
This driver should create five actors that each use one GPU. After a while, | ||
it should exit. | ||
""" | ||
ray.init(redis_address=redis_address) | ||
|
||
|
@@ -44,7 +44,7 @@ def driver(redis_address, driver_index): | |
for i in range(driver_index - max_concurrent_drivers + 1): | ||
_wait_for_event("DRIVER_{}_DONE".format(i), redis_address) | ||
|
||
def try_to_create_actor(actor_class, timeout=100): | ||
def try_to_create_actor(actor_class, timeout=500): | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. For some reason, this test seems to fail unless we increase the timeout here. I still don't see why this PR affects this test. In a follow up PR, I'll try releasing GPU resources as soon as an actor exits and see if that speeds up this test. Another possibility is that the monitor is the bottleneck because it has to process so many drivers that are exiting. |
||
# Try to create an actor, but allow failures while we wait for the | ||
# monitor to release the resources for the removed drivers. | ||
start_time = time.time() | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The lack of this return statement was a bug meaning that any calls to
wait_for_pid_to_exit
always succeeded. I'm fixing it here so it can be used in the actor test.