Open
Description
What happened + What you expected to happen
When a placement group is removed and a new call is then submitted to an actor that lived on that placement group, ray.get hangs forever on the resulting future and does not print any error.
Versions / Dependencies
Latest master
Reproduction script
import time

import ray


@ray.remote
class Worker:
    def sleep(self, seconds):
        time.sleep(seconds)
        return 4


ray.init(num_cpus=4)

pg = ray.util.placement_group([{"CPU": 4}])
ray.get(pg.ready())

pg2 = ray.util.placement_group([{"CPU": 4}])
fut = pg2.ready()

actor = Worker.options(placement_group=pg).remote()
sleep = actor.sleep.remote(10)

start = time.time()
ray.util.remove_placement_group(pg)
ray.get(fut)
end = time.time() - start
print("Took", end)

# Hangs forever
ray.get(actor.sleep.remote(5))
print("Over")
Issue Severity
Medium: It is a significant difficulty but I can work around it.