[core] Scheduling a future on an actor with removed placement group hangs forever #28450

Open
@krfricke

What happened + What you expected to happen

When a placement group is removed and a new future is then scheduled on an actor that lived in that placement group, ray.get on this future hangs forever and does not print any error. I would expect the call to fail with an error instead of blocking silently.

Versions / Dependencies

Latest master

Reproduction script

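import time
import ray


@ray.remote
class Worker:
    def sleep(self, seconds):
        time.sleep(seconds)
        return 4


ray.init(num_cpus=4)

# The first placement group reserves all 4 CPUs.
pg = ray.util.placement_group([{"CPU": 4}])
ray.get(pg.ready())
# The second one cannot be scheduled until the first is removed.
pg2 = ray.util.placement_group([{"CPU": 4}])
fut = pg2.ready()


# Start an actor inside the first placement group.
actor = Worker.options(placement_group=pg).remote()

# Keep the actor busy while its placement group is removed.
sleep = actor.sleep.remote(10)

start = time.time()
ray.util.remove_placement_group(pg)
# Returns once pg2 has been scheduled on the freed resources.
ray.get(fut)
elapsed = time.time() - start
print("Took", elapsed)
# Hangs forever: the actor's placement group no longer exists.
ray.get(actor.sleep.remote(5))
print("Over")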

Issue Severity

Medium: It is a significant difficulty but I can work around it.
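
Until this is fixed, one way to avoid blocking indefinitely is to bound the wait with the `timeout` argument of `ray.get`, which raises `ray.exceptions.GetTimeoutError` when the deadline passes. The helper below is only a sketch and not necessarily the workaround referred to above: `get_with_timeout` and its `get_fn` parameter are hypothetical names introduced for illustration, and it assumes the timeout is still honored when the actor's placement group has been removed.

```python
def get_with_timeout(ref, timeout_s=30.0, get_fn=None):
    """Resolve `ref`, raising RuntimeError instead of blocking forever.

    `get_fn` is a hypothetical hook for substituting a stub; by default
    the helper uses ray.get.
    """
    timeout_errors = (TimeoutError,)
    if get_fn is None:
        # Imported lazily so the helper can be exercised without a cluster.
        import ray
        from ray.exceptions import GetTimeoutError

        get_fn = ray.get
        timeout_errors = (TimeoutError, GetTimeoutError)
    try:
        # ray.get accepts a keyword-only `timeout` in seconds.
        return get_fn(ref, timeout=timeout_s)
    except timeout_errors as exc:
        raise RuntimeError(
            f"Future did not resolve within {timeout_s}s; "
            "the actor's placement group may have been removed."
        ) from exc
```

In the reproduction script, the final call would then read `get_with_timeout(actor.sleep.remote(5), timeout_s=30)`, which should surface an error rather than hang.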

Metadata

Labels

P2 (Important issue, but not time-critical), bug (Something that is supposed to be working; but isn't), core (Issues that should be addressed in Ray Core), core-api, core-placement-group, core-scheduler