[core] Scheduling a future on an actor with removed placement group hangs forever #28450

Open
@krfricke

Description

What happened + What you expected to happen

When a placement group is removed and a new future is then scheduled on an actor that lived in that placement group, ray.get hangs forever on this future and does not print any error.

Versions / Dependencies

Latest master

Reproduction script

import time
import ray


@ray.remote
class Worker:
    def sleep(self, seconds):
        time.sleep(seconds)
        return 4


ray.init(num_cpus=4)

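# Reserve all 4 CPUs with the first placement group and wait until it is ready.
pg = ray.util.placement_group([{"CPU": 4}])
ray.get(pg.ready())
# The second placement group needs the same 4 CPUs, so it stays pending for now.
pg2 = ray.util.placement_group([{"CPU": 4}])
fut = pg2.ready()


# Start an actor inside the first placement group.
actor = Worker.options(placement_group=pg).remote()

# Submit a task to the actor before the placement group is removed.
sleep = actor.sleep.remote(10)

start = time.time()
ray.util.remove_placement_group(pg)
# Wait for the second placement group to become ready.
ray.get(fut)
elapsed = time.time() - start
print("Took", elapsed)
# Hangs forever
ray.get(actor.sleep.remote(5))
print("Over")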

Issue Severity

Medium: It is a significant difficulty but I can work around it.

    Labels

    P2 (Important issue, but not time-critical), api-bug (Bug in which APIs behavior is wrong), bug (Something that is supposed to be working; but isn't), core (Issues that should be addressed in Ray Core), core-placement-group, core-scheduler, size-medium