Closed
Description
I'm currently trying to implement actor reconstruction for xray, and did some research on how actor reconstruction works in legacy Ray.
In the script below, when we try to get a missing result of an actor task and the actor process is dead, I'd expect Ray to replay all tasks (including the actor creation task) and reconstruct the result.
import ray
import os
import signal
import time
import pyarrow.plasma as plasma
@ray.remote(checkpoint_interval=-1)
class MyActor(object):
    def __init__(self):
        print("Actor created" + str(os.getpid()))
        self.value = 0
    def increase(self):
        self.value += 1
        return self.value
    def get_pid(self):
        return os.getpid()
ray.init(use_raylet=False, num_workers=0)
actor = MyActor.remote()
ids = []
for _ in range(5):
    id = actor.increase.remote()
    ids.append(id)
    ray.get(id)
# Kill the actor process
id = actor.get_pid.remote()
ids.append(id)
pid = ray.get(id)
time.sleep(2)
print("killing %s" % pid)
os.kill(pid, signal.SIGKILL)
# Delete the previous results from object store.
ray.worker.global_worker.plasma_client.delete(
    [plasma.ObjectID(id.id()) for id in ids]
)
# Flush object store cache.
for _ in range(100):
    ray.put(1)
time.sleep(2)
print(ids)
print(ray.get(ids[-1]))
However, the local scheduler fails with this error:
F0913 20:57:53.068881 2784486272 local_scheduler_algorithm.cc:1589]  Check failed: algorithm_state->local_objects.count(object_id) == 0 fce1281dd6a115e7b90d55d875e62fee930e3916
*** Check failure stack trace: ***
    @        0x101e7c62a  google::LogMessage::Fail()
    @        0x101e7a44e  google::LogMessage::SendToLog()
    @        0x101e7b2cf  google::LogMessage::Flush()
    @        0x101e7b109  google::LogMessage::~LogMessage()
    @        0x101e7b3c5  google::LogMessage::~LogMessage()
    @        0x101e6632d  ray::RayLog::~RayLog()
    @        0x101e22933  handle_object_available()
    @        0x101e0a78b  process_plasma_notification()
    @        0x101e41221  aeProcessEvents
    @        0x101e4150b  aeMain
    @        0x101e12bf7  start_server()
    @        0x101e136c2  main
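For reference, the expected behavior from the description (replay the actor creation task and every later task to rebuild a missing result) can be sketched as a toy model. This is only an illustration under stated assumptions: `Counter` stands in for `MyActor`, and `ReplayLog`, `submit`, `kill_and_evict`, and `get` are hypothetical names, not part of Ray's API or its actual reconstruction mechanism.

```python
class Counter:
    """Stand-in for MyActor from the script above."""

    def __init__(self):
        self.value = 0

    def increase(self):
        self.value += 1
        return self.value


class ReplayLog:
    """Hypothetical task log: remembers the creation task and every call."""

    def __init__(self, actor_cls):
        self.actor_cls = actor_cls
        self.calls = []            # method names, in submission order
        self.results = {}          # task index -> result (toy "object store")
        self.actor = actor_cls()   # the actor creation task

    def submit(self, method_name):
        index = len(self.calls)
        self.calls.append(method_name)
        self.results[index] = getattr(self.actor, method_name)()
        return index

    def kill_and_evict(self):
        # Models killing the actor process and deleting its results.
        self.actor = None
        self.results.clear()

    def get(self, index):
        if index not in self.results:
            self._reconstruct()
        return self.results[index]

    def _reconstruct(self):
        # Re-run the creation task, then replay every logged call in order.
        self.actor = self.actor_cls()
        for i, name in enumerate(self.calls):
            self.results[i] = getattr(self.actor, name)()


log = ReplayLog(Counter)
ids = [log.submit("increase") for _ in range(5)]
log.kill_and_evict()
last = log.get(ids[-1])
print(last)  # 5
```

In this model the final `get` succeeds because the full call history is replayed against a fresh actor; in the script above, the equivalent `ray.get(ids[-1])` instead hits the check failure in the local scheduler.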