Hang when object store full #4878

Closed
@ericl

Description

System information

Ray 0.7.0-dev

Describe the problem

The following workload hangs in test_get_parallel(n=5), with console messages like:

pyarrow.lib.PlasmaStoreFull: object does not fit in the plasma store

(pid=23447) 2019-05-27 17:58:37,324	INFO worker.py:392 --
The object with ID ObjectID(c3e2fb3725ff2634d366070bb2e4689c01000000)
already exists in the object store.

A hang is not expected here: the call should raise an error instead. Ideally the workload would run successfully, since the working set can actually fit in memory, but that is outside the scope of this issue.

Source code / logs

import numpy as np
import time
import ray


@ray.remote
class Actor(object):
    def some_expensive_task(self):
        return np.zeros(25 * 1024 * 1024, dtype=np.uint8)


def test_get_serial():
    a = Actor.remote()
    i = 0
    start = time.time()
    for _ in range(100):
        ray.get(a.some_expensive_task.remote())
        i += 1
        if i % 10 == 0:
            print("Calls per second", i / (time.time() - start))
    a.__ray_terminate__.remote()


def test_get_parallel(n=10):
    i = 0
    actors = [Actor.remote() for _ in range(n)]
    start = time.time()
    for _ in range(100):
        pending = [a.some_expensive_task.remote() for a in actors]
        while pending:
            [done], pending = ray.wait(pending, num_returns=1)
            i += 1
            if i % 10 == 0:
                print("Calls per second", i / (time.time() - start))
    for a in actors:
        a.__ray_terminate__.remote()


if __name__ == "__main__":
    ray.init(object_store_memory=100 * 1024 * 1024)
    print("Test serial")
    test_get_serial()
    print("Test parallel 1")
    test_get_parallel(n=1)
    print("Test parallel 5")
    test_get_parallel(n=5)
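
For reference, a rough size check of the numbers used in the script above. This is only a sketch: it assumes each result occupies exactly its array payload (serialization and store overhead are ignored), and the helper names are made up for illustration. It does not model when object references are released, so it says nothing about whether all results are actually resident at the same time.

```python
MIB = 1024 * 1024

# Each some_expensive_task() result is a uint8 array, so 1 byte per element.
object_bytes = 25 * MIB
# Value passed as object_store_memory to ray.init() in the script.
store_bytes = 100 * MIB


def peak_result_bytes(n_actors):
    """Bytes needed if one result per actor is held in the store at once."""
    return n_actors * object_bytes


# Upper bound on simultaneously resident results (overhead ignored).
max_resident = store_bytes // object_bytes

for n in (1, 5):
    need = peak_result_bytes(n)
    verdict = "fits" if need <= store_bytes else "exceeds store"
    print(n, "actor(s):", need // MIB, "MiB,", verdict)
```

Under these assumptions, one round of n=5 produces 125 MiB of results against a 100 MiB store, which would be consistent with the PlasmaStoreFull message appearing only in the n=5 case.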
