[Core] zero-copy doesn't seem to give arrays that share memory space #48961
Comments
I guess that when you `ray.put`, Ray makes a copy, but `ray.get` is zero-copy.
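A quick way to check both halves of this claim — a minimal sketch, assuming a local Ray cluster and Ray's default numpy serialisation:

```python
import numpy as np
import ray

ray.init()

arr = np.zeros(10**6)
ref = ray.put(arr)  # serialises the array into the object store

a = ray.get(ref)
b = ray.get(ref)

print(np.shares_memory(a, b))    # True: both gets map the same object-store buffer
print(np.shares_memory(arr, a))  # False: put allocated a new buffer, i.e. a copy
```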
@auderson is this a guess, or can you say for sure this is the case? If it's true, then Ray isn't zero-copy in a meaningful sense: a memory allocation is still required, which is effectively a copy. The copy just happens before the data is shipped.

Contrast this with something like `SharedMemory`, which genuinely uses shared memory space. It would be cool if Ray did the same for same-host numpy array access, perhaps with some extra machinery around it to protect against mutation. I can imagine it isn't a priority for the average case, but when dealing with large arrays it can make a big difference. I actually came across this behaviour when I kept running out of memory in a big HPC project that used Ray for orchestration.

For those interested, below is an implementation of a shared-memory array that you can use to send numpy arrays between actors on the same host without copying (a usage sketch follows the class). It would fail if it went off-host, though; you could probably add some extra machinery to handle inter-host operation via the Ray object store.

```python
from multiprocessing import shared_memory
from typing import Tuple

import numpy as np


class SharedData:
    """
    Shared-memory data for single-machine array sharing between processes.
    This class is picklable.
    """

    def __reduce__(self):
        # Return the deserialisation classmethod and its kwargs so that
        # unpickling attaches to the existing shared-memory segment.
        return (self._deserialise, (self._serialised_data,))

    @classmethod
    def _deserialise(cls, kwargs):
        return cls(**kwargs)

    def __init__(self, create: bool, shape: Tuple[int, ...], dtype: str, shm_name: str | None = None):
        if create:
            if shm_name is not None:
                raise ValueError("Expected `shm_name` to be None when `create` is True.")
            size = int(np.prod(shape)) * np.dtype(dtype).itemsize
            self._shm = shared_memory.SharedMemory(create=True, size=size)
            self._shared_arr = np.ndarray(shape, dtype=dtype, buffer=self._shm.buf)
            self._shared_arr[:] = 0.
            # Receivers must attach (create=False) to the segment by name.
            self._serialised_data = dict(create=False, shape=shape, dtype=dtype, shm_name=self._shm.name)
        else:
            if shm_name is None:
                raise ValueError("Expected `shm_name` when `create` is False.")
            self._shm = shared_memory.SharedMemory(name=shm_name)
            self._shared_arr = np.ndarray(shape=shape, dtype=dtype, buffer=self._shm.buf)
            self._serialised_data = dict(create=False, shape=shape, dtype=dtype, shm_name=shm_name)

    @property
    def shm_name(self) -> str:
        return self._shm.name

    def close(self):
        self._shm.close()

    def unlink(self):
        self._shm.unlink()

    def __getitem__(self, item):
        return self._shared_arr[item]

    def __setitem__(self, key, value):
        self._shared_arr[key] = value
```
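For illustration, here is a usage sketch, assuming both processes run on the same host; the `Writer` actor and its `fill` method are hypothetical names:

```python
import ray

@ray.remote
class Writer:
    def fill(self, data: SharedData):
        # Writes go directly into the shared segment; no array is copied
        # across the process boundary, only the small pickled handle.
        data[:] = 1.0

ray.init()

shared = SharedData(create=True, shape=(1000, 1000), dtype="float64")
writer = Writer.remote()
ray.get(writer.fill.remote(shared))  # ships only SharedData's pickle, not the array

print(shared[0, 0])  # 1.0 -- the actor's mutation is visible in this process

shared.close()
shared.unlink()  # release the segment once all processes are done with it
```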
What @auderson said is correct. We currently have zero-copy read but not zero-copy write.
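The read-only half of this is easy to observe: the array returned by `ray.get` is a view onto the immutable object store, so in-place writes fail. A minimal sketch:

```python
import numpy as np
import ray

ray.init()

arr = ray.get(ray.put(np.arange(10)))
print(arr.flags.writeable)  # False: the array is a view onto the immutable object store

try:
    arr[0] = 42  # attempt an in-place write to the zero-copy view
except ValueError as e:
    print(e)     # "assignment destination is read-only"
```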
Marking this as a P1 enhancement to track support for zero-copy write.
What happened + What you expected to happen
The simplest way to see whether zero-copy has been performed is to check whether the original and the materialised object share memory space. For a simple test case they do not. Is my test incorrect? If so, how can I verify zero-copy behaviour?
Versions / Dependencies
ray 2.39
Reproduction script
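The script itself is not included above; based on the description, a minimal reconstruction might look like this (hypothetical, not the reporter's original code):

```python
import numpy as np
import ray

ray.init()

arr = np.zeros(10**6)
materialised = ray.get(ray.put(arr))

# If put/get were fully zero-copy, these two arrays would share memory space.
print(np.shares_memory(arr, materialised))  # False: the data was copied on put
```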
Issue Severity
High: It blocks me from completing my task.