
[Core] zero-copy doesn't seem to give arrays that share memory space #48961

Open

Joshuaalbert opened this issue Nov 27, 2024 · 4 comments

Labels: core (Issues that should be addressed in Ray Core), enhancement (Request for new feature and/or capability), P1 (Issue that should be fixed within a few weeks)

Comments

@Joshuaalbert

What happened + What you expected to happen

The simplest way to see whether zero-copy has been performed is to check whether the original and the materialised object share the same memory. This check fails for a simple test case. Is my test incorrect? If so, how can I verify zero-copy behaviour?

Versions / Dependencies

ray 2.39

Reproduction script

import ray
import numpy as np

if __name__ == '__main__':

    ray.init(address='local')

    original_array = np.random.rand(10, 10)

    ray_ref = ray.put(original_array)

    retrieved_array = ray.get(ray_ref)

    # Verify zero-copy
    if np.shares_memory(original_array, retrieved_array):
        print("Zero-copy confirmed!")
    else:
        print("Data was copied!")

Issue Severity

High: It blocks me from completing my task.

@Joshuaalbert added the bug (Something that is supposed to be working; but isn't) and triage (Needs triage, e.g. priority, bug/not-bug, and owning component) labels on Nov 27, 2024
@jcotant1 added the core (Issues that should be addressed in Ray Core) label on Nov 27, 2024
@auderson

auderson commented Dec 3, 2024

I guess that when you ray.put, Ray makes a copy, but ray.get is zero-copy.
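
A quick way to check that guess (a minimal sketch, assuming a local single-node Ray cluster): fetch the same ObjectRef twice. If reads are zero-copy, the two results should be views onto the same object-store buffer, while neither should share memory with the original array that ray.put copied into the store.

import numpy as np
import ray

ray.init(address='local')

original = np.random.rand(10, 10)
ref = ray.put(original)  # copies the array into the shared object store

a = ray.get(ref)
b = ray.get(ref)

print(np.shares_memory(original, a))  # False: put allocated a new buffer in the store
print(np.shares_memory(a, b))         # expected True: both gets map the same store buffer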

@Joshuaalbert
Author

@auderson is this a guess, or can you say for sure that this is the case? If it's true, then Ray isn't zero-copy in a meaningful sense: a memory allocation is still required, which is effectively a copy; the copy just happens before the data is shipped. Contrast this with something like SharedMemory, which genuinely uses a shared memory space. It would be cool if Ray did the same for same-host numpy array access, maybe with some extra machinery around it to protect against mutation. I can imagine it's not really a priority for the average case, but when dealing with large arrays it matters a great deal. I actually came across this behaviour when I kept running out of memory in a big HPC project while trying to use Ray for orchestration.

For those interested, below is an implementation of a shared-memory array that you can use to send numpy arrays between actors on the same host without copying. It would fail if it went off-host, though. You could probably add some extra machinery to handle the inter-host case using the Ray object store.

from multiprocessing import shared_memory
from typing import Tuple

import numpy as np


class SharedData:
    """
    Shared memory data for single machine array sharing between processes. This class is picklable.
    """

    def __reduce__(self):
        # Return the class method for deserialization and the actor as an argument
        return (self._deserialise, (self._serialised_data,))

    @classmethod
    def _deserialise(cls, kwargs):
        return cls(**kwargs)

    def __init__(self, create: bool, shape: Tuple[int, ...], dtype: str, shm_name: str | None = None):
        if create:
            # Creator: allocate a fresh shared-memory segment sized for the array and zero it.
            if shm_name is not None:
                raise ValueError("Expected `shm_name` to be None when `create` is True.")
            size = int(np.prod(shape)) * np.dtype(dtype).itemsize
            self._shm = shared_memory.SharedMemory(create=True, size=size)
            self._shared_arr = np.ndarray(shape, dtype=dtype, buffer=self._shm.buf)
            self._shared_arr[:] = 0.
            # Deserialised copies attach to this segment by name instead of creating a new one.
            self._serialised_data = dict(create=False, shape=shape, dtype=dtype, shm_name=self._shm.name)
        else:
            # Attacher: map the existing segment identified by `shm_name`.
            if shm_name is None:
                raise ValueError("Expected `shm_name` when `create` is False.")
            self._shm = shared_memory.SharedMemory(name=shm_name)
            self._shared_arr = np.ndarray(shape=shape, dtype=dtype, buffer=self._shm.buf)
            self._serialised_data = dict(create=False, shape=shape, dtype=dtype, shm_name=shm_name)

    @property
    def shm_name(self) -> str:
        return self._shm.name

    def close(self):
        self._shm.close()

    def unlink(self):
        self._shm.unlink()

    def __getitem__(self, item):
        return self._shared_arr[item]

    def __setitem__(self, key, value):
        self._shared_arr[key] = value
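
A minimal usage sketch of the class above (the fill_diagonal task and the specific shape are just for illustration, and it assumes the worker can reconstruct SharedData): the creating process allocates the segment and passes the picklable handle to a Ray task; the task's writes land directly in the shared segment, so nothing is copied back. Only the creator should unlink. Note that multiprocessing's resource tracker may warn about segments attached inside worker processes, depending on the Python version.

import numpy as np
import ray


@ray.remote
def fill_diagonal(shared: SharedData, value: float) -> None:
    # Unpickling attached this worker to the existing segment; writes go straight to shared memory.
    for i in range(10):
        shared[i, i] = value
    shared.close()


if __name__ == '__main__':
    ray.init(address='local')
    data = SharedData(create=True, shape=(10, 10), dtype='float64')
    ray.get(fill_diagonal.remote(data, 1.0))
    print(data[0, 0], data[9, 9])  # 1.0 1.0, written by the task with no copy back
    data.close()
    data.unlink()  # only the creator unlinks the segment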

@jjyao
Collaborator

jjyao commented Dec 9, 2024

What @auderson said is correct. We currently support zero-copy read but not zero-copy write.
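
A visible symptom of the zero-copy read path (a small sketch on a single node): the numpy array returned by ray.get is backed directly by the object store and is marked read-only, so in-place writes raise an error instead of mutating the stored object.

import numpy as np
import ray

ray.init(address='local')

ref = ray.put(np.zeros((4, 4)))
arr = ray.get(ref)

print(arr.flags.writeable)  # False: arr is a read-only view onto the object-store buffer
try:
    arr[0, 0] = 1.0
except ValueError as err:
    print(err)  # assignment destination is read-only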

@jjyao added the enhancement (Request for new feature and/or capability) and P1 (Issue that should be fixed within a few weeks) labels and removed the bug (Something that is supposed to be working; but isn't) and triage (Needs triage, e.g. priority, bug/not-bug, and owning component) labels on Dec 9, 2024
@jjyao
Collaborator

jjyao commented Dec 9, 2024

Making this a P1 enhancement to track support for zero-copy write.
