Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Only use CopyOnWriteArray wrapper on BackendArrays #8712

Open
wants to merge 5 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 3 additions & 0 deletions doc/internals/how-to-add-new-backend.rst
Original file line number Diff line number Diff line change
Expand Up @@ -421,6 +421,9 @@ That implies that all the reference to open files should be dropped. For
opening files, we therefore suggest to use the helper class provided by Xarray
:py:class:`~xarray.backends.CachingFileManager`.

Note that any ``BackendArray`` subclass will also be protected such that any attempt
to write to this array only writes to a copy of whatever might still be stored on disk.

.. _RST indexing:

Indexing examples
Expand Down
9 changes: 8 additions & 1 deletion xarray/backends/api.py
Original file line number Diff line number Diff line change
Expand Up @@ -236,9 +236,16 @@ def _protect_dataset_variables_inplace(dataset, cache):
for name, variable in dataset.variables.items():
if name not in dataset._indexes:
# no need to protect IndexVariable objects
data = indexing.CopyOnWriteArray(variable._data)

if isinstance(variable._data, backends.BackendArray):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems not right to me. We are in backends/api.py these should all be BackendArrays.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The way that BackendArray is presented in How to add a new backend strongly implies that using BackendArray is optional. In the KerchunkArray case I specifically don't want to use BackendArray, even though I do want to write a backend (from #8714 (comment)):

my KerchunkArray case is interesting because I don't want to use BackendArray (I have no use for CopyOnWrite because I'm never loading bytes, nor for Lazy indexing (I can't index into the KerchunkArray at all).

# only need to protect arrays that were lazy-loaded from disk
data = indexing.CopyOnWriteArray(variable._data)
else:
data = variable._data

if cache:
data = indexing.MemoryCachedArray(data)

variable.data = data


Expand Down
7 changes: 7 additions & 0 deletions xarray/core/indexing.py
Original file line number Diff line number Diff line change
Expand Up @@ -652,6 +652,13 @@ def _wrap_numpy_scalars(array):


class CopyOnWriteArray(ExplicitlyIndexedNDArrayMixin):
"""
Ensures that an attempt to write to this array only writes to a copy
of whatever might still be stored on disk.

Used to wrap any array that inherits from BackendArray.
"""

__slots__ = ("array", "_copied")

def __init__(self, array):
Expand Down
Loading