
Enable zero-copy to_dataframe #9792

Open

Description

@rabernat

What is your issue?

Calling Dataset.to_dataframe() currently always produces a memory copy of all arrays. This copy is unnecessary in many scenarios. We should make it possible to convert Xarray objects to Pandas objects without a memory copy.

This behavior may depend on Pandas version. As of 2.2, here are the relevant Pandas docs: https://pandas.pydata.org/docs/user_guide/copy_on_write.html

Here's the key point:

Constructors now copy NumPy arrays by default

The Series and DataFrame constructors will now copy NumPy array by default when not otherwise specified. This was changed to avoid mutating a pandas object when the NumPy array is changed inplace outside of pandas. You can set copy=False to avoid this copy.
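A minimal sketch of the copy semantics described in those docs, using only NumPy and Pandas (the array name here is illustrative):

```python
import numpy as np
import pandas as pd

arr = np.ones(1_000_000)

# copy=False asks pandas to wrap the existing array rather than copy it
s = pd.Series(arr, copy=False)
print(np.shares_memory(s.values, arr))  # -> True

# in-place mutation of the NumPy array is then visible through pandas
arr[0] = 42.0
print(s.iloc[0])  # -> 42.0
```

This visibility of external mutation is exactly why the constructors copy by default; zero-copy conversion trades that safety for memory savings.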

When we construct DataFrames in Xarray, we do it like this:

xarray/xarray/core/dataset.py

Lines 7386 to 7388 in d5f84dd

broadcasted_df = pd.DataFrame(
    dict(zip(non_extension_array_columns, data, strict=True)), index=index
)

Here's a minimal example:

import numpy as np
import pandas as pd
import xarray as xr
ds = xr.DataArray(np.ones(1_000_000), dims=('x',), name="foo").to_dataset()
df = ds.to_dataframe()
print(np.shares_memory(df.foo.values, ds.foo.values))  # -> False

# inspect the memory locations
print(ds.foo.values.__array_interface__)
print(df.foo.values.__array_interface__)

# compare to this
df2 = pd.DataFrame(
    {
        "foo": ds.foo.values,
    },
    copy=False
)
print(np.shares_memory(df2.foo.values, ds.foo.values))  # -> True

Solution

I propose we add a copy keyword option to Dataset.to_dataframe() (and similar for DataArray) which defaults to True (the current copying behavior) but allows users to pass False when they want a zero-copy conversion.
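As a sketch, here is what the proposed keyword might look like (hypothetical; the `copy` keyword does not exist in xarray yet), alongside a manual workaround that works today:

```python
import numpy as np
import pandas as pd
import xarray as xr

ds = xr.DataArray(np.ones(1_000_000), dims=("x",), name="foo").to_dataset()

# Hypothetical future API (not yet implemented):
# df = ds.to_dataframe(copy=False)

# Workaround available today: build the DataFrame manually with copy=False
df = pd.DataFrame({"foo": ds.foo.values}, copy=False)
print(np.shares_memory(df.foo.values, ds.foo.values))  # -> True
```

The workaround bypasses Dataset.to_dataframe() entirely, so it does not handle broadcasting or multi-dimensional indexes the way the real method does; the proposed keyword would cover those cases.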
