Description
What is your issue?
I make use of fsspec to quickly open netCDF files in the cloud and pull out slices of data without needing to read the entire file. Quick and dirty, this is just `ds = xr.open_dataset(fs.open("gs://..."))`.
This works great: a file many GB in size can be lazily loaded as a dataset in a few hundred milliseconds, since only the netCDF headers are parsed via under-the-hood byte-range requests. But this only holds if the netCDF was written from dask-backed arrays. Somehow, writing from numpy-backed arrays produces a different netCDF that requires reading much deeper into the file before it can be parsed as a dataset.
I spent some time digging into the backends and see that xarray ultimately hands the store write off to dask.array here. A look at ncdump and Dataset.encoding didn't reveal any obvious differences between the files, but there is clearly something. Does anyone know why the plain xarray store methods would produce a different netCDF structure, despite the underlying data and encoding being identical?
This should work as an MCVE:
```python
import os
import string

import fsspec
import numpy as np
import xarray as xr

fs = fsspec.filesystem("gs")
bucket = "gs://<your-bucket>"

# Create a ~160MB dataset with 20 variables
variables = {
    v: (["x", "y"], np.random.random(size=(1000, 1000)))
    for v in string.ascii_letters[:20]
}
ds = xr.Dataset(variables)

# Save one version from numpy-backed arrays and one from dask-backed arrays
ds.compute().to_netcdf("numpy.nc")
ds.chunk().to_netcdf("dask.nc")

# Copy these to a bucket of your choice
fs.put("numpy.nc", bucket)
fs.put("dask.nc", bucket)
```
Then time reading these files in as datasets with fsspec:
```python
%timeit xr.open_dataset(fs.open(os.path.join(bucket, "numpy.nc")))
# 2.15 s ± 40.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit xr.open_dataset(fs.open(os.path.join(bucket, "dask.nc")))
# 187 ms ± 26.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
```