Slow performance when opening a zarr file with numpy>2.0.0 #9545
Comments
This looks like an issue with time decoding, with […]. It appears that with […]: `OutOfBoundsTimedelta: Cannot cast 45757440 from m to 'ns' without overflow.` (For reference, […].) I wonder if this has anything to do with the changed dtype casting rules in […]. cc @spencerkclark, in case you have any insight here.
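The numbers in that error message can be checked with exact integer arithmetic. The sketch below is illustrative only: the unit table and the helpers `to_nanoseconds` and `fits_signed` are hypothetical (not xarray, pandas, or numpy API), and `"m"` is read as minutes, following numpy's timedelta64 unit codes. It shows that 45757440 minutes is about 2.75e18 ns, which fits in a signed 64-bit integer (maximum about 9.22e18), so the target value itself is not out of range for `datetime64[ns]`.

```python
# Hypothetical helpers for checking the arithmetic behind
# "Cannot cast 45757440 from m to 'ns' without overflow".
# "m" is read as minutes, following numpy's timedelta64 unit codes.
NS_PER_UNIT = {
    "D": 86_400 * 10**9,
    "h": 3_600 * 10**9,
    "m": 60 * 10**9,
    "s": 10**9,
    "ms": 10**6,
    "us": 10**3,
    "ns": 1,
}


def to_nanoseconds(value: int, unit: str) -> int:
    """Convert an integer count of `unit` to nanoseconds using exact Python ints."""
    return value * NS_PER_UNIT[unit]


def fits_signed(n: int, bits: int) -> bool:
    """Return True if n is representable as a signed integer of `bits` width."""
    return -(2 ** (bits - 1)) <= n <= 2 ** (bits - 1) - 1


ns = to_nanoseconds(45757440, "m")
print(ns)                              # 2745446400000000000
print(fits_signed(ns, 64))             # True: the final value fits in int64
print(fits_signed(45757440 * 60, 32))  # False: minutes -> seconds already exceeds int32
```

Since the final value fits in int64, the overflow presumably happens in some intermediate step or dtype. That is consistent with, but does not prove, the dtype-casting suspicion: if a step kept the count in a 32-bit integer (an assumption not confirmed in this thread), even the minutes-to-seconds conversion would overflow.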
Thanks @keewis for taking a look. Was this with xarray […]?
Actually, no. Let me retry with […].
I can confirm that […].

Thanks for the well-written report, @renaudjester.

(As an additional comment, the link you've been using is actually on […].)

```python
import xarray as xr

xr.open_dataset(
    "s3://mdl-arco-time-035/arco/MEDSEA_MULTIYEAR_PHY_006_004/med-cmcc-cur-rean-h_202012/timeChunked.zarr",
    engine="zarr",
    storage_options={"endpoint_url": "https://s3.waw3-1.cloudferro.com", "anon": True},
)
```
Great, thanks for confirming @keewis.
Thanks a lot :D Super, I will wait for the next release then!
@keewis Thanks for the tips! Could you just point out to me why this is a bit more efficient?
As far as I can tell (and I'm by no means an expert on this), the S3 protocol is a REST API. This means that while it is possible to talk to it using just HTTP vocabulary, it doesn't allow you to be as precise when requesting data, so you'll have some overhead.
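To make the point concrete: a plain HTTPS link and an `s3://` URL plus `endpoint_url` can name the same object. A minimal sketch, assuming the HTTPS link uses S3 path-style addressing (`<endpoint>/<bucket>/<key>`); the original HTTPS link is not shown in this thread, and `s3_to_path_style_https` is a hypothetical helper, not part of fsspec or s3fs.

```python
from urllib.parse import urlparse


def s3_to_path_style_https(s3_url: str, endpoint_url: str) -> str:
    """Map s3://bucket/key to <endpoint_url>/bucket/key (S3 path-style addressing)."""
    parsed = urlparse(s3_url)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"expected an s3://bucket/key URL, got {s3_url!r}")
    bucket = parsed.netloc          # first component after s3:// is the bucket
    key = parsed.path.lstrip("/")   # the rest is the object key (or key prefix)
    return f"{endpoint_url.rstrip('/')}/{bucket}/{key}"


print(
    s3_to_path_style_https(
        "s3://mdl-arco-time-035/arco/MEDSEA_MULTIYEAR_PHY_006_004/med-cmcc-cur-rean-h_202012/timeChunked.zarr",
        "https://s3.waw3-1.cloudferro.com",
    )
)
```

Both forms point at the same store; what changes is the client. With the `s3://` form and `storage_options`, fsspec dispatches to an S3-aware filesystem (s3fs), while an `https://` URL goes through fsspec's generic HTTP filesystem.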
What happened?
Hi!
I want to open a zarr dataset lazily.
On my computer:
- With `numpy==1.26.4`, it takes around 1.5 s.
- With `numpy==2.1.1`, it takes around 5 s.

It's also slow on an Ubuntu machine.
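When comparing the two numpy versions, repeated timings are likely more informative than a single run, since opening a remote store also includes network latency. A minimal standard-library sketch; `time_call` is a hypothetical helper, not something from xarray.

```python
import statistics
import time
from typing import Callable, Tuple


def time_call(fn: Callable[[], object], repeat: int = 5) -> Tuple[float, float]:
    """Call `fn` `repeat` times and return (best, median) wall-clock seconds."""
    if repeat < 1:
        raise ValueError("repeat must be >= 1")
    timings = []
    for _ in range(repeat):
        start = time.perf_counter()
        fn()  # e.g. a lambda that lazily opens the dataset
        timings.append(time.perf_counter() - start)
    return min(timings), statistics.median(timings)
```

In each environment, something like `time_call(lambda: xr.open_dataset(url, engine="zarr", storage_options=...))` alongside `np.__version__` gives numbers that are easier to compare; `url` and the options would be the ones from the thread. Later repeats may benefit from caching, which is why both the best and the median are reported.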
Unfortunately, I don't really have time to dig into the issue and pinpoint exactly which piece of code takes much longer than before. From the little testing I did, it doesn't seem to come from the HTTP calls.
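To pinpoint which code got slower, a profile of the open call under each numpy version can be compared side by side. A minimal sketch using the standard library's `cProfile` and `pstats`; `profile_call` is a hypothetical helper name.

```python
import cProfile
import io
import pstats
from typing import Callable


def profile_call(fn: Callable[[], object], top: int = 20) -> str:
    """Profile one call of `fn`; return the `top` entries sorted by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        fn()
    finally:
        profiler.disable()  # stop profiling even if fn raises
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats(pstats.SortKey.CUMULATIVE).print_stats(top)
    return buffer.getvalue()
```

Printing `profile_call(lambda: xr.open_dataset(...))` in both environments and comparing the top entries should show whether the extra time sits in network I/O or elsewhere, for example in time decoding.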
What did you expect to happen?
I expect the time to lazily open the dataset to be the same regardless of the numpy version.
Minimal Complete Verifiable Example
Relevant log output
No response
Anything else we need to know?
No response
Environment
xarray: 2024.9.0
pandas: 2.2.3
numpy: 2.1.1
scipy: None
netCDF4: 1.7.1.post2
pydap: None
h5netcdf: None
h5py: None
zarr: 2.18.3
cftime: 1.6.4
nc_time_axis: None
iris: None
bottleneck: None
dask: 2024.9.0
distributed: None
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
fsspec: 2024.9.0
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 75.1.0
pip: 24.0
conda: None
pytest: 8.3.3
mypy: None
IPython: 8.27.0
sphinx: None
xarray: 2024.9.0
pandas: 2.2.3
numpy: 1.26.4
scipy: None
netCDF4: 1.7.1.post2
pydap: None
h5netcdf: None
h5py: None
zarr: 2.18.3
cftime: 1.6.4
nc_time_axis: None
iris: None
bottleneck: None
dask: 2024.9.0
distributed: None
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
fsspec: 2024.9.0
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 75.1.0
pip: 24.0
conda: None
pytest: 8.3.3
mypy: None
IPython: 8.27.0
sphinx: None
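The two environment listings differ only in the numpy line (2.1.1 vs 1.26.4). A small sketch for checking that mechanically rather than by eye: it parses `name: version` lines from two pasted listings and returns the entries that differ. `parse_versions` and `diff_versions` are hypothetical helpers; lines without a colon are skipped.

```python
from typing import Dict, Optional, Tuple


def parse_versions(text: str) -> Dict[str, str]:
    """Parse 'name: version' lines into a dict; lines without a colon are skipped."""
    versions: Dict[str, str] = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep:
            versions[name.strip()] = value.strip()
    return versions


def diff_versions(a: str, b: str) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Return {package: (version_in_a, version_in_b)} for every package that differs."""
    va, vb = parse_versions(a), parse_versions(b)
    return {
        name: (va.get(name), vb.get(name))
        for name in sorted(va.keys() | vb.keys())
        if va.get(name) != vb.get(name)
    }


env_numpy2 = "xarray: 2024.9.0\nnumpy: 2.1.1\nzarr: 2.18.3\nNone"
env_numpy1 = "xarray: 2024.9.0\nnumpy: 1.26.4\nzarr: 2.18.3\nNone"
print(diff_versions(env_numpy2, env_numpy1))  # {'numpy': ('2.1.1', '1.26.4')}
```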