Improve performance for backend datetime handling #7374
Illviljan merged 34 commits into pydata:main
xarray/conventions.py
Outdated
      if (
          decode_coords
          and "coordinates" in attributes
          and isinstance(attributes["coordinates"], str)
      ):
          attributes = dict(attributes)
-         coord_names.update(attributes.pop("coordinates").split())
+         crds = attributes.pop("coordinates")
+         coord_names.update(crds.split())
This previously would've crashed when trying to use attrs["coordinates"].split() on a non-string value.
Might be a functionality change? Should this raise an error instead if it's not a string?
Yes it would be good to raise a nice error. "coordinates" is expected to be a string with variable names separated by spaces: http://cfconventions.org/Data/cf-conventions/cf-conventions-1.8/cf-conventions.html#attribute-appendix
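For illustration, the "nice error" suggested here might look roughly like the sketch below. `pop_coordinate_names` is a hypothetical helper, not a function in xarray, and both the choice of `TypeError` and the message wording are assumptions.

```python
from __future__ import annotations

from collections.abc import Mapping
from typing import Any


def pop_coordinate_names(
    attributes: Mapping[str, Any],
) -> tuple[dict[str, Any], set[str]]:
    """Hypothetical helper: pull variable names out of a CF "coordinates" attribute."""
    attributes = dict(attributes)  # copy so the caller's mapping is not mutated
    if "coordinates" not in attributes:
        return attributes, set()

    crds = attributes.pop("coordinates")
    if not isinstance(crds, str):
        # CF expects a string of variable names separated by spaces.
        raise TypeError(
            "'coordinates' attribute must be a space-separated string of "
            f"variable names, got {type(crds).__name__}"
        )
    return attributes, set(crds.split())
```

With this shape, a non-string value fails with an explicit message instead of an `AttributeError` from `.split()`, at the cost of one extra `isinstance` call per variable.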
Function is about 18% faster without this check.
Reduces time from 450 ms to 290 ms in my open_dataset testing.
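A measurement like this could be taken with something along the lines of the sketch below. `best_time_ms` and `speedup` are hypothetical helpers, and the actual setup used for the numbers above is not stated in this thread.

```python
import timeit
from typing import Callable


def best_time_ms(func: Callable[[], object], repeat: int = 5, number: int = 10) -> float:
    """Return the best observed per-call wall time of func, in milliseconds."""
    # timeit.repeat returns one total (in seconds) per repetition of `number` calls.
    totals = timeit.repeat(func, repeat=repeat, number=number)
    return min(totals) / number * 1000.0


def speedup(before_ms: float, after_ms: float) -> float:
    """Ratio of old to new runtime; values above 1 mean the new code is faster."""
    return before_ms / after_ms


# The timings quoted above correspond to a speedup of roughly 1.55x.
print(f"{speedup(450, 290):.2f}x")
```

To time file opening, one would pass a zero-argument callable such as `lambda: xr.open_dataset(path)`, where `path` is whatever test file is at hand (an assumption, since no file is named here).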
headtr1ck
left a comment
+100 for all the typing :)
Do we need some benchmark for this?
Benchmarks are improved, if I understand the logs correctly. Unfortunately the improvement is not large enough for ASV to report it: the ratio has to be >1.5, and the improvements on .time_open_dataset are around 1.3-1.4.
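As a rough sketch of that threshold logic: the function below is hypothetical and is not ASV's own code. It assumes the ratio is the old time divided by the new time, and takes the 1.5 factor from the comment above.

```python
def would_be_reported(before: float, after: float, factor: float = 1.5) -> bool:
    """Hypothetical check: is the before/after ratio past the reporting factor?"""
    ratio = before / after
    return ratio > factor


# Illustrative timings only, chosen to give ratios in the range quoted above.
print(would_be_reported(before=1.35, after=1.0))  # ratio 1.35, below the factor
print(would_be_reported(before=2.0, after=1.0))   # ratio 2.0, above the factor
```

So a real improvement of 1.3-1.4x still passes silently under this threshold.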
Looks like an improvement on my machine
Great, thanks!
Was hunting for some low-hanging performance fruit when reading in files.