
Add lazy backend ASV test #7426

Merged (12 commits) on Jan 11, 2023
Conversation

Illviljan (Contributor) commented Jan 6, 2023

This benchmarks xr.open_dataset without any slow file reading, which can quickly come to dominate the measured time.

Related to #7374.

Timings for the new ASV tests:


[ 50.85%] ··· dataset_io.IOReadCustomEngine.time_open_dataset                 ok
[ 50.85%] ··· ======== ============
               chunks              
              -------- ------------
                None     265±4ms   
                 {}     1.17±0.02s 
              ======== ============
[ 54.69%] ··· dataset_io.IOReadSingleFile.time_read_dataset                   ok
[ 54.69%] ··· ========= ============= =============
              --                   chunks          
              --------- ---------------------------
                engine       None           {}     
              ========= ============= =============
                scipy     4.81±0.1ms   6.65±0.01ms 
               netcdf4   8.41±0.08ms    10.9±0.2ms 
              ========= ============= =============

From the IOReadCustomEngine test we can see that chunking datasets with many variables (2000+) is considerably slower.
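To make the shape of these benchmarks concrete, here is a minimal, self-contained sketch of an ASV-style parameterized benchmark mirroring `IOReadCustomEngine.time_open_dataset`. The `fake_open_dataset` helper and the class name are hypothetical stand-ins so the sketch runs without xarray or any files on disk; the real test calls `xr.open_dataset` with a custom engine instead.

```python
# Hypothetical stand-in for xr.open_dataset with a lazy custom engine:
# "opening" builds many variables without touching the filesystem.
def fake_open_dataset(n_variables=2000, chunks=None):
    """Pretend to open a dataset with many variables. Passing chunks={}
    wraps every variable, which is why chunking dominates in the real test."""
    dataset = {f"var{i}": None for i in range(n_variables)}
    if chunks is not None:
        # chunks={} means "chunk with default sizes": every variable is visited.
        dataset = {name: ("chunked", value) for name, value in dataset.items()}
    return dataset


class IOReadCustomEngineSketch:
    # ASV discovers `params`/`param_names` and runs each `time_*` method
    # once per parameter value, producing the table above.
    params = [None, {}]
    param_names = ["chunks"]

    def setup(self, chunks):
        self.n_variables = 2000

    def time_open_dataset(self, chunks):
        fake_open_dataset(self.n_variables, chunks=chunks)
```

The key design point is that the engine does no real I/O, so the timing isolates xarray's own open/chunk overhead rather than disk speed.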

@github-actions github-actions bot added run-benchmark Run the ASV benchmark workflow topic-performance labels Jan 6, 2023
@Illviljan Illviljan added run-benchmark Run the ASV benchmark workflow and removed run-benchmark Run the ASV benchmark workflow labels Jan 6, 2023
@Illviljan Illviljan marked this pull request as draft January 6, 2023 23:56
@Illviljan Illviljan marked this pull request as ready for review January 9, 2023 20:52
@Illviljan Illviljan added the plan to merge Final call for comments label Jan 9, 2023
@Illviljan Illviljan removed topic-performance run-benchmark Run the ASV benchmark workflow labels Jan 10, 2023
@Illviljan Illviljan closed this Jan 10, 2023
@Illviljan Illviljan reopened this Jan 10, 2023
@Illviljan Illviljan added the run-benchmark Run the ASV benchmark workflow label Jan 10, 2023
xr.open_dataset(self.filepaths[engine], engine=engine, chunks=chunks)


class IOReadCustomEngine:
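The idea behind the custom engine is that "opening" serves an in-memory dataset whose variables are materialized only on first access, so the open call itself stays cheap. A self-contained sketch of that pattern (all names hypothetical, no xarray dependency):

```python
class LazyBackendSketch:
    """Hypothetical stand-in for the PR's custom engine: opening does no
    file I/O; variable data is created lazily and cached on first access."""

    def __init__(self, n_variables=2000):
        self._names = [f"var{i}" for i in range(n_variables)]
        self._cache = {}

    def open(self):
        # Opening only registers variable names: O(n) bookkeeping, no I/O.
        # Each value is a thunk that materializes the data when called.
        return {name: (lambda n=name: self._materialize(n))
                for name in self._names}

    def _materialize(self, name):
        if name not in self._cache:
            self._cache[name] = [0.0] * 10  # pretend data
        return self._cache[name]
```

In the real benchmark the same role is played by a backend entrypoint handed to `xr.open_dataset(..., engine=...)`, which lets the test measure xarray overhead rather than disk reads.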
dcherian (Contributor) commented Jan 12, 2023
Thanks, this is a great benchmark.

Just a minor question: Shall we stick this in xarray.tests instead? I'm not sure if we have something similar for our tests already.

dcherian added a commit to dcherian/xarray that referenced this pull request Jan 18, 2023
* main: (41 commits)
  v2023.01.0 whats-new (pydata#7440)
  explain keep_attrs in docstring of apply_ufunc (pydata#7445)
  Add sentence to open_dataset docstring (pydata#7438)
  pin scipy version in doc environment (pydata#7436)
  Improve performance for backend datetime handling (pydata#7374)
  fix typo (pydata#7433)
  Add lazy backend ASV test (pydata#7426)
  Pull Request Labeler - Workaround sync-labels bug (pydata#7431)
  see also : groupby in resample doc and vice-versa (pydata#7425)
  Some alignment optimizations (pydata#7382)
  Make `broadcast` and `concat` work with the Array API (pydata#7387)
  remove `numbagg` and `numba` from the upstream-dev CI (pydata#7416)
  [pre-commit.ci] pre-commit autoupdate (pydata#7402)
  Preserve original dtype when accessing MultiIndex levels (pydata#7393)
  [pre-commit.ci] pre-commit autoupdate (pydata#7389)
  [pre-commit.ci] pre-commit autoupdate (pydata#7360)
  COMPAT: Adjust CFTimeIndex.get_loc for pandas 2.0 deprecation enforcement (pydata#7361)
  Avoid loading entire dataset by getting the nbytes in an array (pydata#7356)
  `keep_attrs` for pad (pydata#7267)
  Bump pypa/gh-action-pypi-publish from 1.5.1 to 1.6.4 (pydata#7375)
  ...
Labels
plan to merge Final call for comments run-benchmark Run the ASV benchmark workflow topic-backends topic-performance