
Add lazy backend ASV test #7426

Merged (12 commits) on Jan 11, 2023
Conversation

Illviljan (Contributor) commented Jan 6, 2023

This benchmarks xr.open_dataset without any slow file reading, which can quickly come to dominate the measured time.

Related to #7374.

Timings for the new ASV tests:


[ 50.85%] ··· dataset_io.IOReadCustomEngine.time_open_dataset                 ok
[ 50.85%] ··· ======== ============
               chunks              
              -------- ------------
                None     265±4ms   
                 {}     1.17±0.02s 
              ======== ============
[ 54.69%] ··· dataset_io.IOReadSingleFile.time_read_dataset                   ok
[ 54.69%] ··· ========= ============= =============
              --                   chunks          
              --------- ---------------------------
                engine       None           {}     
              ========= ============= =============
                scipy     4.81±0.1ms   6.65±0.01ms 
               netcdf4   8.41±0.08ms    10.9±0.2ms 
              ========= ============= =============

From the IOReadCustomEngine test we can see that chunking datasets with many variables (2000+) is considerably slower.
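To make the shape of these benchmarks concrete, here is a minimal, self-contained sketch of an ASV-style parameterized benchmark mirroring `IOReadCustomEngine.time_open_dataset`. The `fake_open_dataset` helper and the class name are hypothetical stand-ins so the sketch runs without xarray or any files on disk; the real test calls `xr.open_dataset` with a custom engine instead.

```python
# Hypothetical stand-in for xr.open_dataset with a lazy custom engine:
# "opening" builds many variables without touching the filesystem.
def fake_open_dataset(n_variables=2000, chunks=None):
    """Pretend to open a dataset with many variables. Passing chunks={}
    wraps every variable, which is why chunking dominates in the real test."""
    dataset = {f"var{i}": None for i in range(n_variables)}
    if chunks is not None:
        # chunks={} means "chunk with default sizes": every variable is visited.
        dataset = {name: ("chunked", value) for name, value in dataset.items()}
    return dataset


class IOReadCustomEngineSketch:
    # ASV discovers `params`/`param_names` and runs each `time_*` method
    # once per parameter value, producing the table above.
    params = [None, {}]
    param_names = ["chunks"]

    def setup(self, chunks):
        self.n_variables = 2000

    def time_open_dataset(self, chunks):
        fake_open_dataset(self.n_variables, chunks=chunks)
```

The key design point is that the engine does no real I/O, so the timing isolates xarray's own open/chunk overhead rather than disk speed.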

@github-actions github-actions bot added run-benchmark Run the ASV benchmark workflow topic-performance labels Jan 6, 2023
@Illviljan Illviljan added run-benchmark Run the ASV benchmark workflow and removed run-benchmark Run the ASV benchmark workflow labels Jan 6, 2023
@Illviljan Illviljan marked this pull request as draft January 6, 2023 23:56
@Illviljan Illviljan marked this pull request as ready for review January 9, 2023 20:52
@Illviljan Illviljan added the plan to merge Final call for comments label Jan 9, 2023
@Illviljan Illviljan removed topic-performance run-benchmark Run the ASV benchmark workflow labels Jan 10, 2023
@Illviljan Illviljan closed this Jan 10, 2023
@Illviljan Illviljan reopened this Jan 10, 2023
@Illviljan Illviljan added the run-benchmark Run the ASV benchmark workflow label Jan 10, 2023
xr.open_dataset(self.filepaths[engine], engine=engine, chunks=chunks)


class IOReadCustomEngine:
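The idea behind the custom engine is that "opening" serves an in-memory dataset whose variables are materialized only on first access, so the open call itself stays cheap. A self-contained sketch of that pattern (all names hypothetical, no xarray dependency):

```python
class LazyBackendSketch:
    """Hypothetical stand-in for the PR's custom engine: opening does no
    file I/O; variable data is created lazily and cached on first access."""

    def __init__(self, n_variables=2000):
        self._names = [f"var{i}" for i in range(n_variables)]
        self._cache = {}

    def open(self):
        # Opening only registers variable names: O(n) bookkeeping, no I/O.
        # Each value is a thunk that materializes the data when called.
        return {name: (lambda n=name: self._materialize(n))
                for name in self._names}

    def _materialize(self, name):
        if name not in self._cache:
            self._cache[name] = [0.0] * 10  # pretend data
        return self._cache[name]
```

In the real benchmark the same role is played by a backend entrypoint handed to `xr.open_dataset(..., engine=...)`, which lets the test measure xarray overhead rather than disk reads.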
dcherian (Contributor) commented Jan 12, 2023
Thanks, this is a great benchmark.

Just a minor question: Shall we stick this in xarray.tests instead? I'm not sure if we have something similar for our tests already.

dcherian added a commit to dcherian/xarray that referenced this pull request Jan 18, 2023
* main: (41 commits)
  v2023.01.0 whats-new (pydata#7440)
  explain keep_attrs in docstring of apply_ufunc (pydata#7445)
  Add sentence to open_dataset docstring (pydata#7438)
  pin scipy version in doc environment (pydata#7436)
  Improve performance for backend datetime handling (pydata#7374)
  fix typo (pydata#7433)
  Add lazy backend ASV test (pydata#7426)
  Pull Request Labeler - Workaround sync-labels bug (pydata#7431)
  see also : groupby in resample doc and vice-versa (pydata#7425)
  Some alignment optimizations (pydata#7382)
  Make `broadcast` and `concat` work with the Array API (pydata#7387)
  remove `numbagg` and `numba` from the upstream-dev CI (pydata#7416)
  [pre-commit.ci] pre-commit autoupdate (pydata#7402)
  Preserve original dtype when accessing MultiIndex levels (pydata#7393)
  [pre-commit.ci] pre-commit autoupdate (pydata#7389)
  [pre-commit.ci] pre-commit autoupdate (pydata#7360)
  COMPAT: Adjust CFTimeIndex.get_loc for pandas 2.0 deprecation enforcement (pydata#7361)
  Avoid loading entire dataset by getting the nbytes in an array (pydata#7356)
  `keep_attrs` for pad (pydata#7267)
  Bump pypa/gh-action-pypi-publish from 1.5.1 to 1.6.4 (pydata#7375)
  ...
Labels
plan to merge Final call for comments run-benchmark Run the ASV benchmark workflow topic-backends topic-performance