Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dataset insertion benchmark #7223

Merged
merged 17 commits into from
Oct 27, 2022
Merged

Conversation

hmaarrfk
Copy link
Contributor

xref: #7221

  • Closes #xxxx
  • Tests added
  • User visible changes (including notable bug fixes) are documented in whats-new.rst
  • New functions/methods are listed in api.rst

@github-actions github-actions bot added run-benchmark Run the ASV benchmark workflow topic-performance labels Oct 26, 2022
asv_bench/benchmarks/dataset_creation.py Outdated Show resolved Hide resolved
asv_bench/benchmarks/dataset_creation.py Outdated Show resolved Hide resolved
asv_bench/benchmarks/dataset_creation.py Outdated Show resolved Hide resolved
hmaarrfk and others added 6 commits October 26, 2022 12:19
Co-authored-by: Deepak Cherian <dcherian@users.noreply.github.com>
Co-authored-by: Deepak Cherian <dcherian@users.noreply.github.com>
@hmaarrfk hmaarrfk changed the title Dataset creation benchmark Dataset insertion benchmark Oct 26, 2022
@hmaarrfk
Copy link
Contributor Author

ok naming is always hard. I tried to pick a good name.

Copy link
Collaborator

@max-sixty max-sixty left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent!

asv_bench/benchmarks/dataset_in_memory_operation.py Outdated Show resolved Hide resolved
@dcherian dcherian added the plan to merge Final call for comments label Oct 26, 2022
import xarray as xr


class DatasetAddVariable:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TBH, i was going to expand this to other in-memory operations. Replace a variable? Drop a variable?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought the first version used was relevant as well:

ds = xr.Dataset(coords=coords)
for v in names:
    ds[v] = ("time", value)

And this fast one from #7224 (comment) would also be nice to benchmark:

data_vars = {v: ("time", value) for v in names}
ds = xr.Dataset(data_vars=data_vars, coords=coords)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The first case is captured by cause i am suggesting we time insertion for 0, 10, 100, 1000 existing variable. One can interpolate the bahvior.

I do believe that bencharmming all at once creation (the second case) is also interesting

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I renamed it thinking that we were really timing the merge operations. This is also true for replacing, but not true for dropping.

Maybe we add dropping to a new dataset.py. This is all quite minor though! Shall we merge?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thinking that we were really timing the merge operations

It depends if you want to think about from the point of view of "xarray developer" or "xarray user".

As an "xarray user", i'm trying to "insert a totally new variable, no dims, no coords". I don't really care what is happening under the hood. This is what I originally had in mind. This is h

My goal with the original name was to test operations on in-memory xarrays (as opposed to those backed by a file, which seem to be done in dataset_io). I don't know if I have time to "add other benchmarks" at the moment. This seems good.

The name is the only thing that will survive. so if you are are ok with it.

I ran the benchmarks against main, #7221, and #7222. The benchmark seems to show the effect locally even with a single insertion. I've thus simplified the benchmark to do a single insertion, and not 5.

main

[ 50.00%] ··· merge.DatasetAddVariable.time_variable_insertion   ok
[ 50.00%] ··· ======== ==========
               param1
              -------- ----------
                 0      636±0μs
                 10     511±0μs
                100     5.92±0ms
                1000    41.0±0ms
              ======== ==========

#7221

[ 50.00%] ··· merge.DatasetAddVariable.time_variable_insertion   ok
[ 50.00%] ··· ======== ==========
               param1
              -------- ----------
                 0      609±0μs
                 10     1.32±0ms
                100     4.01±0ms
                1000    8.65±0ms
              ======== ==========

#7222

[ 50.00%] ··· merge.DatasetAddVariable.time_variable_insertion   ok
[ 50.00%] ··· ======== ==========
               param1
              -------- ----------
                 0      578±0μs
                 10     1.03±0ms
                100     2.74±0ms
                1000    5.37±0ms
              ======== ==========

Copy link
Contributor Author

@hmaarrfk hmaarrfk Oct 27, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a way rename param1? ok i renamed it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

$ asv run -E existing --quick --bench merge.DatasetAddVariable.time_variable_insertion
· Discovering benchmarks
· Running 1 total benchmarks (1 commits * 1 environments * 1 benchmarks)
[  0.00%] ·· Benchmarking existing-py_home_mark_mambaforge_envs_mcam_dev_bin_python
[ 50.00%] ··· ...e.DatasetAddVariable.time_variable_insertion                 ok
[ 50.00%] ··· =================== ==========
               existing_elements
              ------------------- ----------
                       0           668±0μs
                       10          1.17±0ms
                      100          2.22±0ms
                      1000         5.38±0ms
              =================== ==========

@dcherian
Copy link
Contributor

Thanks @hmaarrfk Great PR!

@dcherian dcherian merged commit c000690 into pydata:main Oct 27, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
plan to merge Final call for comments run-benchmark Run the ASV benchmark workflow topic-performance
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants