-
-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Dataset insertion benchmark #7223
Conversation
Co-authored-by: Deepak Cherian <dcherian@users.noreply.github.com>
Co-authored-by: Deepak Cherian <dcherian@users.noreply.github.com>
for more information, see https://pre-commit.ci
ok naming is always hard. I tried to pick a good name. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Excellent!
Co-authored-by: Maximilian Roos <5635139+max-sixty@users.noreply.github.com>
import xarray as xr | ||
|
||
|
||
class DatasetAddVariable: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TBH, i was going to expand this to other in-memory operations. Replace a variable? Drop a variable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought the first version used was relevant as well:
ds = xr.Dataset(coords=coords)
for v in names:
ds[v] = ("time", value)
And this fast one from #7224 (comment) would also be nice to benchmark:
data_vars = {v: ("time", value) for v in names}
ds = xr.Dataset(data_vars=data_vars, coords=coords)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The first case is captured by cause i am suggesting we time insertion for 0, 10, 100, 1000 existing variable. One can interpolate the bahvior.
I do believe that bencharmming all at once creation (the second case) is also interesting
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I renamed it thinking that we were really timing the merge
operations. This is also true for replacing, but not true for dropping.
Maybe we add dropping to a new dataset.py
. This is all quite minor though! Shall we merge?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thinking that we were really timing the merge operations
It depends if you want to think about from the point of view of "xarray developer" or "xarray user".
As an "xarray user", i'm trying to "insert a totally new variable, no dims, no coords". I don't really care what is happening under the hood. This is what I originally had in mind. This is h
My goal with the original name was to test operations on in-memory xarrays (as opposed to those backed by a file, which seem to be done in dataset_io
). I don't know if I have time to "add other benchmarks" at the moment. This seems good.
The name is the only thing that will survive. so if you are are ok with it.
I ran the benchmarks against main, #7221, and #7222. The benchmark seems to show the effect locally even with a single insertion. I've thus simplified the benchmark to do a single insertion, and not 5.
main
[ 50.00%] ··· merge.DatasetAddVariable.time_variable_insertion ok
[ 50.00%] ··· ======== ==========
param1
-------- ----------
0 636±0μs
10 511±0μs
100 5.92±0ms
1000 41.0±0ms
======== ==========
[ 50.00%] ··· merge.DatasetAddVariable.time_variable_insertion ok
[ 50.00%] ··· ======== ==========
param1
-------- ----------
0 609±0μs
10 1.32±0ms
100 4.01±0ms
1000 8.65±0ms
======== ==========
[ 50.00%] ··· merge.DatasetAddVariable.time_variable_insertion ok
[ 50.00%] ··· ======== ==========
param1
-------- ----------
0 578±0μs
10 1.03±0ms
100 2.74±0ms
1000 5.37±0ms
======== ==========
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a way rename ok i renamed it.param1
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
$ asv run -E existing --quick --bench merge.DatasetAddVariable.time_variable_insertion
· Discovering benchmarks
· Running 1 total benchmarks (1 commits * 1 environments * 1 benchmarks)
[ 0.00%] ·· Benchmarking existing-py_home_mark_mambaforge_envs_mcam_dev_bin_python
[ 50.00%] ··· ...e.DatasetAddVariable.time_variable_insertion ok
[ 50.00%] ··· =================== ==========
existing_elements
------------------- ----------
0 668±0μs
10 1.17±0ms
100 2.22±0ms
1000 5.38±0ms
=================== ==========
Thanks @hmaarrfk Great PR! |
xref: #7221
whats-new.rst
api.rst