opening datasets slow compared to netCDF4 #9058

Answered by dcherian
harzer99 asked this question in Q&A
Apparently netCDF4._netCDF4.Variable.shape is quite slow:

%time ds = nc.Dataset("/Users/deepak/Downloads/bioscen15-sdm-gam_ewembi_nobc_hist_nosoc_co2_birdprob_global_30year-mean_1995_1995.nc4")
%time ds.variables["Abeillia_abeillei"].shape
CPU times: user 915 ms, sys: 44.4 ms, total: 960 ms
Wall time: 963 ms
CPU times: user 17 ms, sys: 3.11 ms, total: 20.1 ms
Wall time: 20.1 ms

At 20 ms per variable, 8500 variables add up to 170 seconds. This gets run twice, so at least 340 seconds :). But we can avoid that quite easily; there's a 50% speedup here: https://github.com/pydata/xarray/pull/9067/files
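To make the arithmetic concrete, here is a rough sketch of the estimate and of the general idea of reading an expensive attribute once and reusing it. `SlowVariable`, `describe_twice`, `describe_once` and `estimated_overhead_s` are hypothetical names invented for illustration; this is not the code from the linked PR, just one way a doubled `.shape` read could be avoided.

```python
import time


def estimated_overhead_s(per_read_s, n_vars, reads_per_var):
    """Total time spent on attribute reads, in seconds."""
    return per_read_s * n_vars * reads_per_var


class SlowVariable:
    """Hypothetical stand-in for a variable whose .shape is costly to read."""

    def __init__(self, shape, delay_s=0.0):
        self._shape = tuple(shape)
        self._delay_s = delay_s
        self.shape_reads = 0  # counts how often .shape was accessed

    @property
    def shape(self):
        self.shape_reads += 1
        time.sleep(self._delay_s)  # simulate the slow lookup
        return self._shape


def describe_twice(var):
    """Reads var.shape two separate times (the slow pattern)."""
    ndim = len(var.shape)
    size = 1
    for n in var.shape:
        size *= n
    return ndim, size


def describe_once(var):
    """Reads var.shape a single time and reuses the local copy."""
    shape = var.shape
    size = 1
    for n in shape:
        size *= n
    return len(shape), size
```

With the numbers quoted above, `estimated_overhead_s(0.020, 8500, 2)` gives roughly 340 seconds, and dropping to one read per variable halves that, which would be consistent with the 50% figure.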

Here's a profile for open_store_variable; I made a small edit to time NetCDF4ArrayWrapper separately.

Line #      Hits         Ti…
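For anyone who wants to collect a comparable profile themselves, here is a minimal sketch using the standard library's `cProfile`. This is a function-level profiler, so it is a swapped-in technique and not necessarily the tool that produced the table above. `profile_call` and `open_many` are hypothetical names; `open_many` just stands in for whatever call is being investigated.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, **kwargs):
    """Run func under cProfile and return (result, text report)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)

    # Write the report to a string instead of stdout.
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(10)  # top 10 entries
    return result, buffer.getvalue()


def open_many(n):
    """Hypothetical stand-in for the slow call being profiled."""
    return [len(str(i)) for i in range(n)]
```

Calling `result, report = profile_call(open_many, 1000)` and printing `report` shows where the cumulative time goes; in practice `open_many` would be replaced by the real dataset-opening call.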

Answer selected by dcherian