When invoking open_mfdataset, users very frequently know in advance that all of their coordinates that aren't on the concat_dim are already aligned, and may be willing to blindly trust that assumption in exchange for a huge performance boost.
My production data: 200 NetCDF files on a slow NFS file system, concatenated on the "scenario" dimension:
Proposed design
Add a new optional parameter to open_mfdataset, assume_aligned=None.
It can be set to a list of variable names or to "all", and it requires concat_dim to be set explicitly.
It causes open_mfdataset to use the first occurrence of every variable and blindly skip loading all subsequent occurrences.
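A minimal sketch of how the proposed argument could be validated, assuming the semantics described above. `normalize_assume_aligned` is a hypothetical helper, and `assume_aligned` itself is only a proposal in this issue, not an existing open_mfdataset argument.

```python
from typing import FrozenSet, Iterable, Optional, Union

# Hypothetical: "assume_aligned" is the parameter proposed in this issue,
# not an existing argument of xarray's open_mfdataset.
AssumeAligned = Union[None, str, Iterable[str]]


def normalize_assume_aligned(
    assume_aligned: AssumeAligned, concat_dim: Optional[str]
) -> Union[None, str, FrozenSet[str]]:
    """Return None (feature off), "all", or the frozenset of names to trust."""
    if assume_aligned is None:
        return None  # default: behave exactly as open_mfdataset does today
    if concat_dim is None:
        raise ValueError("assume_aligned requires concat_dim to be set explicitly")
    if isinstance(assume_aligned, str):
        # A bare string is only accepted as the "all" sentinel.
        if assume_aligned != "all":
            raise ValueError('assume_aligned must be "all" or a list of variable names')
        return "all"
    return frozenset(assume_aligned)


print(normalize_assume_aligned("all", concat_dim="scenario"))  # prints: all
```

Requiring concat_dim keeps the "which variables are concatenated" question unambiguous before anything is skipped.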
Algorithm
Perform the first invocation of the underlying open_dataset exactly as happens now.
If assume_aligned is not None: for each new NetCDF file, figure out which variables need to be aligned and compared (as opposed to concatenated), and add them to a drop_variables list.
If assume_aligned != "all": drop_variables &= assume_aligned, so that only the variables the user listed are skipped.
Pass the growing drop_variables list to the underlying open_dataset for every subsequent file.
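The steps above can be modelled without touching any files. In this sketch each "file" is a plain dict mapping variable names to their dimension names, and `fake_open_dataset`, `FAKE_FILES` and `open_with_assumed_alignment` are hypothetical stand-ins (xarray's real `open_dataset` does accept a `drop_variables` argument, but it is not called here). "Needs to be aligned and compared" is approximated as "does not have concat_dim among its dimensions", which is an assumption.

```python
from typing import Callable, Dict, Iterable, List, Sequence, Set, Tuple, Union

# A "dataset" in this model: variable name -> tuple of dimension names.
Variables = Dict[str, Tuple[str, ...]]

# Hypothetical in-memory "files"; every file repeats the same lat/lon grid.
FAKE_FILES: Dict[str, Variables] = {
    f"s{i}.nc": {
        "scenario": ("scenario",),
        "lat": ("lat",),
        "lon": ("lon",),
        "area": ("lat", "lon"),
        "tas": ("scenario", "lat", "lon"),
    }
    for i in range(3)
}


def fake_open_dataset(path: str, drop_variables: Iterable[str] = ()) -> Variables:
    """Stand-in for a real reader: returns only the variables not dropped."""
    dropped = set(drop_variables)
    return {name: dims for name, dims in FAKE_FILES[path].items() if name not in dropped}


def open_with_assumed_alignment(
    paths: Sequence[str],
    open_dataset: Callable[..., Variables],
    concat_dim: str,
    assume_aligned: Union[None, str, Iterable[str]] = None,
) -> List[Variables]:
    if assume_aligned is not None and assume_aligned != "all":
        assume_aligned = set(assume_aligned)  # materialize once
    opened: List[Variables] = []
    drop_variables: Set[str] = set()
    for path in paths:
        # The first file is opened with an empty drop list, exactly as today.
        ds = open_dataset(path, drop_variables=sorted(drop_variables))
        opened.append(ds)
        if assume_aligned is None:
            continue  # feature disabled: load everything from every file
        # Variables without concat_dim would be aligned/compared, not concatenated.
        candidates = {name for name, dims in ds.items() if concat_dim not in dims}
        if assume_aligned != "all":
            candidates &= assume_aligned  # only skip what the user listed
        # The drop list only grows; later files never re-load these variables.
        drop_variables |= candidates
    return opened


paths = sorted(FAKE_FILES)
datasets = open_with_assumed_alignment(paths, fake_open_dataset, "scenario", assume_aligned="all")
print([sorted(ds) for ds in datasets])
# prints: [['area', 'lat', 'lon', 'scenario', 'tas'], ['scenario', 'tas'], ['scenario', 'tas']]
```

Combining the results afterwards would take the skipped variables from the first dataset only; that step is left out of this sketch.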
There has already been a lot of discussion of this in #1385 and #1823. I tried and failed to implement something similar in #1413. I recommend reviewing those threads before jumping into this.
dcherian changed the title from "open_mfdataset to blindly trust alignment" to "open_mfdataset: skip loading for indexes and coordinates from all but the first file" on Sep 16, 2019.
This is a follow-up to #1521.
If I skip loading and comparing the non-index coordinates from all 200 files:
If I also skip loading and comparing the index coordinates from all 200 files: