Stricter defaults for concat, combine, open_mfdataset #8778
Comments
Thanks for raising this @dcherian. I support this, along with some effort to improve the docstrings/documentation/FAQ to make the default behaviour and potential pitfalls clearer. What would a deprecation cycle for this look like? We're talking about changing the default values of 4 different keyword arguments, that get used in [...]
+1. It would be massively confusing otherwise.
Regarding deprecation cycle. I think something like: [...]
Broadly agree with @jsignell, one dissent:
...iff the behavior from xarray would be different. Otherwise this will be really noisy. Requires some work to assess whether the behavior will change in the [...]
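One way to read the "warn only iff the behavior would be different" suggestion is sketched below. It is a toy under stated assumptions: `toy_join` and `_UNSET` are hypothetical names, the function joins plain Python lists rather than xarray indexes, and it only models the `join` keyword. It is not how xarray implements or plans to implement the deprecation.

```python
import warnings

_UNSET = object()  # sentinel meaning "caller did not pass this keyword"


def toy_join(indexes, join=_UNSET):
    """Toy stand-in for joining 1-D indexes; this is not xarray code."""
    if join is _UNSET:
        # The old default ("outer") and the proposed default ("exact") only
        # disagree when the indexes are not all identical, so warn only then.
        if any(idx != indexes[0] for idx in indexes[1:]):
            warnings.warn(
                "the default join will change from 'outer' to 'exact'; "
                "pass join= explicitly to silence this warning",
                FutureWarning,
                stacklevel=2,
            )
        join = "outer"  # keep the old behaviour during the deprecation cycle
    if join == "outer":
        return sorted(set().union(*indexes))
    if join == "exact":
        if any(idx != indexes[0] for idx in indexes[1:]):
            raise ValueError("indexes are not equal")
        return list(indexes[0])
    raise ValueError(f"unknown join: {join!r}")
```

Callers whose inputs already agree see no warning, and callers who pass `join=` explicitly are never warned, which keeps the noise down to the cases where the result would actually change.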
Is your feature request related to a problem?
The defaults for `concat` are excessively permissive: `data_vars="all", coords="different", compat="no_conflicts", join="outer"`. This comment illustrates why this can be hard to predict or understand: a seemingly unrelated option, `decode_cf`, controls whether a variable is in `data_vars` or `coords`, and can result in wildly different concatenation behaviour.
- [...] `concat_dim` even if they did not have that dimension to begin with.
- [...] `concat_dim`) can result in very large datasets due to small floating-point differences in the indexes, and also questionable behaviour with staggered grid datasets.
- [...] `duck_array_ops.nanfirst` here I think).
While "convenient", this really just makes the default experience quite bad, with hard-to-understand slowdowns.
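The floating-point point can be reproduced with a small sketch. The datasets, variable names, and the `1e-12` perturbation are made up for illustration; all four keywords are passed explicitly so the result should not depend on whichever defaults the installed xarray version uses.

```python
import numpy as np
import xarray as xr

x = np.array([0.0, 0.1, 0.2])
noisy_x = x + 1e-12  # same grid, perturbed by floating-point noise

ds1 = xr.Dataset(
    {"a": (("t", "x"), np.ones((1, 3)))}, coords={"t": [0], "x": x}
)
ds2 = xr.Dataset(
    {"a": (("t", "x"), np.ones((1, 3)))}, coords={"t": [1], "x": noisy_x}
)

# Permissive settings: the outer join takes the union of both "x" indexes.
outer = xr.concat(
    [ds1, ds2],
    dim="t",
    data_vars="all",
    coords="different",
    compat="no_conflicts",
    join="outer",
)
x_size_outer = outer.sizes["x"]  # "x" grows from 3 points to 6
n_missing = int(outer["a"].isnull().sum())  # the new slots are NaN-filled

# join="exact" refuses to align mismatched indexes instead of growing them.
try:
    xr.concat(
        [ds1, ds2],
        dim="t",
        data_vars="all",
        coords="different",
        compat="no_conflicts",
        join="exact",
    )
    exact_raised = False
except ValueError:
    exact_raised = True
```

With only three grid points the growth is harmless, but the same union applied across many files with slightly different coordinates is what produces the very large, mostly-NaN results described above.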
Describe the solution you'd like
I propose we migrate to `data_vars="minimal", coords="minimal", join="exact", compat="override"`. This should
- [...] `data_vars` and `coords` variables when they already have `concat_dim`.
- [...] `concat_dim`, it will blindly pick them from the first file.
- `join="exact"` will prevent ballooning of dimension sizes due to floating-point inequalities.
Unfortunately, this has a pretty big blast radius, so we'd need a long deprecation cycle.
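Until the defaults change, the proposed behaviour can be requested explicitly. The following is a minimal sketch with made-up datasets (the `make` helper and the `temp`/`mask` variables are hypothetical) comparing the permissive settings against the proposed ones.

```python
import numpy as np
import xarray as xr


def make(time, mask):
    """Build a one-timestep dataset; `mask` has no "time" dimension."""
    return xr.Dataset(
        {
            "temp": (("time", "x"), np.full((1, 3), float(time))),
            "mask": (("x",), mask),
        },
        coords={"time": [time], "x": [10, 20, 30]},
    )


ds1 = make(0, [1, 0, 1])
ds2 = make(1, [1, 1, 1])

# Permissive settings: every data variable gains the "time" dimension.
permissive = xr.concat(
    [ds1, ds2],
    dim="time",
    data_vars="all",
    coords="different",
    compat="no_conflicts",
    join="outer",
)

# Proposed settings: only variables that already have "time" are
# concatenated; "mask" is taken from the first dataset without comparison.
strict = xr.concat(
    [ds1, ds2],
    dim="time",
    data_vars="minimal",
    coords="minimal",
    compat="override",
    join="exact",
)

mask_dims_permissive = permissive["mask"].dims
mask_dims_strict = strict["mask"].dims
mask_values_strict = strict["mask"].values.tolist()
```

Note that `compat="override"` silently drops the differing `mask` from the second dataset, which is the "blindly pick them from the first file" trade-off mentioned above.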
Describe alternatives you've considered
No response
Additional context
xref #4824
xref #1385
xref #8231
xref #5381
xref #2064
xref #2217