Dataset constructor always coerces 1D data variables with same name as dim to coordinates #8959
Comments
I think that comes explicitly from the netCDF User's Guide and the CF Conventions ("coordinate variable"). Though reading now, I guess it's saying all coordinate variables must be 1D with the same name as the dim, not necessarily the converse.
Auto-promoting dimension data variables as dimension coordinates when creating a new Dataset has been indeed the expected behavior so far. I'm not sure what best we should do, though. On one hand, |
Thanks for that context @dopplershift ! The way I see it there are two consistent behaviours:
Yeah this would break quite a lot of user code...
Is it possible to imagine a deprecation cycle for this? One which detects this input pattern and raises a warning telling you to use |
what we could do is split the behavior between keyword-argument def __init__(self, vars=None, /, coords=None, attrs=None, *, data_vars=None):
... where Then, at some point, we deprecate the positional argument or all positionals (not sure how that part should look like exactly, though). |
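One way to picture the split suggested above is a toy constructor. This is a minimal sketch in plain Python, not xarray's implementation: `ToyDataset` is a hypothetical class, the choice of `FutureWarning` is arbitrary, and the rule that the keyword-only `data_vars` argument never promotes is an assumption about what the proposal intends.

```python
import warnings


class ToyDataset:
    """Hypothetical stand-in for a Dataset; variables are (dims, values) pairs."""

    def __init__(self, vars=None, /, coords=None, attrs=None, *, data_vars=None):
        if vars is not None and data_vars is not None:
            raise TypeError("pass either the positional mapping or data_vars, not both")
        self.coords = dict(coords or {})
        self.data_vars = {}
        self.attrs = dict(attrs or {})

        # Legacy path (positional argument): a 1D variable named after its
        # own dimension is promoted to a coordinate, with a warning.
        for name, (dims, values) in (vars or {}).items():
            dims = (dims,) if isinstance(dims, str) else tuple(dims)
            if dims == (name,):
                warnings.warn(
                    f"{name!r} is being promoted to a coordinate; "
                    "pass it via coords to keep this behaviour",
                    FutureWarning,
                    stacklevel=2,
                )
                self.coords[name] = (dims, values)
            else:
                self.data_vars[name] = (dims, values)

        # Assumed new path (keyword-only argument): never promote.
        for name, (dims, values) in (data_vars or {}).items():
            dims = (dims,) if isinstance(dims, str) else tuple(dims)
            self.data_vars[name] = (dims, values)


# Positional use keeps the old promotion (and warns); keyword use does not.
legacy = ToyDataset({"x": ("x", [0, 1, 2])})
explicit = ToyDataset(data_vars={"x": ("x", [0, 1, 2])})
print(sorted(legacy.coords), sorted(legacy.data_vars))
print(sorted(explicit.coords), sorted(explicit.data_vars))
```

With this shape, existing positional calls keep working while the warning points users at the explicit spelling, which is what would make a later deprecation of the positional argument feasible.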
That looks like a nice solution @keewis, except maybe |
true. But I guess people passing |
We decided to start by raising |
I like this idea. It makes the current behavior more explicit and makes it clear how to opt-in to the new behavior.
#8979 implements this suggestion. The same |
What is your issue?
Whilst xarray's data model appears to allow 1D data variables that have the same name as their dimension, it seems to be impossible to actually create this using the `Dataset` constructor, as they will always be converted to coordinate variables instead.
We can create a 1D data variable with the same name as its dimension like this:
so it seems to be a valid part of the data model.
But I can't get to that situation from the `Dataset` constructor. This should create the same dataset:
But actually it makes `x` a coordinate variable (and implicitly creates a pandas Index for it). This means that in this case there is no difference between using the `data_vars` and `coords` kwargs to the constructor:
This all seems weird to me. I would have thought that if a 1D data variable is allowed, we shouldn't coerce to making it a coordinate variable in the constructor. If anything that's actively misleading.
Note that whilst this came up in the context of trying to avoid auto-creation of 1D indexes for coordinate variables, this issue is actually separate. (xref #8872 (comment))
cc @benbovy who probably has thoughts