This repository has been archived by the owner on Apr 10, 2024. It is now read-only.
Currently, there is no way to index with a list of values in pandas without inserting NA for missing values. It would be nice to make this possible, either by adding a variant of `.loc` that raises `KeyError` in such cases or by changing the behavior of `.loc` altogether.
In xarray, `.loc` only works with pre-existing index labels, both for getitem (`df.loc[key]`) and for assignment (`df.loc[key] = values`); inserting new columns is OK. Reindexing behavior can still be obtained with explicit calls to `.reindex`.
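To make the proposed split concrete, here is a minimal sketch of a strict lookup next to an explicit reindex. `strict_loc` is a hypothetical helper written for illustration only; it is not part of pandas or xarray, and it only approximates the behavior described above.

```python
import pandas as pd


def strict_loc(obj, labels):
    """Select `labels` from `obj`, raising KeyError if any label is absent.

    Hypothetical helper for illustration; not a pandas or xarray API.
    """
    wanted = pd.Index(labels)
    missing = wanted.difference(obj.index)
    if len(missing) > 0:
        raise KeyError(f"labels not found in index: {list(missing)}")
    # Every label is known to exist at this point, so the lookup
    # cannot introduce NA.
    return obj.loc[wanted]


s = pd.Series([10, 20, 30], index=["a", "b", "c"])

subset = strict_loc(s, ["a", "c"])  # only pre-existing labels: succeeds
padded = s.reindex(["a", "z"])      # explicit opt-in to NA for the missing "z"

try:
    strict_loc(s, ["a", "z"])       # "z" is not in the index
except KeyError as err:
    error_message = str(err)
```

With this split, a typo in a label surfaces as an error at the lookup, and NA insertion only happens where the code asks for it via `.reindex`.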
Conceivably, we could make things work in the same way for pandas. Two major implications of such a change:
1. Significantly simpler indexing code. All the logic for mapping indexers to positions in xarray fits in about 50 lines of code. In contrast, `pandas/core/indexing.py` is some of the trickiest code in pandas, in part because it handles cases like inserting NAs and in part because it tries to handle all possible variations of indexing with minimal code duplication. I don't envy anyone who takes on the task of translating such logic to C++.
2. It's harder to write code that is entirely at odds with pandas's columnar data model. You can no longer do silly things like creating an empty DataFrame and filling it in later, e.g.,
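A hypothetical illustration of that pattern (not the original snippet), assuming the fill happens row by row through `.loc` assignment with labels that do not yet exist, contrasted with building whole columns up front:

```python
import pandas as pd

# Pattern a stricter .loc would rule out: start empty, then grow the frame
# one new row label at a time through .loc assignment.
filled = pd.DataFrame(columns=["x", "y"])
for i in range(3):
    filled.loc[i] = [i, i * i]  # label i does not exist yet, so the index grows

# Columnar alternative: collect whole columns first and construct once.
columnar = pd.DataFrame({"x": list(range(3)), "y": [i * i for i in range(3)]})
```

Both frames end up holding the same values, but the second matches the columnar model: each column is allocated once instead of the frame being enlarged on every assignment.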
In my view this is a positive, but it would certainly be a big backwards compatibility break.