This repository has been archived by the owner on Apr 10, 2024. It is now read-only.

.loc without reindexing #21

Open
shoyer opened this issue Sep 9, 2016 · 0 comments

shoyer commented Sep 9, 2016

Currently, there is no way in pandas to index with a list of labels without NA being inserted for labels that are missing from the index. It would be nice to make this possible, either with a variant of .loc that raises KeyError in such cases or by changing the behavior of .loc altogether.
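A minimal pure-Python sketch of the two behaviors being contrasted, using a plain list of labels as a stand-in for a pandas Index and `None` as a stand-in for NA. The helper names `lookup_with_na` and `lookup_strict` are hypothetical and are not pandas API.

```python
from typing import Hashable, List, Optional, Sequence

# Hypothetical helpers for illustration only; not part of pandas.


def lookup_with_na(index: Sequence[Hashable], values: Sequence[float],
                   keys: Sequence[Hashable]) -> List[Optional[float]]:
    """Reindexing-style selection: missing labels silently become None (NA)."""
    positions = {label: i for i, label in enumerate(index)}  # assumes unique labels
    return [values[positions[k]] if k in positions else None for k in keys]


def lookup_strict(index: Sequence[Hashable], values: Sequence[float],
                  keys: Sequence[Hashable]) -> List[float]:
    """Strict selection: any missing label raises KeyError instead of inserting NA."""
    positions = {label: i for i, label in enumerate(index)}  # assumes unique labels
    missing = [k for k in keys if k not in positions]
    if missing:
        raise KeyError(f"labels not in index: {missing!r}")
    return [values[positions[k]] for k in keys]


index = ["a", "b", "c"]
values = [1.0, 2.0, 3.0]
print(lookup_with_na(index, values, ["a", "z"]))  # [1.0, None]
try:
    lookup_strict(index, values, ["a", "z"])
except KeyError as err:
    print("KeyError:", err)
```

The strict variant checks all requested labels up front, so the error names every missing label at once rather than failing on the first one.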

In xarray, .loc only works with pre-existing index labels, both for getitem (df.loc[key]) and for assignment (df.loc[key] = values); inserting new columns is still OK. Reindexing behavior remains available through explicit calls to .reindex.
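To make those semantics concrete, here is a toy, dependency-free sketch. The `StrictFrame` class and its `get`/`set`/`reindex` methods are hypothetical stand-ins for `.loc` getitem, `.loc` assignment and `.reindex`; this is not xarray's or pandas's actual implementation.

```python
from typing import Dict, Hashable, List


class StrictFrame:
    """Hypothetical toy column-oriented table (not pandas or xarray API).

    Row access only accepts labels that already exist; new columns may be
    added, and introducing new row labels requires an explicit reindex().
    """

    def __init__(self, index: List[Hashable], columns: Dict[Hashable, list]):
        self.index = list(index)
        self.columns = {name: list(col) for name, col in columns.items()}
        self._pos = {label: i for i, label in enumerate(self.index)}  # assumes unique labels

    def _position(self, row: Hashable) -> int:
        if row not in self._pos:
            raise KeyError(row)
        return self._pos[row]

    def get(self, row: Hashable, col: Hashable):
        """Analogue of df.loc[row, col]: an unknown row or column raises KeyError."""
        return self.columns[col][self._position(row)]

    def set(self, row: Hashable, col: Hashable, value) -> None:
        """Analogue of df.loc[row, col] = value.

        The row must already exist; an unknown column is created and filled
        with None (NA) everywhere except the assigned cell.
        """
        pos = self._position(row)
        if col not in self.columns:
            self.columns[col] = [None] * len(self.index)
        self.columns[col][pos] = value

    def reindex(self, new_index: List[Hashable]) -> "StrictFrame":
        """Explicit reindexing: the only place missing labels turn into None."""
        cols = {
            name: [col[self._pos[r]] if r in self._pos else None for r in new_index]
            for name, col in self.columns.items()
        }
        return StrictFrame(new_index, cols)


frame = StrictFrame(["x", "y"], {"a": [1, 2]})
frame.set("y", "b", 10)  # a new column "b" is allowed
print(frame.get("y", "b"))  # 10
try:
    frame.set("z", "a", 3)  # a new row "z" is not
except KeyError as err:
    print("KeyError:", err)
print(frame.reindex(["x", "z"]).columns)  # {'a': [1, None], 'b': [None, None]}
```

Keeping NA insertion confined to `reindex` means every other code path can assume each requested label maps to exactly one existing position.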

Conceivably, we could make things work in the same way for pandas. Two major implications of such a change:

  1. Significantly simpler indexing code. All the logic for mapping indexers to positions in xarray fits in about 50 lines of code. In contrast, pandas/core/indexing.py is some of the trickiest code in pandas, in part because it handles cases like inserting NAs and in part because it tries to handle all possible variations of indexing with minimal code duplication. I don't even envy anyone who takes on the task of translating such logic to C++.
  2. It's harder to write code that is entirely at odds with pandas's columnar data model. You can no longer do silly things like creating an empty DataFrame and filling it in later, e.g.,
import pandas as pd

df = pd.DataFrame()
for row, col, value in data:  # data: an iterable of (row, col, value) triples
    df.loc[row, col] = value
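Under stricter .loc semantics, the loop above would instead collect the values first and build the table in one step. A minimal pure-Python sketch, assuming `data` is an iterable of `(row, col, value)` triples as in the loop; the helper name `triples_to_columns` is hypothetical.

```python
from typing import Dict, Hashable, Iterable, List, Tuple


def triples_to_columns(
    data: Iterable[Tuple[Hashable, Hashable, object]],
) -> Tuple[List[Hashable], Dict[Hashable, List[object]]]:
    """Hypothetical helper: collect (row, col, value) triples into columns.

    Rows and columns keep first-seen order; cells never assigned are None (NA).
    A later triple for the same (row, col) overwrites an earlier one, just as
    repeated df.loc[row, col] = value assignments would.
    """
    cells: Dict[Tuple[Hashable, Hashable], object] = {}
    rows: Dict[Hashable, None] = {}  # dicts preserve insertion order
    cols: Dict[Hashable, None] = {}
    for row, col, value in data:
        rows.setdefault(row, None)
        cols.setdefault(col, None)
        cells[(row, col)] = value
    index = list(rows)
    columns = {c: [cells.get((r, c)) for r in index] for c in cols}
    return index, columns


data = [("r1", "a", 1), ("r2", "b", 2), ("r1", "b", 3)]
index, columns = triples_to_columns(data)
print(index)    # ['r1', 'r2']
print(columns)  # {'a': [1, None], 'b': [3, 2]}
```

With pandas installed, `pd.DataFrame(columns, index=index)` would then construct the frame in a single call, without growing it cell by cell.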

In my view this is a positive, but it would certainly be a big backwards compatibility break.
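On the first point, that strict label-to-position mapping can stay compact: the sketch below is not xarray's code but a hypothetical `label_indexer_to_positions` function. It handles the common indexer kinds (a scalar label, a list of labels, a label slice, a boolean mask) for a unique index and raises KeyError for missing labels rather than inserting NA.

```python
from typing import Hashable, List, Sequence, Union


def label_indexer_to_positions(index: Sequence[Hashable], key: object) -> Union[int, List[int]]:
    """Hypothetical helper: map a label-based indexer to integer positions.

    Assumes unique labels and never reindexes: a missing label raises
    KeyError instead of producing NA. A scalar label maps to one int;
    slices, label lists and boolean masks map to a list of ints.
    """
    positions = {label: i for i, label in enumerate(index)}

    if isinstance(key, slice):
        # Label slices include both endpoints, like pandas .loc;
        # each given endpoint must itself be an existing label.
        step = 1 if key.step is None else key.step
        if step <= 0:
            raise ValueError("this sketch only supports positive slice steps")
        start = 0 if key.start is None else positions[key.start]
        stop = len(index) - 1 if key.stop is None else positions[key.stop]
        return list(range(start, stop + 1, step))

    if isinstance(key, list):
        # A non-empty list made entirely of bools is treated as a mask.
        if key and all(isinstance(k, bool) for k in key):
            if len(key) != len(index):
                raise IndexError("boolean mask length does not match the index")
            return [i for i, keep in enumerate(key) if keep]
        missing = [k for k in key if k not in positions]
        if missing:
            raise KeyError(f"labels not in index: {missing!r}")
        return [positions[k] for k in key]

    # Anything else is treated as a single scalar label.
    if key not in positions:
        raise KeyError(key)
    return positions[key]


labels = ["a", "b", "c", "d"]
print(label_indexer_to_positions(labels, "c"))                  # 2
print(label_indexer_to_positions(labels, ["d", "a"]))           # [3, 0]
print(label_indexer_to_positions(labels, slice("b", "d")))      # [1, 2, 3]
print(label_indexer_to_positions(labels, [True, False, True, False]))  # [0, 2]
```

Because no branch ever has to invent positions for missing labels, the result can be handed straight to positional indexing, which is where most of the simplification would come from.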
