This repository was archived by the owner on Apr 10, 2024. It is now read-only.

.loc without reindexing #21

Open
@shoyer

Description


Currently, there is no way to index with a list of labels in pandas without inserting NA for labels that are missing from the index. It would be nice to make this possible, either by adding a variation of .loc that raises KeyError in such cases or by changing the behavior of .loc altogether.
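A minimal sketch of what such a strict variation could look like, written as a hypothetical helper named strict_loc (not a pandas API) on top of existing Index methods. It assumes a unique, single-level index: list keys are mapped to positions with Index.get_indexer, which reports missing labels as -1, and any missing label raises KeyError instead of producing an NA row.

```python
import pandas as pd


def strict_loc(df, key):
    """Hypothetical strict row selection by label (not a pandas API).

    Assumes a unique, single-level index. Every label in a scalar or
    list key must already exist; otherwise KeyError is raised instead
    of inserting NA rows. Slices are resolved by label, as with .loc.
    """
    index = df.index
    if isinstance(key, slice):
        # Convert the label slice into a positional slice.
        return df.iloc[index.slice_indexer(key.start, key.stop, key.step)]
    if pd.api.types.is_list_like(key):
        key = list(key)
        positions = index.get_indexer(key)  # -1 marks labels not in the index
        missing = [label for label, pos in zip(key, positions) if pos == -1]
        if missing:
            raise KeyError(f"labels not found in index: {missing}")
        return df.iloc[positions]
    # Scalar label: get_loc raises KeyError by itself if the label is absent.
    return df.iloc[index.get_loc(key)]


df = pd.DataFrame({"x": [1, 2, 3]}, index=["a", "b", "c"])
print(strict_loc(df, ["a", "c"]))  # only rows "a" and "c"
try:
    strict_loc(df, ["a", "z"])
except KeyError as err:
    print("KeyError:", err)
```

Resolving every key to integer positions first and then delegating to .iloc keeps the label logic in one small place, which is roughly the shape of the simpler indexing code argued for below.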

In xarray, .loc only works with pre-existing index labels, both for getitem (df.loc[key]) and for assignment (df.loc[key] = values); inserting new columns is still OK. Reindexing behavior can still be obtained with explicit calls to .reindex.
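A short sketch of that split, using only existing pandas methods on a made-up toy DataFrame: label-based selection restricted to labels that exist, NA-filling requested explicitly through .reindex, and assignment of a brand-new column, which would remain allowed under these rules.

```python
import pandas as pd

df = pd.DataFrame({"x": [1.0, 2.0, 3.0]}, index=["a", "b", "c"])

# Selection using only labels that already exist in the index.
subset = df.loc[["a", "c"]]

# NA-filling is asked for explicitly: "z" is absent, so its row is all NaN.
padded = df.reindex(["a", "z"])

# Adding a new column is still permitted.
df["y"] = df["x"] * 2
```

Under this model, a reader can tell from the method name alone whether missing labels may introduce NA values.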

Conceivably, we could make things work in the same way for pandas. Two major implications of such a change:

  1. Significantly simpler indexing code. All the logic for mapping indexers to positions in xarray fits in about 50 lines of code. In contrast, pandas/core/indexing.py is some of the trickiest code in pandas, in part because it handles cases like inserting NAs and in part because it tries to handle all possible variations of indexing with minimal code duplication. I don't envy anyone who takes on the task of translating such logic to C++.
  2. It would be harder to write code that is entirely at odds with pandas's columnar data model. You could no longer do silly things like creating an empty DataFrame and filling it in cell by cell, e.g.,
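```python
import pandas as pd
data = [("a", "x", 1), ("b", "y", 2)]  # hypothetical (row, col, value) triples
df = pd.DataFrame()
for row, col, value in data:
    # Each assignment enlarges the frame with a new row and/or column.
    df.loc[row, col] = value
```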

In my view this is a positive, but it would certainly be a big backwards compatibility break.
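One columnar-friendly alternative to the cell-by-cell loop, sketched with a hypothetical list of (row, col, value) triples standing in for data: collect the records first, build the DataFrame in a single call, then reshape from long to wide format with DataFrame.pivot. This is one common idiom, not the only one.

```python
import pandas as pd

# Hypothetical (row, col, value) triples standing in for `data`.
data = [("a", "x", 1), ("a", "y", 2), ("b", "x", 3)]

# Build the whole frame at once in long format...
long = pd.DataFrame(data, columns=["row", "col", "value"])

# ...then reshape to wide format; absent (row, col) pairs become NaN.
df = long.pivot(index="row", columns="col", values="value")
```

Unlike the loop, which silently keeps the last value for a repeated (row, col) pair, pivot raises ValueError on duplicate entries; pivot_table with an explicit aggfunc is the usual fallback in that case.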
