.loc without reindexing

Currently, there is no way to index with a list of labels in pandas without inserting NA for labels that are missing from the index. It would be nice to make this possible, either by adding a variant of `.loc` that raises `KeyError` in such cases or by changing the behavior of `.loc` altogether.
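A minimal sketch of the "raise instead of reindex" variant, written as a standalone helper rather than as a new indexer. `strict_loc` is a hypothetical name, not a pandas API, and the sketch assumes a unique index, because `Index.get_indexer` refuses non-unique indexes.

```python
import pandas as pd


def strict_loc(obj, labels):
    """Hypothetical helper: select rows of a Series/DataFrame by a list of
    labels, raising KeyError instead of inserting NA for missing labels.

    Assumes obj.index is unique (Index.get_indexer requires this).
    """
    labels = list(labels)
    # get_indexer maps each label to its integer position, or -1 if absent
    positions = obj.index.get_indexer(labels)
    missing = [label for label, pos in zip(labels, positions) if pos == -1]
    if missing:
        raise KeyError(f"labels not found in index: {missing}")
    # Every label exists, so purely positional selection is safe
    return obj.iloc[positions]


s = pd.Series([10, 20, 30], index=["a", "b", "c"])
print(strict_loc(s, ["c", "a"]).tolist())  # [30, 10]
try:
    strict_loc(s, ["a", "z"])
except KeyError as err:
    print("KeyError:", err)
```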

In xarray, `.loc` only accepts pre-existing index labels, both for getitem (`df.loc[key]`) and for assignment (`df.loc[key] = values`); inserting new columns is still allowed. Reindexing behavior remains available through explicit calls to `.reindex`.
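Under such strict semantics, code that relies on NA insertion would spell it out with `.reindex`, which pandas already provides. A short illustration: labels absent from the index are filled with NaN (upcasting integer data to float), or with an explicit `fill_value`.

```python
import pandas as pd

s = pd.Series([10, 20, 30], index=["a", "b", "c"])

# Explicit reindexing: "z" is not in the index, so it is filled with NaN
# and the integer Series is upcast to float64 to hold the missing value.
r = s.reindex(["a", "z"])
print(r.tolist())  # [10.0, nan]
print(r.dtype)     # float64

# fill_value avoids NaN (and the upcast) when a sensible default exists.
print(s.reindex(["a", "z"], fill_value=0).tolist())  # [10, 0]
```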

Conceivably, we could make things work in the same way for pandas. Two major implications of such a change:
1. Significantly simpler indexing code. All the logic for mapping indexers to positions in xarray fits in about [50 lines of code](https://github.com/pydata/xarray/blob/f00ab9b8f55cd630a2dca905ad07ac1a8093a256/xarray/core/indexing.py#L144). In contrast, `pandas/core/indexing.py` is some of the trickiest code in pandas, partly because it handles cases like inserting NAs and partly because it tries to handle every possible variation of indexing with minimal code duplication. I don't envy anyone who takes on the task of translating that logic to C++.
2. It's harder to write code that is entirely at odds with pandas's columnar data model. You can no longer do silly things like creating an empty DataFrame and filling it in cell by cell, e.g.,

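``` python
import pandas as pd

df = pd.DataFrame()
# data: an iterable of (row, col, value) triples
for row, col, value in data:
    df.loc[row, col] = value  # each new label enlarges the frame
```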
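For comparison, a sketch of the columnar way to build the same frame: collect the `(row, col, value)` triples first, then construct the DataFrame in one step. The `data` list here is made-up sample input, and `pivot` assumes each `(row, col)` pair occurs at most once (it raises `ValueError` on duplicates).

```python
import pandas as pd

# Hypothetical sample input: (row, col, value) triples
data = [("r1", "x", 1), ("r1", "y", 2), ("r2", "x", 3)]

# Build a long-format frame in one shot, then pivot it to wide format.
# Row/column combinations that never appear in data become NaN.
long_df = pd.DataFrame(data, columns=["row", "col", "value"])
df = long_df.pivot(index="row", columns="col", values="value")
print(df)
```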

In my view this is a positive, but it would certainly be a big backwards compatibility break.
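To illustrate why strict label lookup keeps the indexing logic small, here is a rough sketch (not xarray's actual code) of converting a `.loc`-style key for a single axis into positions when missing labels are simply errors. `label_to_positions` is a hypothetical name; it covers only scalars, slices, and lists of labels, and assumes a unique index.

```python
import numpy as np
import pandas as pd


def label_to_positions(index, key):
    """Hypothetical sketch: map a label-based key on one axis to positions.

    Missing labels raise KeyError; nothing is ever reindexed.
    """
    if isinstance(key, slice):
        # Label slices include the stop label; pandas returns a
        # positional slice object.
        return index.slice_indexer(key.start, key.stop, key.step)
    if pd.api.types.is_list_like(key):
        # get_indexer returns -1 for labels that are not in the index
        positions = index.get_indexer(key)
        if (positions == -1).any():
            missing = [k for k, p in zip(key, positions) if p == -1]
            raise KeyError(f"labels not found in index: {missing}")
        return positions
    # Scalar label: get_loc itself raises KeyError if the label is absent
    return index.get_loc(key)


idx = pd.Index(["a", "b", "c"])
print(label_to_positions(idx, "b"))         # 1
print(label_to_positions(idx, ["c", "a"]))  # [2 0]
print(np.arange(3)[label_to_positions(idx, slice("a", "b"))])  # [0 1]
```

Because the result is always a plain integer, slice, or integer array, the actual data access can then be delegated to purely positional indexing.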
