This repository was archived by the owner on Apr 10, 2024. It is now read-only.

Simplifying indexing (DataFrame.__getitem__) #22

Open
@shoyer

The rules for exactly what DataFrame.__getitem__/__setitem__ does (pandas-dev/pandas#9595) are sufficiently complex and inconsistent that they are impossible to understand without extensive experimentation.

This makes for a rather embarrassing situation that we really should fix for pandas 2.0.

I made a proposal when this came up last year:

  • Indexing with a string or list of strings does label-based selection on columns.
  • All other indexing is position-based, NumPy style. (This includes indexing with a boolean array.)
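To make the two rules concrete, here is a minimal sketch of the dispatch they imply. It is not pandas code: `classify_key` is a hypothetical helper that only reports which branch a key would take, and treating an empty list as positional is my own assumption, since the proposal does not say.

```python
from typing import Any


def classify_key(key: Any) -> str:
    """Report which branch of the proposed rule a key would take.

    Hypothetical helper for illustration only; not part of pandas.
    """
    # Rule 1: a single string selects a column by label.
    if isinstance(key, str):
        return "columns-by-label"
    # Rule 1 (continued): a non-empty list of strings selects columns by label.
    # Assumption: an empty list is treated as positional, since the
    # proposal does not cover that case.
    if isinstance(key, list) and key and all(isinstance(k, str) for k in key):
        return "columns-by-label"
    # Rule 2: everything else (integers, slices, boolean arrays, ...)
    # is handled positionally, NumPy style.
    return "position"


print(classify_key("foo"))            # columns-by-label
print(classify_key(["foo", "bar"]))   # columns-by-label
print(classify_key(slice(0, 3)))      # position
print(classify_key([True, False]))    # position
```

The point of the sketch is that the whole rule fits in two type checks, with no dependence on the index dtype or on whether a label happens to exist.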

I still like my proposal, but more importantly, it satisfies two important criteria:

  1. The most common uses of DataFrame indexing work unchanged (df['foo'], df[['foo', 'bar']], and df[df['foo'] == 'bar'] might cover 80% of use cases).
  2. It's short and simple, with no exceptions.
