This repository was archived by the owner on Apr 10, 2024. It is now read-only.

Optional indexes #17

Open
@shoyer

The pandas.Index is fantastically useful, but in many cases pandas's insistence on always having an index gets in the way.

Usually, it can be safely ignored when not relevant, especially now that we have RangeIndex (which makes the cost of creating the index minimal), but this is not always the case:

  1. The indexing and join behavior of a default RangeIndex is actively harmful. It would be better to raise an error when implicitly joining two datasets on their indexes if both have a default index.
  2. When converting a DataFrame into other formats, we need an argument (e.g., index=True) for controlling whether or not to include the index.
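
To illustrate both points, here is a small sketch using current pandas behavior. The variable names and data are made up for the example; the only pandas calls used are boolean filtering, Series addition, and `DataFrame.to_csv` with its `index` argument.

```python
import pandas as pd

# Two columns that start out positionally aligned (illustrative data).
a = pd.Series([1, 2, 3, 4])
b = pd.Series([10, 20, 30, 40])

# Filtering keeps the original RangeIndex labels, not positions.
a_even = a[a % 2 == 0]  # labels 1 and 3
b_big = b[b > 20]       # labels 2 and 3

# Arithmetic aligns on those leftover labels (an outer join of the
# two indexes), so labels present on only one side become NaN.
total = a_even + b_big

# Export needs an explicit flag to leave the default index out.
df = pd.DataFrame({"x": [1, 2]})
csv_with_index = df.to_csv()
csv_without_index = df.to_csv(index=False)
```

Both filtered Series have two elements, yet `total` has three rows and two of them are NaN, because the default index was silently used as a join key. Likewise, `csv_with_index` carries an extra unnamed column that is rarely wanted for a default index.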

I propose that we make the index optional, e.g., by allowing it to be set to None. This entails a need for some rules to handle missing indexes:

  • Operations that explicitly rely on indexes (e.g., .loc and join) should raise TypeError when called on objects without an index.
  • Operations that implicitly rely on indexes for alignment (e.g., the DataFrame constructor and arithmetic) now need to handle three cases:
    1. Index/index operations: These work as before. The result's index is the outer join of the input indexes.
    2. No-index/no-index operations: The inputs must have exactly the same length; otherwise TypeError is raised. The result has no index.
    3. Mixed index/no-index operations: The inputs must have the same length. The result takes on the index from the input that has one.
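
The rules above can be sketched in plain Python. This is only a toy model, not a proposed implementation: `Column`, `loc`, and `add` are hypothetical names, labels are assumed to be unique and sortable, and `None` stands in for the missing values pandas would represent as NaN.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Column:
    """Hypothetical stand-in for a Series whose index may be None."""
    values: list
    index: Optional[list] = None


def loc(col: Column, label):
    # Explicit label-based access requires an index.
    if col.index is None:
        raise TypeError("label-based lookup requires an index")
    return col.values[col.index.index(label)]


def add(left: Column, right: Column) -> Column:
    if left.index is not None and right.index is not None:
        # Case 1: both indexed, so align on the outer join of the labels.
        labels = sorted(set(left.index) | set(right.index))
        lhs = dict(zip(left.index, left.values))
        rhs = dict(zip(right.index, right.values))
        values = [
            lhs[k] + rhs[k] if k in lhs and k in rhs else None
            for k in labels
        ]
        return Column(values, labels)

    # Cases 2 and 3: at least one side has no index, so the operation
    # is positional and the lengths must match.
    if len(left.values) != len(right.values):
        raise TypeError("operands must have the same length")
    values = [x + y for x, y in zip(left.values, right.values)]
    # Case 2 yields None; case 3 keeps whichever index exists.
    index = left.index if left.index is not None else right.index
    return Column(values, index)
```

For example, `add(Column([1, 2]), Column([3, 4]))` gives a result with no index, while `add(Column([1, 2], ["x", "y"]), Column([3, 4]))` keeps the labels `["x", "y"]`.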

Somewhat related: #15
