ENH: Add Optional Schema Definitions to Enable IDE Autocompletion #1190

Open
@YoniChechik

Originally from: pandas-dev/pandas#61304 (comment)

Problem Description

Pandas is widely used in data-heavy workflows, and in many cases, the structure of a DataFrame is known in advance — especially when loading from sources like CSVs, databases, or APIs.

However, pandas DataFrames are fully dynamic, so IDEs and static type checkers cannot infer their structure. This limits productivity, especially in large codebases, because column names don’t autocomplete.

We’re not asking for runtime schema enforcement or data validation — we’re already familiar with Pandera and similar tools. What’s missing is a mechanism for IDEs and static tools (like Pylance and MyPy) to recognize DataFrame schemas for better code intelligence.

Feature Description

Introduce an optional way to define column names and types for a DataFrame that tools like VS Code + Pylance can use for autocompletion and type hints.

Example syntax (suggested API):

import pandas as pd
from pandas.typing import Schema  # hypothetical

class OrderSchema(Schema):
    OrderID: int
    CustomerName: str
    OrderDate: str
    Product: str
    Quantity: int
    Price: float
    Country: str

df: pd.DataFrame[OrderSchema] = pd.read_csv("orders.csv")

# IDE should support:
df.Country           # autocomplete & type: str

This would behave similarly to how TypedDict or Pydantic models enable structure-aware development, but focused on DataFrame-level constructs.
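
For comparison, here is a minimal sketch of the TypedDict behavior referred to above. The `Order` class and `country_of` function are illustrative names, not part of pandas or of this proposal; the point is only that the keys and value types are visible to the type checker and the IDE.

```python
from typing import TypedDict


class Order(TypedDict):
    """Illustrative record type: keys and value types are known statically."""

    OrderID: int
    Country: str


def country_of(order: Order) -> str:
    # An IDE can autocomplete the "Country" key and infer `str` here.
    return order["Country"]


row: Order = {"OrderID": 1, "Country": "DE"}
country = country_of(row)
```

A type checker would flag `order["Countrry"]` as an unknown key, which is the kind of feedback the proposal asks for at the DataFrame level.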

It does not need to affect runtime at all — just serve as a static hint for tooling.
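
To illustrate what "static hint only" could mean, the sketch below uses a small stand-in class instead of a real DataFrame so that it runs with the standard library alone. `Frame` and `OrderFrame` are hypothetical names and this is not how pandas or pandas-stubs work today; it only shows that annotation-only declarations plus `typing.cast` change what tooling sees without changing the object at runtime.

```python
from typing import Any, Dict, List, cast


class Frame:
    """Hypothetical stand-in for a DataFrame: columns are resolved dynamically."""

    def __init__(self, columns: Dict[str, List[Any]]) -> None:
        self._columns = columns

    def __getattr__(self, name: str) -> List[Any]:
        # Called only when normal attribute lookup fails,
        # similar in spirit to attribute-style column access.
        try:
            return self._columns[name]
        except KeyError:
            raise AttributeError(name) from None


class OrderFrame(Frame):
    """Annotation-only subclass: declares columns for tooling, adds no behavior."""

    OrderID: List[int]
    Country: List[str]


raw = Frame({"OrderID": [1, 2], "Country": ["DE", "FR"]})

# cast() returns its argument unchanged; only the static type differs.
orders = cast(OrderFrame, raw)
countries = orders.Country  # tooling sees List[str]; runtime lookup is unchanged
```

The object is still a plain `Frame` at runtime, so nothing is validated or enforced; the declared columns exist purely for autocompletion and type checking.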
