Description
Originally from: pandas-dev/pandas#61304 (comment)
Problem Description
Pandas is widely used in data-heavy workflows, and in many cases, the structure of a DataFrame is known in advance — especially when loading from sources like CSVs, databases, or APIs.
However, pandas DataFrames are fully dynamic, so IDEs and static type checkers cannot infer their structure. This limits productivity, especially in large codebases, because column names don’t autocomplete.
We’re not asking for runtime schema enforcement or data validation; we’re already familiar with Pandera and similar tools. What’s missing is a mechanism for IDEs and static tools (like Pylance and mypy) to recognize DataFrame schemas for better code intelligence.
Feature Description
Introduce an optional way to define column names and types for a DataFrame that tools like VS Code + Pylance can use for autocompletion and type hints.
Example syntax (suggested API):
import pandas as pd
from pandas.typing import Schema  # hypothetical
class OrderSchema(Schema):
    OrderID: int
    CustomerName: str
    OrderDate: str
    Product: str
    Quantity: int
    Price: float
    Country: str
df: pd.DataFrame[OrderSchema] = pd.read_csv("orders.csv")
# IDE should support:
df.Country  # autocomplete & type: str
This would behave similarly to how TypedDict or Pydantic models enable structure-aware development, but focused on DataFrame-level constructs.
It does not need to affect runtime at all — just serve as a static hint for tooling.
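For comparison, here is a minimal sketch of the TypedDict behavior this request is modeled on. It uses only the standard library. The names `OrderRow` and `country_of` are made up for illustration and are not part of pandas or of the proposed API.

```python
from typing import TypedDict, get_type_hints


class OrderRow(TypedDict):
    # Illustrative subset of the columns from the example above.
    OrderID: int
    CustomerName: str
    Country: str


def country_of(row: OrderRow) -> str:
    # A static checker knows "Country" is a valid key of type str.
    # A misspelled key such as row["Contry"] is reported by the checker
    # before the code ever runs.
    return row["Country"]


row: OrderRow = {"OrderID": 1, "CustomerName": "Ada", "Country": "UK"}

# At runtime the value is a plain dict; the schema exists only for tooling.
print(type(row) is dict)
print(country_of(row))
print(get_type_hints(OrderRow))
```

The declared structure is visible to editors and type checkers, while the runtime object stays an ordinary dict. That is the same split being requested here for DataFrame columns.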