Description
Feature Type
- Adding new functionality to pandas
- Changing existing functionality in pandas
- Removing existing functionality in pandas
Problem Description
When reading Excel files, pandas ignores Excel's "Text" cell format and converts text-formatted numbers (e.g., IDs, codes) to numeric types (int/float).
Users then have to convert the values back to strings manually, which is inefficient for large datasets and error-prone.
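A minimal reproduction sketch follows. It assumes the affected cells store numeric values and only carry the "Text" number format (`@`); the file name, column names, and values are made up for illustration.

```python
import os
import tempfile

import pandas as pd
from openpyxl import Workbook

# Build a tiny workbook whose "id" cells hold numbers but carry the "Text" format.
wb = Workbook()
ws = wb.active
ws.append(["id", "amount"])
for row in ([1001, 9.5], [1002, 12.0]):
    ws.append(row)
for cell in ws["A"][1:]:  # skip the header cell in column A
    cell.number_format = "@"

path = os.path.join(tempfile.mkdtemp(), "ids.xlsx")
wb.save(path)

# pandas infers the dtype from the stored values, not from the cell format.
df = pd.read_excel(path, engine="openpyxl")
print(df.dtypes)
```

Under that assumption, the `id` column should come back with a numeric dtype even though Excel displays it as text.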
Feature Description
Add an option to pd.read_excel() that respects Excel's cell formatting
(e.g., dtype_from_format=True), or enable this behavior by default, so that text-formatted columns are preserved as strings.
Alternative Solutions
OpenPyXL/Xlrd Engine + Format Detection
Read cell formats directly with openpyxl (requires manual parsing):
from openpyxl import load_workbook

wb = load_workbook("data.xlsx", data_only=False)
sheet = wb.active
# Letters of the columns whose first cell uses Excel's "Text" number format ("@")
text_columns = [col[0].column_letter for col in sheet.columns if col[0].number_format == "@"]
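The format detection above could be combined with the existing `dtype` argument of `pd.read_excel()` roughly as sketched below. The helper names `text_formatted_columns` and `read_excel_text_aware` are hypothetical, and the sketch assumes a single header row and that the first data row's format is representative of the whole column.

```python
import pandas as pd
from openpyxl import load_workbook


def text_formatted_columns(path, header_row=1):
    """Return header names of columns whose first data cell uses the "@" (Text) format."""
    wb = load_workbook(path, data_only=False)
    sheet = wb.active
    names = []
    # Each column tuple holds two cells: the header and the first data cell.
    for header, first_data in sheet.iter_cols(min_row=header_row, max_row=header_row + 1):
        if first_data.number_format == "@":
            names.append(header.value)
    wb.close()
    return names


def read_excel_text_aware(path, **kwargs):
    """Hypothetical wrapper: read the active sheet, forcing text-formatted columns to str."""
    text_cols = text_formatted_columns(path)
    return pd.read_excel(
        path, dtype={name: str for name in text_cols}, engine="openpyxl", **kwargs
    )
```

This opens the file twice, which may be costly for large workbooks. It also cannot restore anything the stored numeric value no longer contains, such as leading zeros.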
Additional Context
No response