Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: make read_excel dtype parameter compatible with multiple sheets (e.g sheet_name =[0,"list_of_sheets"]) #55030

Open
1 of 3 tasks
struck89 opened this issue Sep 6, 2023 · 1 comment
Labels
Enhancement IO Excel read_excel, to_excel Needs Triage Issue that has not been reviewed by a pandas team member

Comments

@struck89
Copy link

struck89 commented Sep 6, 2023

Feature Type

  • Adding new functionality to pandas

  • Changing existing functionality in pandas

  • Removing existing functionality in pandas

Problem Description

I would like the dtype parameter to be compatible with opening multiple sheets.

Feature Description

when a list of sheets is passed, dtype could be a dictionary with {elem_on_list: usual_dtype_parameter}
so if
sheet_name = [0,'Sheet3']
dtype could be
dtype={0:{'colA':str, 'colB': 'float64'}, 'Sheet3':{'colA':'datetime64[ns]', 'colB': str}

Alternative Solutions

Iterate yourself and create the dictionary yourself with

dfs = {}
dtypes={0:{'colA':str, 'colB': 'float64'}, 'Sheet3':{'colA':'datetime64[ns]', 'colB': str}
for sheet_name in [0,'Sheet3']:
   dfs[sheet_name] = pd.read_excel('file.xlsx',sheet_name=sheet_name,dtype=dtypes[sheet_name])

Additional Context

In my case, pandas automatic decision of what the object is, doesn't work well in my scenarios. It always assumes numbers, even if the column has explicitly been marked as "text" in Excel, so I need to specify dtypes.

This would make my life easier, and would make sense that the multiple sheet name functionality is compatible with also giving multiple dtype arguments per sheet. I haven't tested if just adding all the columns at the dictionary would work, regardless of the Sheet, but it's very obvious that one can have the same column name in two sheets, and that one might want different datatypes there. And given that I plan to use something like this for automation purposes, I can't just assume same column name will always be the same type.

@struck89 struck89 added Enhancement Needs Triage Issue that has not been reviewed by a pandas team member labels Sep 6, 2023
@jbrockmendel jbrockmendel added the IO Excel read_excel, to_excel label Nov 1, 2023
@rhshadrach
Copy link
Member

Thanks for the report!

In my case, pandas automatic decision of what the object is, doesn't work well in my scenarios. It always assumes numbers, even if the column has explicitly been marked as "text" in Excel, so I need to specify dtypes.

Can you give a reproducible example where this happens?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Enhancement IO Excel read_excel, to_excel Needs Triage Issue that has not been reviewed by a pandas team member
Projects
None yet
Development

No branches or pull requests

3 participants