
read_excel fails when header is set to None #4924

@toan-quach

Description


System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): macOS
  • Modin version (modin.__version__): 0.5.12
  • Python version: 3.8.13, 3.9.12, 3.10.6
  • Code we can use to reproduce: see Source code / logs below

Describe the problem

The read_excel function works fine when I don't pass the header parameter, but fails when I set it to None with header=None.
I have tested it with two Excel files: one with a column containing only numbers, and another with a column whose first row is a string and whose remaining rows are numbers (with header=None, that first row should be treated as a normal data row like the rest).
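For reference, the expected behavior can be modeled without Excel at all. The following is a toy sketch of how pandas documents the header parameter, applied to a sheet given as a list of rows; split_header is a hypothetical helper, not a pandas or Modin API, and the sample data is made up to mirror the second file described above.

```python
from typing import Any, List, Optional, Sequence, Tuple

Row = Sequence[Any]


def split_header(rows: List[Row], header: Optional[int] = 0) -> Tuple[List[Any], List[Row]]:
    """Hypothetical toy model of pandas' `header` semantics.

    header=k    -> row k supplies the column labels; rows after it are data.
    header=None -> no row is used for labels; columns get default integer
                   labels 0..n-1 and every row, including the first, is data.
    """
    if header is None:
        width = max((len(r) for r in rows), default=0)
        return list(range(width)), list(rows)
    return list(rows[header]), list(rows[header + 1:])


# Made-up sheet: first row is a string, the remaining rows are numbers.
sheet = [["label"], [1], [2], [3]]

cols, data = split_header(sheet, header=None)
print(cols, len(data))  # [0] 4 : the string row is kept as a data row

cols, data = split_header(sheet, header=0)
print(cols, len(data))  # ['label'] 3 : the string row became the column label
```

With header=None, the read should therefore yield integer column labels and keep the first row as data, which is what the reporter expects from Modin as well.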

Source code / logs

Source code:
Case 1

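import modin.pandas as pd

# header=None: no row is used as column labels; every row is data
data = pd.read_excel('example.xlsx', header=None)
print(repr(data))  # building the repr converts the frame to pandas, where the error is raised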

Case 2

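import modin.pandas as pd

# Second file: the first row is a string, the remaining rows are numbers
data = pd.read_excel('example_2.xlsx', header=None)
print(repr(data))  # building the repr converts the frame to pandas, where the error is raised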

Excel files to reproduce:
example.xlsx
example_2.xlsx

Log:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/shiro/.local/share/virtualenvs/taipy-core-fdyg53sb/lib/python3.9/site-packages/modin/logging/logger_metaclass.py", line 68, in log_wrap
    return method(*args, **kwargs)
  File "/Users/shiro/.local/share/virtualenvs/taipy-core-fdyg53sb/lib/python3.9/site-packages/modin/pandas/dataframe.py", line 215, in __repr__
    result = repr(self._build_repr_df(num_rows, num_cols))
  File "/Users/shiro/.local/share/virtualenvs/taipy-core-fdyg53sb/lib/python3.9/site-packages/modin/logging/logger_metaclass.py", line 68, in log_wrap
    return method(*args, **kwargs)
  File "/Users/shiro/.local/share/virtualenvs/taipy-core-fdyg53sb/lib/python3.9/site-packages/modin/pandas/base.py", line 203, in _build_repr_df
    return self.iloc[indexer]._query_compiler.to_pandas()
  File "/Users/shiro/.local/share/virtualenvs/taipy-core-fdyg53sb/lib/python3.9/site-packages/modin/logging/logger_metaclass.py", line 68, in log_wrap
    return method(*args, **kwargs)
  File "/Users/shiro/.local/share/virtualenvs/taipy-core-fdyg53sb/lib/python3.9/site-packages/modin/core/storage_formats/pandas/query_compiler.py", line 259, in to_pandas
    return self._modin_frame.to_pandas()
  File "/Users/shiro/.local/share/virtualenvs/taipy-core-fdyg53sb/lib/python3.9/site-packages/modin/logging/logger_metaclass.py", line 68, in log_wrap
    return method(*args, **kwargs)
  File "/Users/shiro/.local/share/virtualenvs/taipy-core-fdyg53sb/lib/python3.9/site-packages/modin/core/dataframe/pandas/dataframe/dataframe.py", line 115, in run_f_on_minimally_updated_metadata
    result = f(self, *args, **kwargs)
  File "/Users/shiro/.local/share/virtualenvs/taipy-core-fdyg53sb/lib/python3.9/site-packages/modin/core/dataframe/pandas/dataframe/dataframe.py", line 2840, in to_pandas
    ErrorMessage.catch_bugs_and_request_email(
  File "/Users/shiro/.local/share/virtualenvs/taipy-core-fdyg53sb/lib/python3.9/site-packages/modin/error_message.py", line 70, in catch_bugs_and_request_email
    raise Exception(
Exception: Internal Error. Please visit https://github.com/modin-project/modin/issues to file an issue with the traceback and the command that caused this error. If you can't file a GitHub issue, please email bug_reports@modin.org.
Internal and external indices on axis 1 do not match.

data = pd.read_excel('tests/data_sample/example.xlsx', header=None)
UserWarning: Parallel read_excel is a new feature! If you run into any problems, please visit https://github.com/modin-project/modin/issues. If you find a new issue and can't file it on GitHub, please email bug_reports@modin.org.
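As far as the final message itself says, while converting the frame to pandas Modin compared two sets of axis-1 (column) labels, presumably the ones recorded in the frame's metadata and the ones carried by the materialized data, and found that they differ. The sketch below is a toy illustration of that kind of consistency check only: check_axis_labels is a hypothetical helper, the labels are made up, and this is neither Modin's actual code nor a diagnosis of this bug.

```python
from typing import Any, Sequence


def check_axis_labels(external: Sequence[Any], internal: Sequence[Any], axis: int) -> None:
    """Hypothetical stand-in for a metadata consistency check.

    `external` plays the role of the labels recorded as metadata,
    `internal` the labels of the data actually materialized.
    """
    if list(external) != list(internal):
        raise AssertionError(
            f"Internal and external indices on axis {axis} do not match: "
            f"{list(internal)!r} vs {list(external)!r}"
        )


# Consistent labels: passes silently.
check_axis_labels([0, 1], [0, 1], axis=1)

try:
    # Made-up mismatch: metadata holds default labels (as expected for
    # header=None), but the materialized data is labelled by a first-row value.
    check_axis_labels([0], ["label"], axis=1)
except AssertionError as exc:
    print(exc)
```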

Labels: P1 (Important tasks that we should complete soon), bug 🦗 (Something isn't working)
