ENH: Adding support for calamine as Excel reader engine #50395

Closed
1 of 3 tasks
abielr opened this issue Dec 22, 2022 · 7 comments · Fixed by #54998
Labels
Dependencies Required and optional dependencies Enhancement IO Excel read_excel, to_excel

Comments

@abielr

abielr commented Dec 22, 2022

Feature Type

  • Adding new functionality to pandas

  • Changing existing functionality in pandas

  • Removing existing functionality in pandas

Problem Description

Reading Excel files in pandas is considerably slower than in some alternative data frame tools; for example, the readxl package in R reads Excel files much faster. The Rust calamine library can read Excel files much faster than the engines pandas currently supports, and there is an existing Python binding to it, python-calamine. I would like to request that pandas add official support for calamine, so that users can read an Excel file like:

pd.read_excel("test.xlsx", engine="calamine")

Feature Description

The python-calamine package already implements code that enables the calamine engine in pandas; see the examples using pandas_monkeypatch() at the bottom of their GitHub README. The code to enable this is here

Although python-calamine already implements the necessary features to use the library with pandas, I am unclear on how similar the behavior is between calamine and the other engines pandas supports, such as openpyxl. I am hoping that, once calamine is an officially supported engine, the pandas unit tests will confirm consistent behavior across calamine and the other engines.
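
One way to probe the consistency question above by hand is to read the same workbook with two engines and compare the resulting frames. The sketch below is illustrative only: `available_engines` and `compare_engines` are hypothetical helper names (not part of pandas or python-calamine), and it assumes a pandas release that accepts engine="calamine" plus installed copies of openpyxl and python-calamine.

```python
from importlib.util import find_spec

# Importable module name behind each engine (python-calamine installs as python_calamine).
ENGINE_MODULES = {"openpyxl": "openpyxl", "calamine": "python_calamine"}


def available_engines(candidates=None):
    """Return the engine names whose backing module can be found."""
    candidates = ENGINE_MODULES if candidates is None else candidates
    return [name for name, module in candidates.items() if find_spec(module) is not None]


def compare_engines(path, reference="openpyxl", other="calamine", **read_kwargs):
    """Read `path` with both engines and raise AssertionError if the frames differ."""
    # Imported lazily so the availability check above works without pandas installed.
    import pandas as pd
    from pandas.testing import assert_frame_equal

    expected = pd.read_excel(path, engine=reference, **read_kwargs)
    result = pd.read_excel(path, engine=other, **read_kwargs)
    assert_frame_equal(result, expected)
    return result
```

Calling `compare_engines("test.xlsx")` would either return the calamine-read frame or fail with a message describing the first mismatch (values, dtypes, or index), which is roughly the kind of check a shared engine test suite would run across many fixtures.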

Alternative Solutions

None

Additional Context

No response

@abielr abielr added Enhancement Needs Triage Issue that has not been reviewed by a pandas team member labels Dec 22, 2022
@lithomas1 lithomas1 added IO Excel read_excel, to_excel and removed Needs Triage Issue that has not been reviewed by a pandas team member labels Dec 22, 2022
@lithomas1
Member

This looks pretty cool, especially since it seems to be able to read all sorts of Excel formats (we might be able to kill off the other engines, like odfpy and pyxlsb).

A few weeks back I profiled the Excel code, and most of the time seemed to be spent in openpyxl rather than in the parsing, so this will definitely be an improvement.

PRs are welcome.

@kostyafarber
Contributor

I'd be happy to explore implementing this. Just to be sure, are we happy to introduce another dependency for this engine?

@lithomas1
Member

Yes, ideally we'd be able to deprecate some of the other engines (e.g. pyxlsb, odfpy, xlrd), since calamine seems to support a lot more formats.

@kostyafarber
Contributor

Okay cool, I'll have a look 👍🏻

@gfyoung gfyoung added the Dependencies Required and optional dependencies label Jan 7, 2023
dimastbk pushed a commit to dimastbk/pandas that referenced this issue Sep 4, 2023
dimastbk added a commit to dimastbk/pandas that referenced this issue Sep 5, 2023
dimastbk added a commit to dimastbk/pandas that referenced this issue Sep 6, 2023
dimastbk added a commit to dimastbk/pandas that referenced this issue Sep 7, 2023
dimastbk added a commit to dimastbk/pandas that referenced this issue Sep 9, 2023
@wanghaisheng

I am on pandas 2.2.0 and installed python-calamine with:

pip install python-calamine

Why do I still get ValueError: Unknown engine: calamine?
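
The thread does not record what caused this error. One thing worth ruling out, since the calamine engine shipped in pandas 2.2, is that the interpreter raising the ValueError sees an older pandas than expected (for example a different virtual environment). The following is a minimal diagnostic sketch using only the standard library; `major_minor` and `calamine_diagnosis` are hypothetical helper names introduced here for illustration.

```python
import re
import sys
from importlib import metadata
from importlib.util import find_spec


def major_minor(version):
    """Extract a comparable (major, minor) tuple from a version string like '2.2.0'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def calamine_diagnosis():
    """Report what the *current* interpreter sees for pandas and python-calamine."""
    report = {"python": sys.executable}
    try:
        report["pandas"] = metadata.version("pandas")
    except metadata.PackageNotFoundError:
        report["pandas"] = None
    # engine="calamine" is only recognised from pandas 2.2 onwards.
    report["pandas_has_calamine_engine"] = (
        report["pandas"] is not None and major_minor(report["pandas"]) >= (2, 2)
    )
    report["python_calamine_importable"] = find_spec("python_calamine") is not None
    return report


if __name__ == "__main__":
    for key, value in calamine_diagnosis().items():
        print(f"{key}: {value}")
```

Run it with the same interpreter that raises the error. If `pandas_has_calamine_engine` comes back False, the installed pandas predates the engine, whatever `pip` reported elsewhere.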

@sztal

sztal commented Feb 14, 2024

Are there any plans for adding calamine as an engine to ExcelWriter?

@davidsteinar

@sztal Calamine is a read-only library, see here
