Openpyxl engine for reading excel files #25092

Merged: 87 commits, Jun 28, 2019
Commits
e29b4c0
prepare testing reading excel files with multiple engines
tdamsma Feb 2, 2019
e0199a8
add openpyxl tests
tdamsma Feb 2, 2019
ce4eb01
implement first version of openpyxl reader
tdamsma Feb 2, 2019
b25877e
pep8 issues
tdamsma Feb 2, 2019
821fa4d
suppress openpyxl warnings
tdamsma Feb 2, 2019
4694668
add code for all edge cases that are tested for. Unfortunately got pr…
tdamsma Feb 7, 2019
712f1ef
formatting
tdamsma Feb 7, 2019
1d49a0e
Merge commit '683c7b55f5195fdf4f524239066cbf6f1301f0e7' into openpyxl…
tdamsma Feb 7, 2019
1473c0e
improve docstring
tdamsma Feb 7, 2019
6e8ffba
also test openpyxl reader for .xlsm files
tdamsma Feb 7, 2019
d57dfc1
explicitly use 64bit floats and ints
tdamsma Feb 7, 2019
e984f6b
Merge commit '6359bbc4c9ce6dd05bc8b422641cda74871cde43' into openpyxl…
tdamsma Feb 11, 2019
44f7af2
formatting
tdamsma Feb 11, 2019
98d3865
skip TestOpenpyxlReader when openpyxl is not installed
tdamsma Feb 11, 2019
d0188ba
Attempt to generalize _XlrdReader __init__ and move it to _BaseExcelR…
tdamsma Feb 12, 2019
205d52b
Merge commit 'f4568fd76e864d8aee3d23f5a81302262d6e0dcb' into openpyxl…
tdamsma Feb 20, 2019
7b550bf
register openpyxl writer engine, fix imports
tdamsma Feb 26, 2019
875de8d
import type_error explicitly
tdamsma Feb 26, 2019
12ad6d8
Merge branch 'master' into openpyxl-reader
tdamsma Mar 11, 2019
dfd6a36
Merge branch 'master' into openpyxl-reader
tdamsma Mar 19, 2019
fef7233
Merge branch 'master' into openpyxl-reader
tdamsma Apr 20, 2019
eaafd5f
get rid of some py2 compatibility legacy
tdamsma Apr 21, 2019
8d2db02
Merge branch 'master' into openpyxl-reader
tdamsma Apr 22, 2019
13e7793
fix some type chcking
tdamsma Apr 22, 2019
b053cce
linting
tdamsma Apr 22, 2019
fe4dd73
see if this works on linux
tdamsma Apr 22, 2019
64e5f2d
run isort on _openpyxl.py
tdamsma Apr 22, 2019
99b2cad
Merge remote-tracking branch 'upstream/master' into openpyxl-reader
tdamsma Apr 23, 2019
ce5ac05
Merge remote-tracking branch 'upstream/master' into openpyxl-reader
tdamsma Apr 23, 2019
c7895ea
Merge remote-tracking branch 'pandas/master' into openpyxl-reader
tdamsma Apr 27, 2019
2ca9368
refactor handling of sheet_name keyword
tdamsma Apr 27, 2019
5fb1aef
extract code to parse a single sheet to a method
tdamsma Apr 27, 2019
537dd0c
extract handling of header keywords
tdamsma Apr 27, 2019
44cddc5
extract handling of convert_float keyword to method
tdamsma Apr 27, 2019
e4c8f23
extract handling of index_col to method
tdamsma Apr 27, 2019
daff364
extract handling of usecols keyword to method
tdamsma Apr 27, 2019
1224918
remove redundant code
tdamsma Apr 27, 2019
1bfc030
Merge remote-tracking branch 'upstream/master' into excel-read-shared…
tdamsma Apr 28, 2019
747311e
Merge branch 'master' into excel-read-shared-init-to-baseclass
tdamsma Apr 28, 2019
a77a4c7
implement suggestions @WillAyd
tdamsma Apr 29, 2019
ddcaad8
Merge remote-tracking branch 'upstream/master' into excel-read-shared…
tdamsma Apr 29, 2019
757235d
Merge branch 'excel-read-shared-init-to-baseclass' into openpyxl-reader
tdamsma Apr 29, 2019
cdd627f
remove _engine keyword altogether
tdamsma Apr 29, 2019
0b58109
Merge branch 'excel-read-shared-init-to-baseclass' into openpyxl-reader
tdamsma Apr 29, 2019
45f21f8
Clean up __init__
tdamsma Apr 29, 2019
e97d029
Implement work around for Linux py35_compat import error
tdamsma Apr 29, 2019
1edae5e
fix regression for reading s3 files
tdamsma Apr 30, 2019
a69e104
Merge branch 'excel-read-shared-init-to-baseclass' into openpyxl-reader
tdamsma Apr 30, 2019
f5f40e4
expand code highlighting the weirdness of a failing/skipped test.
tdamsma Apr 30, 2019
22e24bb
remove _engine keyword altogether
tdamsma Apr 29, 2019
903b188
fix regression for reading s3 files
tdamsma Apr 30, 2019
1b3ae99
Merge branch 'excel-read-shared-init-to-baseclass' into openpyxl-reader
tdamsma Apr 30, 2019
02e19a8
Merge remote-tracking branch 'upstream/master' into openpyxl-reader
tdamsma Apr 30, 2019
3e18f97
Merge remote-tracking branch 'upstream/master' into openpyxl-reader
tdamsma Apr 30, 2019
d11956c
remove accidental commit
tdamsma May 1, 2019
61d7a3f
ditch some code
tdamsma May 1, 2019
13d41b2
Merge remote-tracking branch 'upstream/master' into openpyxl-reader
tdamsma Jun 10, 2019
97c85f5
remove skips for openpyxl for tests that should pass
tdamsma Jun 11, 2019
614d972
Add `by_blocks=True` to failing `assert_frame_equal` tests, as per @W…
tdamsma Jun 13, 2019
d87d9c0
Merge remote-tracking branch 'upstream/master' into openpyxl-reader
WillAyd Jun 27, 2019
7348b0c
Updated import machinery
WillAyd Jun 27, 2019
c1a1792
Cleaned up nan replacement
WillAyd Jun 27, 2019
d72ca5a
Simplified introspection
WillAyd Jun 27, 2019
0bba345
Used common renaming method
WillAyd Jun 27, 2019
8dd8bf6
Reverted some test changes
WillAyd Jun 27, 2019
eaaa680
Reset yield statement
WillAyd Jun 27, 2019
6bf5183
Better missing label handling
WillAyd Jun 27, 2019
a06bf9b
Aligned implementation with base
WillAyd Jun 27, 2019
f43e90f
Fix bool handling
WillAyd Jun 27, 2019
8fabe0a
Fixed 0 handling
WillAyd Jun 27, 2019
0ff5ce3
Aligned float handling with xlrd
WillAyd Jun 27, 2019
fb73692
xfailed overflow test
WillAyd Jun 27, 2019
17b1d73
lint and isort fixup
WillAyd Jun 27, 2019
3d248ed
Removed by_blocks
WillAyd Jun 27, 2019
c369fd8
Revert "Reverted some test changes"
tdamsma Jun 28, 2019
70b15a4
use readonly mode. Should be more performant and also this ignores Me…
tdamsma Jun 28, 2019
a3a3bca
formatting issues
tdamsma Jun 28, 2019
fcd43f0
handle datetime cells explicitly for openpyxl < 2.5.0 compatibility
tdamsma Jun 28, 2019
d9c1fa6
type fixup
WillAyd Jun 28, 2019
3c239a4
whatsnew
WillAyd Jun 28, 2019
4a25a5a
Removed np.nan from Scalar
WillAyd Jun 28, 2019
6258e59
revert test_reader changes again. Not needed anymore because of using…
tdamsma Jun 28, 2019
00f34b1
more types and whitespace cleanup
WillAyd Jun 28, 2019
a1fba90
Added config for excel reader. Not sure how to test this
tdamsma Jun 28, 2019
88ee325
whatsnew
WillAyd Jun 28, 2019
837ce26
Merge remote-tracking branch 'upstream/master' into openpyxl-reader
WillAyd Jun 28, 2019
dddc8c5
Regenerated test1 files
WillAyd Jun 28, 2019
Reverted some test changes
WillAyd committed Jun 27, 2019
commit 8dd8bf64a19588a507dd314f90cbb797aeaaf812
50 changes: 10 additions & 40 deletions pandas/tests/io/excel/test_readers.py
@@ -19,27 +19,16 @@
 
 
 @contextlib.contextmanager
-def ignore_engine_warnings():
+def ignore_xlrd_time_clock_warning():
     """
-    Context manager to ignore warnings raised by the excel engine that would
-    interfere with asserting warnings are reaised.
+    Context manager to ignore warnings raised by the xlrd library,
+    regarding the deprecation of `time.clock` in Python 3.7.
     """
     with warnings.catch_warnings():
-        # raised by the xlrd library, regarding the deprecation of `time.clock`
-        # in Python 3.7.
         warnings.filterwarnings(
             action='ignore',
             message='time.clock has been deprecated',
             category=DeprecationWarning)
-
-        # raised by the openpyxl library, if unsupported extensions to the
-        # xlsx specification are used in .xslx file. E.g. conditional
-        # formatting, conditional formatting etc. See also
-        # https://stackoverflow.com/questions/34322231/python-2-7-openpyxl-userwarning
-        warnings.filterwarnings(
-            action='ignore',
-            message='Unknown extension is not supported and will be removed',
-            category=UserWarning)
         yield
 
 
@@ -70,14 +59,14 @@ def test_usecols_int(self, read_ext, df_ref):
         # usecols as int
         with tm.assert_produces_warning(FutureWarning,
                                         check_stacklevel=False):
-            with ignore_engine_warnings():
+            with ignore_xlrd_time_clock_warning():
                 df1 = pd.read_excel("test1" + read_ext, "Sheet1",
                                     index_col=0, usecols=3)
 
         # usecols as int
         with tm.assert_produces_warning(FutureWarning,
                                         check_stacklevel=False):
-            with ignore_engine_warnings():
+            with ignore_xlrd_time_clock_warning():
                 df2 = pd.read_excel("test1" + read_ext, "Sheet2", skiprows=[1],
                                     index_col=0, usecols=3)
 
@@ -304,11 +293,6 @@ def test_reader_converters(self, read_ext):
         actual = pd.read_excel(
             basename + read_ext, 'Sheet1', converters=converters)
 
-        if pd.read_excel.keywords['engine'] == 'openpyxl':
-            pytest.skip(
-                "There doesn't seem to be a sensible way to support this for "
-                "openpyxl")
-
         tm.assert_frame_equal(actual, expected)
 
     def test_reader_dtype(self, read_ext):
@@ -363,11 +347,6 @@ def test_reader_dtype_str(self, read_ext, dtype, expected):
         basename = "testdtype"
 
         actual = pd.read_excel(basename + read_ext, dtype=dtype)
-
-        if pd.read_excel.keywords['engine'] == 'openpyxl':
-            pytest.skip(
-                "There doesn't seem to be a sensible way to support this for "
-                "openpyxl")
         tm.assert_frame_equal(actual, expected)
 
     def test_reading_all_sheets(self, read_ext):
@@ -423,21 +402,16 @@ def test_date_conversion_overflow(self, read_ext):
                                  [1e+20, 'Timothy Brown']],
                                 columns=['DateColWithBigInt', 'StringCol'])
 
-        if pd.read_excel.keywords['engine'] == 'openpyxl':
-            with pytest.raises(OverflowError):
-                # openpyxl does not support reading invalid dates
-                result = pd.read_excel('testdateoverflow' + read_ext)
-        else:
-            result = pd.read_excel('testdateoverflow' + read_ext)
-            tm.assert_frame_equal(result, expected)
+        result = pd.read_excel('testdateoverflow' + read_ext)
+        tm.assert_frame_equal(result, expected)
 
     def test_sheet_name(self, read_ext, df_ref):
         filename = "test1"
         sheet_name = "Sheet1"
 
         df1 = pd.read_excel(filename + read_ext,
                             sheet_name=sheet_name, index_col=0)  # doc
-        with ignore_engine_warnings():
+        with ignore_xlrd_time_clock_warning():
             df2 = pd.read_excel(filename + read_ext, index_col=0,
                                 sheet_name=sheet_name)
 
@@ -464,9 +438,7 @@ def test_read_from_http_url(self, read_ext):
         url_table = pd.read_excel(url)
         local_table = pd.read_excel('test1' + read_ext)
 
-        # TODO: remove the by_blocks=True, investigate why this
-        # causes this test to fail
-        tm.assert_frame_equal(url_table, local_table, by_blocks=True)
+        tm.assert_frame_equal(url_table, local_table)
 
     @td.skip_if_not_us_locale
     def test_read_from_s3_url(self, read_ext, s3_resource):
@@ -479,9 +451,7 @@ def test_read_from_s3_url(self, read_ext, s3_resource):
         url_table = pd.read_excel(url)
         local_table = pd.read_excel('test1' + read_ext)
 
-        # TODO: remove the by_blocks=True, investigate why this
-        # causes this test to fail
-        tm.assert_frame_equal(url_table, local_table, by_blocks=True)
+        tm.assert_frame_equal(url_table, local_table)
 
     @pytest.mark.slow
     # ignore warning from old xlrd
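
For anyone reading this commit in isolation: the helper in the first hunk works by installing `ignore` filters inside `warnings.catch_warnings()`, so that an enclosing warning assertion only sees warnings that are not on the ignore list. The sketch below reproduces the two-filter variant (the one this commit reverts) using only the standard library. `emit_demo_warnings` is a hypothetical stand-in for the engine calls, and the outer `catch_warnings(record=True)` only approximates what `tm.assert_produces_warning` does; neither is taken from pandas.

```python
import contextlib
import warnings


@contextlib.contextmanager
def ignore_engine_warnings():
    """Suppress the two engine warnings named in the diff; let others through."""
    with warnings.catch_warnings():
        # xlrd: `time.clock` deprecation message
        warnings.filterwarnings(
            action="ignore",
            message="time.clock has been deprecated",
            category=DeprecationWarning,
        )
        # openpyxl: unsupported xlsx extension message
        warnings.filterwarnings(
            action="ignore",
            message="Unknown extension is not supported and will be removed",
            category=UserWarning,
        )
        yield


def emit_demo_warnings():
    # Hypothetical stand-in for a reader call that triggers engine warnings.
    warnings.warn("time.clock has been deprecated in Python 3.3",
                  DeprecationWarning)
    warnings.warn("Unknown extension is not supported and will be removed",
                  UserWarning)
    warnings.warn("something else", UserWarning)


# Record everything that escapes the helper, roughly as a warning
# assertion in a test would.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    with ignore_engine_warnings():
        emit_demo_warnings()

remaining = [str(w.message) for w in caught]
print(remaining)
```

Only the unrelated warning should reach the outer recorder. The filters are matched against the start of the message, and they are discarded again when the inner `catch_warnings()` block exits.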