Default to_* methods to compression='infer' #22011
Merged
gfyoung
merged 40 commits into
pandas-dev:master
from
dhimmel:default-to-infer-compression
Aug 1, 2018
8689167
Default to_csv & to_json to compression='infer'
dhimmel 3ccfb00
to_json compression=infer in pandas/core/generic.py
dhimmel 648bf4d
Simplify CSVFormatter.save
dhimmel be724fa
Exploratory commit of what CSVFormatter.save should look like
dhimmel 9fe27c9
fixup! Simplify CSVFormatter.save
dhimmel 65f0689
"Revert changes not related to compression default
dhimmel 868e671
TST: test to_csv infers compression by default
dhimmel c3b76ee
Debugging print statements
dhimmel cebc0d9
Debugging: use logging rather than print
dhimmel 8411eb2
_infer_compression in CSVFormatter
dhimmel c098c8f
CSVFormatter: process encoding in init for consistency
dhimmel 2f6601d
TST + DOC: test_compression_warning docstring
dhimmel eb7f9b5
fixup! CSVFormatter: process encoding in init for consistency
dhimmel d4a5c90
Tests passing: remove debugging
dhimmel abd19e3
Parametrized test for compression='infer' is default
dhimmel 2f670fe
Default compression='infer' in series.to_csv
dhimmel aa9ce13
What's New Entry for v0.24.0
dhimmel a6aabad
Remove unused tmpdir fixture argument
dhimmel 8a0c97e
Update to_json docstring
dhimmel 6be808d
Change test docstrings to comments
dhimmel 63e6591
Consolidate testing to a single parametrized test
dhimmel fadb943
Split test_compression_defaults_to_infer into Series & DataFrame tests
dhimmel 0edffc7
Parametrize write_kwargs
dhimmel 97f5de5
Fix kwargs in test_series_compression_defaults_to_infer
dhimmel 83bc0a8
Attempt to fix CSV series roundtrip
dhimmel 874a4bf
Fix test failure
dhimmel 14c3945
Python 2 flake8 error
dhimmel 9a4dc41
Reduce / remove comments
dhimmel 25bdb4c
Merge master: fix zip-docs conflict
dhimmel 1ba8f3a
DOC: versionchanged & tweaks
dhimmel 24e051e
Update doc/source/io.rst as needed
dhimmel 387d1d2
Move tests from tests/test_common.py to tests/io/test_common.py
dhimmel 12f14e2
Organize / simplify pandas/tests/test_common.py imports
dhimmel 6db23d9
Ignore flake error needed for test
dhimmel e3a0f56
fixup! Organize / simplify pandas/tests/test_common.py imports
dhimmel af8c137
change import: cmn to icom
dhimmel f8829a6
Blank lines after versionchanged
dhimmel 918c0f8
Move compression tests to new file tests/io/test_compression.py
dhimmel eadf68e
blank lines before .. versionchanged
dhimmel cf5b62e
Remove comments and space after GH
dhimmel
Move tests from tests/test_common.py to tests/io/test_common.py
commit 387d1d29f2833e236e0d8c3e3167c94614676973
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -2,18 +2,18 @@ | |
Tests for the pandas.io.common functionalities | ||
""" | ||
import mmap | ||
import pytest | ||
import os | ||
from os.path import isabs | ||
import pytest | ||
|
||
import pandas as pd | ||
import pandas.util.testing as tm | ||
import pandas.io.common as cmn | ||
import pandas.util._test_decorators as td | ||
|
||
from pandas.io import common | ||
from pandas.compat import is_platform_windows, StringIO, FileNotFoundError | ||
|
||
from pandas import read_csv, concat | ||
import pandas.util.testing as tm | ||
from pandas.compat import ( | ||
is_platform_windows, | ||
StringIO, | ||
FileNotFoundError, | ||
) | ||
|
||
|
||
class CustomFSPath(object): | ||
|
@@ -55,36 +55,36 @@ class TestCommonIOCapabilities(object): | |
|
||
def test_expand_user(self): | ||
filename = '~/sometest' | ||
expanded_name = common._expand_user(filename) | ||
expanded_name = cmn._expand_user(filename) | ||
|
||
assert expanded_name != filename | ||
assert isabs(expanded_name) | ||
assert os.path.isabs(expanded_name) | ||
assert os.path.expanduser(filename) == expanded_name | ||
|
||
def test_expand_user_normal_path(self): | ||
filename = '/somefolder/sometest' | ||
expanded_name = common._expand_user(filename) | ||
expanded_name = cmn._expand_user(filename) | ||
|
||
assert expanded_name == filename | ||
assert os.path.expanduser(filename) == expanded_name | ||
|
||
@td.skip_if_no('pathlib') | ||
def test_stringify_path_pathlib(self): | ||
rel_path = common._stringify_path(Path('.')) | ||
rel_path = cmn._stringify_path(Path('.')) | ||
assert rel_path == '.' | ||
redundant_path = common._stringify_path(Path('foo//bar')) | ||
redundant_path = cmn._stringify_path(Path('foo//bar')) | ||
assert redundant_path == os.path.join('foo', 'bar') | ||
|
||
@td.skip_if_no('py.path') | ||
def test_stringify_path_localpath(self): | ||
path = os.path.join('foo', 'bar') | ||
abs_path = os.path.abspath(path) | ||
lpath = LocalPath(path) | ||
assert common._stringify_path(lpath) == abs_path | ||
assert cmn._stringify_path(lpath) == abs_path | ||
|
||
def test_stringify_path_fspath(self): | ||
p = CustomFSPath('foo/bar.csv') | ||
result = common._stringify_path(p) | ||
result = cmn._stringify_path(p) | ||
assert result == 'foo/bar.csv' | ||
|
||
@pytest.mark.parametrize('extension,expected', [ | ||
|
@@ -97,36 +97,36 @@ def test_stringify_path_fspath(self): | |
@pytest.mark.parametrize('path_type', path_types) | ||
def test_infer_compression_from_path(self, extension, expected, path_type): | ||
path = path_type('foo/bar.csv' + extension) | ||
compression = common._infer_compression(path, compression='infer') | ||
compression = cmn._infer_compression(path, compression='infer') | ||
assert compression == expected | ||
|
||
def test_get_filepath_or_buffer_with_path(self): | ||
filename = '~/sometest' | ||
filepath_or_buffer, _, _, should_close = common.get_filepath_or_buffer( | ||
filepath_or_buffer, _, _, should_close = cmn.get_filepath_or_buffer( | ||
filename) | ||
assert filepath_or_buffer != filename | ||
assert isabs(filepath_or_buffer) | ||
assert os.path.isabs(filepath_or_buffer) | ||
assert os.path.expanduser(filename) == filepath_or_buffer | ||
assert not should_close | ||
|
||
def test_get_filepath_or_buffer_with_buffer(self): | ||
input_buffer = StringIO() | ||
filepath_or_buffer, _, _, should_close = common.get_filepath_or_buffer( | ||
filepath_or_buffer, _, _, should_close = cmn.get_filepath_or_buffer( | ||
input_buffer) | ||
assert filepath_or_buffer == input_buffer | ||
assert not should_close | ||
|
||
def test_iterator(self): | ||
reader = read_csv(StringIO(self.data1), chunksize=1) | ||
result = concat(reader, ignore_index=True) | ||
expected = read_csv(StringIO(self.data1)) | ||
reader = pd.read_csv(StringIO(self.data1), chunksize=1) | ||
result = pd.concat(reader, ignore_index=True) | ||
expected = pd.read_csv(StringIO(self.data1)) | ||
tm.assert_frame_equal(result, expected) | ||
|
||
# GH12153 | ||
it = read_csv(StringIO(self.data1), chunksize=1) | ||
it = pd.read_csv(StringIO(self.data1), chunksize=1) | ||
first = next(it) | ||
tm.assert_frame_equal(first, expected.iloc[[0]]) | ||
tm.assert_frame_equal(concat(it), expected.iloc[1:]) | ||
tm.assert_frame_equal(pd.concat(it), expected.iloc[1:]) | ||
|
||
@pytest.mark.parametrize('reader, module, error_class, fn_ext', [ | ||
(pd.read_csv, 'os', FileNotFoundError, 'csv'), | ||
|
@@ -246,18 +246,18 @@ def test_constructor_bad_file(self, mmap_file): | |
msg = "[Errno 22]" | ||
err = mmap.error | ||
|
||
tm.assert_raises_regex(err, msg, common.MMapWrapper, non_file) | ||
tm.assert_raises_regex(err, msg, cmn.MMapWrapper, non_file) | ||
|
||
target = open(mmap_file, 'r') | ||
target.close() | ||
|
||
msg = "I/O operation on closed file" | ||
tm.assert_raises_regex( | ||
ValueError, msg, common.MMapWrapper, target) | ||
ValueError, msg, cmn.MMapWrapper, target) | ||
|
||
def test_get_attr(self, mmap_file): | ||
with open(mmap_file, 'r') as target: | ||
wrapper = common.MMapWrapper(target) | ||
wrapper = cmn.MMapWrapper(target) | ||
|
||
attrs = dir(wrapper.mmap) | ||
attrs = [attr for attr in attrs | ||
|
@@ -271,7 +271,7 @@ def test_get_attr(self, mmap_file): | |
|
||
def test_next(self, mmap_file): | ||
with open(mmap_file, 'r') as target: | ||
wrapper = common.MMapWrapper(target) | ||
wrapper = cmn.MMapWrapper(target) | ||
lines = target.readlines() | ||
|
||
for line in lines: | ||
|
@@ -285,4 +285,100 @@ def test_unknown_engine(self): | |
df = tm.makeDataFrame() | ||
df.to_csv(path) | ||
with tm.assert_raises_regex(ValueError, 'Unknown engine'): | ||
read_csv(path, engine='pyt') | ||
pd.read_csv(path, engine='pyt') | ||
|
||
Can you put a marker comment (e.g. a line of --- or whatever) labeled "compression tests" to delineate this section of the tests? Also OK with a new test file, tests_compression.py (maybe simpler). |
||
|
||
@pytest.mark.parametrize('obj', [ | ||
pd.DataFrame(100 * [[0.123456, 0.234567, 0.567567], | ||
[12.32112, 123123.2, 321321.2]], | ||
columns=['X', 'Y', 'Z']), | ||
pd.Series(100 * [0.123456, 0.234567, 0.567567], name='X')]) | ||
@pytest.mark.parametrize('method', ['to_pickle', 'to_json', 'to_csv']) | ||
def test_compression_size(obj, method, compression_only): | ||
|
||
with tm.ensure_clean() as path: | ||
getattr(obj, method)(path, compression=compression_only) | ||
compressed = os.path.getsize(path) | ||
getattr(obj, method)(path, compression=None) | ||
uncompressed = os.path.getsize(path) | ||
assert uncompressed > compressed | ||
|
||
|
||
@pytest.mark.parametrize('obj', [ | ||
pd.DataFrame(100 * [[0.123456, 0.234567, 0.567567], | ||
[12.32112, 123123.2, 321321.2]], | ||
columns=['X', 'Y', 'Z']), | ||
pd.Series(100 * [0.123456, 0.234567, 0.567567], name='X')]) | ||
@pytest.mark.parametrize('method', ['to_csv', 'to_json']) | ||
def test_compression_size_fh(obj, method, compression_only): | ||
|
||
with tm.ensure_clean() as path: | ||
f, handles = cmn._get_handle(path, 'w', compression=compression_only) | ||
with f: | ||
getattr(obj, method)(f) | ||
assert not f.closed | ||
assert f.closed | ||
compressed = os.path.getsize(path) | ||
with tm.ensure_clean() as path: | ||
f, handles = cmn._get_handle(path, 'w', compression=None) | ||
with f: | ||
getattr(obj, method)(f) | ||
assert not f.closed | ||
assert f.closed | ||
uncompressed = os.path.getsize(path) | ||
assert uncompressed > compressed | ||
|
||
|
||
@pytest.mark.parametrize('write_method, write_kwargs, read_method', [ | ||
('to_csv', {'index': False}, pd.read_csv), | ||
('to_json', {}, pd.read_json), | ||
('to_pickle', {}, pd.read_pickle), | ||
]) | ||
def test_dataframe_compression_defaults_to_infer( | ||
write_method, write_kwargs, read_method, compression_only): | ||
# Test that DataFrame.to_* methods default to inferring compression from | ||
# paths. GH 22004 | ||
input = pd.DataFrame([[1.0, 0, -4], [3.4, 5, 2]], columns=['X', 'Y', 'Z']) | ||
extension = cmn._compression_to_extension[compression_only] | ||
with tm.ensure_clean('compressed' + extension) as path: | ||
getattr(input, write_method)(path, **write_kwargs) | ||
output = read_method(path, compression=compression_only) | ||
tm.assert_frame_equal(output, input) | ||
|
||
|
||
@pytest.mark.parametrize('write_method,write_kwargs,read_method,read_kwargs', [ | ||
('to_csv', {'index': False, 'header': True}, | ||
pd.read_csv, {'squeeze': True}), | ||
('to_json', {}, pd.read_json, {'typ': 'series'}), | ||
('to_pickle', {}, pd.read_pickle, {}), | ||
]) | ||
def test_series_compression_defaults_to_infer( | ||
write_method, write_kwargs, read_method, read_kwargs, | ||
compression_only): | ||
# Test that Series.to_* methods default to inferring compression from | ||
# paths. GH 22004 | ||
input = pd.Series([0, 5, -2, 10], name='X') | ||
extension = cmn._compression_to_extension[compression_only] | ||
with tm.ensure_clean('compressed' + extension) as path: | ||
getattr(input, write_method)(path, **write_kwargs) | ||
output = read_method(path, compression=compression_only, **read_kwargs) | ||
tm.assert_series_equal(output, input, check_names=False) | ||
|
||
|
||
def test_compression_warning(compression_only): | ||
# Assert that passing a file object to to_csv while explicitly specifying a | ||
# compression protocol triggers a RuntimeWarning, as per GH 21227. | ||
# Note that pytest has an issue that causes assert_produces_warning to fail | ||
# in Python 2 if the warning has occurred in previous tests | ||
# (see https://git.io/fNEBm & https://git.io/fNEBC). Hence, should this | ||
# test fail in just Python 2 builds, it likely indicates that other tests | ||
# are producing RuntimeWarnings, thereby triggering the pytest bug. | ||
df = pd.DataFrame(100 * [[0.123456, 0.234567, 0.567567], | ||
[12.32112, 123123.2, 321321.2]], | ||
columns=['X', 'Y', 'Z']) | ||
with tm.ensure_clean() as path: | ||
f, handles = cmn._get_handle(path, 'w', compression=compression_only) | ||
with tm.assert_produces_warning(RuntimeWarning, | ||
check_stacklevel=False): | ||
with f: | ||
df.to_csv(f, compression=compression_only) |
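
For readers who want to see the behavior these tests exercise without a pandas checkout, here is a minimal standard-library sketch of extension-based compression inference. It is an illustration only, not the pandas implementation: `EXTENSION_TO_COMPRESSION`, `infer_compression`, `write_text`, and `demo` are hypothetical names, and only gzip is actually written.

```python
import gzip
import os
import tempfile

# Hypothetical extension-to-compression mapping; it mirrors the idea behind
# the tests above but is not copied from pandas.
EXTENSION_TO_COMPRESSION = {
    '.gz': 'gzip',
    '.bz2': 'bz2',
    '.zip': 'zip',
    '.xz': 'xz',
}


def infer_compression(path, compression='infer'):
    """Return the compression to use for ``path``.

    An explicit ``compression`` value is returned unchanged; 'infer'
    looks at the file extension and falls back to None (no compression).
    """
    if compression != 'infer':
        return compression
    _, extension = os.path.splitext(str(path))
    return EXTENSION_TO_COMPRESSION.get(extension.lower())


def write_text(path, text, compression='infer'):
    """Write ``text`` to ``path``, using gzip when that is what is inferred."""
    method = infer_compression(path, compression)
    if method == 'gzip':
        with gzip.open(path, 'wt', encoding='utf-8') as handle:
            handle.write(text)
    elif method is None:
        with open(path, 'w', encoding='utf-8') as handle:
            handle.write(text)
    else:
        # Other formats are left out to keep the sketch short.
        raise ValueError('unsupported in this sketch: {}'.format(method))


def demo():
    """Write the same repetitive CSV text to a plain and a .gz path."""
    text = 'X,Y,Z\n' + 100 * '0.123456,0.234567,0.567567\n'
    with tempfile.TemporaryDirectory() as tmp:
        plain = os.path.join(tmp, 'data.csv')
        zipped = os.path.join(tmp, 'data.csv.gz')
        write_text(plain, text)   # no known extension, so no compression
        write_text(zipped, text)  # '.gz' extension, so gzip is inferred
        with gzip.open(zipped, 'rt', encoding='utf-8') as handle:
            roundtrip_ok = handle.read() == text
        return os.path.getsize(plain), os.path.getsize(zipped), roundtrip_ok
```

Calling `demo()` returns the two file sizes and a round-trip flag, which corresponds loosely to what `test_compression_size` and the `*_defaults_to_infer` tests check: the writer picks the compression from the path alone, and the compressed file is smaller and still readable.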
Can you rename cmn to icom?
Done in af8c137