stephen-hoover
Contributor

This is the one modification related to issue #11070 which affects non-S3 interactions with read_csv. The Python 3 standard library's bz2 module can decompress data from an already-open file object, so a simple change will let read_csv stream bz2-compressed files.

@jreback
Contributor

jreback commented Sep 12, 2015

tests!

@stephen-hoover
Contributor Author

I added a test for reading from an open file with the C parser. It fails on the master branch and passes here. How's that?

@jreback
Contributor

jreback commented Sep 12, 2015

do you have exactly the same deps

@stephen-hoover
Contributor Author

Yes, exactly the same dependencies. This PR works because the standard library bz2 module was upgraded to accept file pointers in 3.3.
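
For readers who want to see the standard-library behaviour described above, here is a minimal sketch using only `bz2`, `csv`, and `io`. It is not the pandas change itself: the helper name `read_bz2_csv_rows` and the in-memory sample data are hypothetical, chosen only to show that on Python 3.3+ `bz2.open` accepts an open binary file object and decompresses it lazily.

```python
import bz2
import csv
import io


def read_bz2_csv_rows(fileobj):
    """Yield CSV rows from an already-open binary file object holding bz2 data.

    Hypothetical helper for illustration; not part of pandas.
    """
    # On Python 3.3+, bz2.open takes a file object as well as a filename.
    # Text mode wraps the decompressed stream so csv.reader can iterate it.
    with bz2.open(fileobj, mode="rt", encoding="utf-8", newline="") as text:
        for row in csv.reader(text):
            yield row


# An in-memory buffer stands in for an open file handle (assumed sample data).
raw = b"a,b\n1,2\n3,4\n"
buffer = io.BytesIO(bz2.compress(raw))
rows = list(read_bz2_csv_rows(buffer))
```

Because the rows are produced by a generator over the decompressing stream, the whole file never needs to be decompressed into memory at once.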

@jreback jreback added the IO Data (IO issues that don't fit into a more specific label) and IO CSV (read_csv, to_csv) labels Sep 12, 2015
@jreback
Contributor

jreback commented Sep 12, 2015

ok, this looks good. pls add a note in whatsnew for 0.17.0 (just released the rc1 yesterday, but this is ok). reference both the original issue and this PR number I think.

squash & ping when green.

@jreback jreback added this to the 0.17.0 milestone Sep 12, 2015
@stephen-hoover
Contributor Author

Note added. It doesn't look like anything else references a PR; should I leave that reference in?

Contributor

just reference it like an issue, :issue:`11072`; we don't distinguish

Python 2's bz2 module can't read bz2 data from an open file object, but Python 3 (3.3+) can. Python 3 can therefore also read bzip2 files one piece at a time.
@stephen-hoover
Contributor Author

@jreback , tests are green!

jreback added a commit that referenced this pull request Sep 13, 2015
ENH Enable bzip2 streaming for Python 3
@jreback jreback merged commit e8d4243 into pandas-dev:master Sep 13, 2015
@jreback
Contributor

jreback commented Sep 13, 2015

thanks!

@stephen-hoover stephen-hoover deleted the stream-bzip2-files branch September 14, 2015 20:12