Commit dc4b070

COMPAT/REF: Use s3fs for s3 IO

TomAugspurger authored and jreback committed

closes #11915

Author: Tom Augspurger <tom.augspurger88@gmail.com>

Closes #13137 from TomAugspurger/s3fs and squashes the following commits:

92ac063 [Tom Augspurger] CI: Update deps, docs
81690b5 [Tom Augspurger] COMPAT/REF: Use s3fs for s3 IO

1 parent 8c798c0

File tree

14 files changed: +72 −120 lines changed

asv_bench/benchmarks/io_bench.py

Lines changed: 1 addition & 1 deletion

@@ -153,7 +153,7 @@ def setup(self, compression, engine):
             # The Python 2 C parser can't read bz2 from open files.
             raise NotImplementedError
         try:
-            import boto
+            import s3fs
         except ImportError:
             # Skip these benchmarks if `boto` is not installed.
             raise NotImplementedError
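The try/except above follows the ASV convention that raising ``NotImplementedError`` inside ``setup()`` marks a benchmark as skipped rather than failed. A minimal standalone sketch of that guard (the helper name ``require`` is illustrative, not from the commit):

```python
import importlib


def require(module_name):
    """ASV-style guard: translate a missing optional dependency into
    NotImplementedError, which asv reports as a skipped benchmark."""
    try:
        importlib.import_module(module_name)
    except ImportError:
        raise NotImplementedError("%s is not installed" % module_name)


require("io")  # stdlib module, always present: no error
```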

ci/requirements-2.7-64.run

Lines changed: 1 addition & 1 deletion

@@ -11,7 +11,7 @@ sqlalchemy
 lxml=3.2.1
 scipy
 xlsxwriter
-boto
+s3fs
 bottleneck
 html5lib
 beautiful-soup

ci/requirements-2.7.run

Lines changed: 1 addition & 1 deletion

@@ -11,7 +11,7 @@ sqlalchemy=0.9.6
 lxml=3.2.1
 scipy
 xlsxwriter=0.4.6
-boto=2.36.0
+s3fs
 bottleneck
 psycopg2=2.5.2
 patsy

ci/requirements-2.7_SLOW.run

Lines changed: 1 addition & 1 deletion

@@ -13,7 +13,7 @@ numexpr
 pytables
 sqlalchemy
 lxml
-boto
+s3fs
 bottleneck
 psycopg2
 pymysql

ci/requirements-3.5.run

Lines changed: 1 addition & 1 deletion

@@ -17,7 +17,7 @@ sqlalchemy
 pymysql
 psycopg2
 xarray
-boto
+s3fs
 
 # incompat with conda ATM
 # beautiful-soup

ci/requirements-3.5_OSX.run

Lines changed: 1 addition & 1 deletion

@@ -12,7 +12,7 @@ matplotlib
 jinja2
 bottleneck
 xarray
-boto
+s3fs
 
 # incompat with conda ATM
 # beautiful-soup

doc/source/install.rst

Lines changed: 1 addition & 1 deletion

@@ -262,7 +262,7 @@ Optional Dependencies
 * `XlsxWriter <https://pypi.python.org/pypi/XlsxWriter>`__: Alternative Excel writer
 
 * `Jinja2 <http://jinja.pocoo.org/>`__: Template engine for conditional HTML formatting.
-* `boto <https://pypi.python.org/pypi/boto>`__: necessary for Amazon S3 access.
+* `s3fs <http://s3fs.readthedocs.io/>`__: necessary for Amazon S3 access (s3fs >= 0.0.7).
 * `blosc <https://pypi.python.org/pypi/blosc>`__: for msgpack compression using ``blosc``
 * One of `PyQt4
   <http://www.riverbankcomputing.com/software/pyqt/download>`__, `PySide

doc/source/io.rst

Lines changed: 17 additions & 0 deletions

@@ -1487,6 +1487,23 @@ options include:
 Specifying any of the above options will produce a ``ParserWarning`` unless the
 python engine is selected explicitly using ``engine='python'``.
 
+Reading remote files
+''''''''''''''''''''
+
+You can pass in a URL to a CSV file:
+
+.. code-block:: python
+
+   df = pd.read_csv('https://download.bls.gov/pub/time.series/cu/cu.item',
+                    sep='\t')
+
+S3 URLs are handled as well:
+
+.. code-block:: python
+
+   df = pd.read_csv('s3://pandas-test/tips.csv')
+
+
 Writing out Data
 ''''''''''''''''
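Both remote cases ultimately hand ``read_csv`` an open file-like object (``urllib`` for http URLs, s3fs for S3 URLs), so a local in-memory buffer behaves the same way. A self-contained illustration with made-up data:

```python
from io import StringIO

import pandas as pd

# Remote handlers give pandas an open file object; read_csv treats it
# exactly like this in-memory text buffer.
buf = StringIO("total_bill,tip\n16.99,1.01\n10.34,1.66\n")
df = pd.read_csv(buf)
print(df.shape)  # (2, 2)
```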

doc/source/whatsnew/v0.20.0.txt

Lines changed: 10 additions & 3 deletions

@@ -108,12 +108,12 @@ Other enhancements
 
 - ``.select_dtypes()`` now allows `datetimetz` to generically select datetimes with tz (:issue:`14910`)
 
+
 .. _whatsnew_0200.api_breaking:
 
 Backwards incompatible API changes
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-
 .. _whatsnew.api_breaking.index_map
 
 Map on Index types now return other Index types

@@ -182,8 +182,16 @@ Map on Index types now return other Index types
 
     s.map(lambda x: x.hour)
 
+.. _whatsnew_0200.s3:
+
+S3 File Handling
+^^^^^^^^^^^^^^^^
 
-.. _whatsnew_0200.api:
+pandas now uses `s3fs <http://s3fs.readthedocs.io/>`_ for handling S3 connections. This shouldn't break
+any code. However, since s3fs is not a required dependency, you will need to install it separately (like boto
+in prior versions of pandas) (:issue:`11915`).
+
+.. _whatsnew_0200.api:
 
 - ``CParserError`` has been renamed to ``ParserError`` in ``pd.read_csv`` and will be removed in the future (:issue:`12665`)
 - ``SparseArray.cumsum()`` and ``SparseSeries.cumsum()`` will now always return ``SparseArray`` and ``SparseSeries`` respectively (:issue:`12855`)

@@ -194,7 +202,6 @@ Map on Index types now return other Index types
 Other API Changes
 ^^^^^^^^^^^^^^^^^
 
-
 .. _whatsnew_0200.deprecations:
 
 Deprecations
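Because s3fs is optional, downstream code can probe for it once up front rather than failing inside ``read_csv``; a minimal sketch (the flag name ``HAS_S3FS`` is illustrative, not part of the pandas API):

```python
# Probe for the optional dependency once, at import time.
try:
    import s3fs  # noqa: F401 -- only availability matters here
    HAS_S3FS = True
except ImportError:
    HAS_S3FS = False

print("S3 support available:", HAS_S3FS)
```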

pandas/io/common.py

Lines changed: 11 additions & 5 deletions

@@ -12,6 +12,12 @@
 from pandas.core.common import AbstractMethodError
 from pandas.types.common import is_number
 
+try:
+    from s3fs import S3File
+    need_text_wrapping = (BytesIO, S3File)
+except ImportError:
+    need_text_wrapping = (BytesIO,)
+
 # common NA values
 # no longer excluding inf representations
 # '1.#INF','-1.#INF', '1.#INF000000',

@@ -212,10 +218,10 @@ def get_filepath_or_buffer(filepath_or_buffer, encoding=None,
         return reader, encoding, compression
 
     if _is_s3_url(filepath_or_buffer):
-        from pandas.io.s3 import get_filepath_or_buffer
-        return get_filepath_or_buffer(filepath_or_buffer,
-                                      encoding=encoding,
-                                      compression=compression)
+        from pandas.io import s3
+        return s3.get_filepath_or_buffer(filepath_or_buffer,
+                                         encoding=encoding,
+                                         compression=compression)
 
     # It is a pathlib.Path/py.path.local or string
     filepath_or_buffer = _stringify_path(filepath_or_buffer)

@@ -391,7 +397,7 @@ def _get_handle(path_or_buf, mode, encoding=None, compression=None,
         handles.append(f)
 
     # in Python 3, convert BytesIO or fileobjects passed with an encoding
-    if compat.PY3 and (compression or isinstance(f, compat.BytesIO)):
+    if compat.PY3 and (compression or isinstance(f, need_text_wrapping)):
        from io import TextIOWrapper
        f = TextIOWrapper(f, encoding=encoding)
        handles.append(f)
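The new ``need_text_wrapping`` tuple is a common optional-dependency pattern: the ``isinstance()`` tuple is built once at import time, so code paths that never touch S3 work without s3fs installed. A standalone sketch of the same idea, assuming only the standard library is guaranteed:

```python
from io import BytesIO, TextIOWrapper

# Same pattern as pandas/io/common.py: extend the tuple only when the
# optional dependency imports cleanly; isinstance() accepts a tuple of types.
try:
    from s3fs import S3File  # optional; absent on most installs
    need_text_wrapping = (BytesIO, S3File)
except ImportError:
    need_text_wrapping = (BytesIO,)

f = BytesIO(b"a,b\n1,2\n")
if isinstance(f, need_text_wrapping):
    # Wrap the byte stream in a text interface, as _get_handle does on PY3.
    f = TextIOWrapper(f, encoding="utf-8")
print(f.readline())  # first decoded line: 'a,b\n'
```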
