Commit 936ca49

Merge remote-tracking branch 'upstream/master' into styler_max_rows_cols
2 parents c7e7e55 + e39ea30 commit 936ca49

114 files changed: +1449 -639 lines changed

.github/ISSUE_TEMPLATE/documentation_improvement.md

Lines changed: 0 additions & 22 deletions
This file was deleted.
Lines changed: 40 additions & 0 deletions
@@ -0,0 +1,40 @@
+name: Documentation Improvement
+description: Report wrong or missing documentation
+title: "DOC: "
+labels: [Docs, Needs Triage]
+
+body:
+  - type: checkboxes
+    attributes:
+      options:
+        - label: >
+            I have checked that the issue still exists on the latest versions of the docs
+            on `master` [here](https://pandas.pydata.org/docs/dev/)
+          required: true
+  - type: textarea
+    id: location
+    attributes:
+      label: Location of the documentation
+      description: >
+        Please provide the location of the documentation, e.g. "pandas.read_csv" or the
+        URL of the documentation, e.g.
+        "https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html"
+      placeholder: https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html
+    validations:
+      required: true
+  - type: textarea
+    id: problem
+    attributes:
+      label: Documentation problem
+      description: >
+        Please provide a description of what documentation you believe needs to be fixed/improved
+    validations:
+      required: true
+  - type: textarea
+    id: suggested-fix
+    attributes:
+      label: Suggested fix for documentation
+      description: >
+        Please explain the suggested fix and **why** it's better than the existing documentation
+    validations:
+      required: true

asv_bench/benchmarks/io/csv.py

Lines changed: 2 additions & 2 deletions
@@ -206,7 +206,7 @@ def time_read_csv(self, bad_date_value):
 class ReadCSVSkipRows(BaseIO):
 
     fname = "__test__.csv"
-    params = ([None, 10000], ["c", "python"])
+    params = ([None, 10000], ["c", "python", "pyarrow"])
     param_names = ["skiprows", "engine"]
 
     def setup(self, skiprows, engine):
@@ -320,7 +320,7 @@ def time_read_csv_python_engine(self, sep, decimal, float_precision):
 
 
 class ReadCSVEngine(StringIORewind):
-    params = ["c", "python"]
+    params = ["c", "python", "pyarrow"]
     param_names = ["engine"]
 
     def setup(self, engine):

doc/source/development/contributing_codebase.rst

Lines changed: 2 additions & 2 deletions
@@ -181,7 +181,7 @@ run this command, though it may take longer::
 
    git diff upstream/master --name-only -- "*.py" | xargs -r flake8
 
-Note that on OSX, the ``-r`` flag is not available, so you have to omit it and
+Note that on macOS, the ``-r`` flag is not available, so you have to omit it and
 run this slightly modified command::
 
    git diff upstream/master --name-only -- "*.py" | xargs flake8
@@ -244,7 +244,7 @@ Alternatively, you can run a command similar to what was suggested for ``black``
 
    git diff upstream/master --name-only -- "*.py" | xargs -r isort
 
-Where similar caveats apply if you are on OSX or Windows.
+Where similar caveats apply if you are on macOS or Windows.
 
 You can then verify the changes look ok, then git :any:`commit <contributing.commit-code>` and :any:`push <contributing.push-code>`.
 
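
The `xargs -r` caveat in this hunk can be sidestepped with a small script. What follows is only a sketch, not part of the commit: `changed_python_files` and `run_tool_on` are hypothetical names, and the `runner` parameter exists purely so the logic can be exercised without invoking a real linter. The empty-list check mirrors what GNU `xargs -r` (`--no-run-if-empty`) does.

```python
import subprocess


def changed_python_files(base="upstream/master"):
    """Return the *.py paths that git reports as differing from ``base``."""
    result = subprocess.run(
        ["git", "diff", base, "--name-only", "--", "*.py"],
        capture_output=True,
        text=True,
        check=True,
    )
    return [line for line in result.stdout.splitlines() if line.strip()]


def run_tool_on(files, tool="flake8", runner=subprocess.run):
    """Run ``tool`` on ``files``; skip the call entirely when the list is empty."""
    if not files:
        # Same effect as ``xargs -r``: no input, so the tool is never started.
        return None
    return runner([tool, *files])
```

Under these assumptions, `run_tool_on(changed_python_files())` behaves the same way on any platform, and passing `tool="isort"` covers the second command in the hunk.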

doc/source/getting_started/install.rst

Lines changed: 3 additions & 0 deletions
@@ -132,6 +132,9 @@ Installing from PyPI
 pandas can be installed via pip from
 `PyPI <https://pypi.org/project/pandas>`__.
 
+.. note::
+    You must have ``pip>=19.3`` to install from PyPI.
+
 ::
 
     pip install pandas

doc/source/user_guide/io.rst

Lines changed: 46 additions & 8 deletions
@@ -160,9 +160,15 @@ dtype : Type name or dict of column -> type, default ``None``
   (unsupported with ``engine='python'``). Use ``str`` or ``object`` together
   with suitable ``na_values`` settings to preserve and
   not interpret dtype.
-engine : {``'c'``, ``'python'``}
-  Parser engine to use. The C engine is faster while the Python engine is
-  currently more feature-complete.
+engine : {``'c'``, ``'python'``, ``'pyarrow'``}
+  Parser engine to use. The C and pyarrow engines are faster, while the python engine
+  is currently more feature-complete. Multithreading is currently only supported by
+  the pyarrow engine.
+
+  .. versionadded:: 1.4.0
+
+     The "pyarrow" engine was added as an *experimental* engine, and some features
+     are unsupported, or may not work correctly, with this engine.
 converters : dict, default ``None``
   Dict of functions for converting values in certain columns. Keys can either be
   integers or column labels.
@@ -1622,11 +1628,17 @@ Specifying ``iterator=True`` will also return the ``TextFileReader`` object:
 Specifying the parser engine
 ''''''''''''''''''''''''''''
 
-Under the hood pandas uses a fast and efficient parser implemented in C as well
-as a Python implementation which is currently more feature-complete. Where
-possible pandas uses the C parser (specified as ``engine='c'``), but may fall
-back to Python if C-unsupported options are specified. Currently, C-unsupported
-options include:
+Pandas currently supports three engines, the C engine, the python engine, and an experimental
+pyarrow engine (requires the ``pyarrow`` package). In general, the pyarrow engine is fastest
+on larger workloads and is equivalent in speed to the C engine on most other workloads.
+The python engine tends to be slower than the pyarrow and C engines on most workloads. However,
+the pyarrow engine is much less robust than the C engine, which lacks a few features compared to the
+Python engine.
+
+Where possible, pandas uses the C parser (specified as ``engine='c'``), but it may fall
+back to Python if C-unsupported options are specified.
+
+Currently, options unsupported by the C and pyarrow engines include:
 
 * ``sep`` other than a single character (e.g. regex separators)
 * ``skipfooter``
@@ -1635,6 +1647,32 @@ options include:
 Specifying any of the above options will produce a ``ParserWarning`` unless the
 python engine is selected explicitly using ``engine='python'``.
 
+Options that are unsupported by the pyarrow engine which are not covered by the list above include:
+
+* ``float_precision``
+* ``chunksize``
+* ``comment``
+* ``nrows``
+* ``thousands``
+* ``memory_map``
+* ``dialect``
+* ``warn_bad_lines``
+* ``error_bad_lines``
+* ``on_bad_lines``
+* ``delim_whitespace``
+* ``quoting``
+* ``lineterminator``
+* ``converters``
+* ``decimal``
+* ``iterator``
+* ``dayfirst``
+* ``infer_datetime_format``
+* ``verbose``
+* ``skipinitialspace``
+* ``low_memory``
+
+Specifying these options with ``engine='pyarrow'`` will raise a ``ValueError``.
+
 .. _io.remote:
 
 Reading/writing remote files
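
The engine notes added to `io.rst` can be read as a rough decision rule for callers. The sketch below is a hypothetical caller-side helper, not pandas code: `pick_engine`, `PYARROW_UNSUPPORTED`, and `PYTHON_ONLY_VISIBLE` are names invented here. It relies only on the options that appear in the diff. The list of C-unsupported options is partly cut off by the hunk boundary, so the helper is incomplete by construction, and it does not check whether `pyarrow` is installed.

```python
# Options the diff lists as raising ValueError with engine="pyarrow".
PYARROW_UNSUPPORTED = frozenset({
    "float_precision", "chunksize", "comment", "nrows", "thousands",
    "memory_map", "dialect", "warn_bad_lines", "error_bad_lines",
    "on_bad_lines", "delim_whitespace", "quoting", "lineterminator",
    "converters", "decimal", "iterator", "dayfirst",
    "infer_datetime_format", "verbose", "skipinitialspace", "low_memory",
})

# Only the C/pyarrow-unsupported keyword visible in the hunk; the real list is longer.
PYTHON_ONLY_VISIBLE = frozenset({"skipfooter"})


def pick_engine(**options):
    """Suggest an engine name for the read_csv keyword arguments in ``options``.

    Hypothetical helper: a keyword counts as "used" whenever it is present,
    whatever its value.
    """
    sep = options.get("sep")
    needs_python = (
        (isinstance(sep, str) and len(sep) > 1)  # e.g. a regex separator
        or any(name in PYTHON_ONLY_VISIBLE for name in options)
    )
    if needs_python:
        return "python"
    if any(name in PYARROW_UNSUPPORTED for name in options):
        return "c"
    return "pyarrow"
```

The returned string could then be passed as `engine=` to `pandas.read_csv` together with the same options. Choosing the engine up front avoids the `ParserWarning` and `ValueError` paths the documentation describes.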

doc/source/user_guide/timeseries.rst

Lines changed: 4 additions & 2 deletions
@@ -204,16 +204,18 @@ If you use dates which start with the day first (i.e. European style),
 you can pass the ``dayfirst`` flag:
 
 .. ipython:: python
+   :okwarning:
 
    pd.to_datetime(["04-01-2012 10:00"], dayfirst=True)
 
    pd.to_datetime(["14-01-2012", "01-14-2012"], dayfirst=True)
 
 .. warning::
 
-   You see in the above example that ``dayfirst`` isn't strict, so if a date
+   You see in the above example that ``dayfirst`` isn't strict. If a date
    can't be parsed with the day being first it will be parsed as if
-   ``dayfirst`` were False.
+   ``dayfirst`` were False, and in the case of parsing delimited date strings
+   (e.g. ``31-12-2012``) then a warning will also be raised.
 
 If you pass a single string to ``to_datetime``, it returns a single ``Timestamp``.
 ``Timestamp`` can also accept string input, but it doesn't accept string parsing
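
The non-strict `dayfirst` behaviour described in this hunk can be illustrated without pandas. The following is a standard-library sketch of the idea only, not pandas' parser: `parse_delimited_date` is a hypothetical function, it handles just dash-delimited `DD-MM-YYYY` / `MM-DD-YYYY` strings, and the warning text is invented for the example.

```python
import warnings
from datetime import datetime


def parse_delimited_date(text, dayfirst=False):
    """Parse 'DD-MM-YYYY' or 'MM-DD-YYYY', preferring the order ``dayfirst`` asks for."""
    day_first_fmt, month_first_fmt = "%d-%m-%Y", "%m-%d-%Y"
    preferred, fallback = (
        (day_first_fmt, month_first_fmt) if dayfirst else (month_first_fmt, day_first_fmt)
    )
    try:
        return datetime.strptime(text, preferred)
    except ValueError:
        # The requested order is impossible (e.g. month 14 or month 31),
        # so fall back to the other order and tell the caller about it.
        parsed = datetime.strptime(text, fallback)
        warnings.warn(
            f"Parsing {text!r} with dayfirst={not dayfirst} because "
            f"dayfirst={dayfirst} could not be respected.",
            UserWarning,
            stacklevel=2,
        )
        return parsed
```

With this sketch, `"14-01-2012"` parses silently under `dayfirst=True`, while `"01-14-2012"` under `dayfirst=True` falls back to month-first order and emits a warning. That mirrors the combination of fallback and warning the updated text describes.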

doc/source/whatsnew/v1.3.3.rst

Lines changed: 2 additions & 2 deletions
@@ -17,15 +17,15 @@ Fixed regressions
 - Fixed regression in :class:`DataFrame` constructor failing to broadcast for defined :class:`Index` and len one list of :class:`Timestamp` (:issue:`42810`)
 - Performance regression in :meth:`core.window.ewm.ExponentialMovingWindow.mean` (:issue:`42333`)
 - Fixed regression in :meth:`.GroupBy.agg` incorrectly raising in some cases (:issue:`42390`)
--
+- Fixed regression in :meth:`RangeIndex.where` and :meth:`RangeIndex.putmask` raising ``AssertionError`` when result did not represent a :class:`RangeIndex` (:issue:`43240`)
 
 .. ---------------------------------------------------------------------------
 
 .. _whatsnew_133.bug_fixes:
 
 Bug fixes
 ~~~~~~~~~
--
+- Bug in :meth:`.DataFrameGroupBy.agg` and :meth:`.DataFrameGroupBy.transform` with ``engine="numba"`` where ``index`` data was not being correctly passed into ``func`` (:issue:`43133`)
 -
 
 .. ---------------------------------------------------------------------------

doc/source/whatsnew/v1.4.0.rst

Lines changed: 20 additions & 6 deletions
@@ -78,10 +78,13 @@ Styler
 
 There are also bug fixes and deprecations listed below.
 
-.. _whatsnew_140.enhancements.enhancement2:
+.. _whatsnew_140.enhancements.pyarrow_csv_engine:
 
-enhancement2
-^^^^^^^^^^^^
+Multithreaded CSV reading with a new CSV Engine based on pyarrow
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+:func:`pandas.read_csv` now accepts ``engine="pyarrow"`` (requires at least ``pyarrow`` 0.17.0) as an argument, allowing for faster csv parsing on multicore machines
+with pyarrow installed. See the :doc:`I/O docs </user_guide/io>` for more info. (:issue:`23697`)
 
 .. _whatsnew_140.enhancements.other:
 
@@ -103,10 +106,20 @@ Notable bug fixes
 
 These are bug fixes that might have notable behavior changes.
 
-.. _whatsnew_140.notable_bug_fixes.notable_bug_fix1:
+.. _whatsnew_140.notable_bug_fixes.inconsistent_date_string_parsing:
 
-notable_bug_fix1
-^^^^^^^^^^^^^^^^
+Inconsistent date string parsing
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The ``dayfirst`` option of :func:`to_datetime` isn't strict, and this can lead to surprising behaviour:
+
+.. ipython:: python
+   :okwarning:
+
+   pd.to_datetime(["31-12-2021"], dayfirst=False)
+
+Now, a warning will be raised if a date string cannot be parsed in accordance with the given ``dayfirst`` value when
+the value is a delimited date string (e.g. ``31-12-2012``).
 
 .. _whatsnew_140.notable_bug_fixes.notable_bug_fix2:
 
@@ -253,6 +266,7 @@ Categorical
 Datetimelike
 ^^^^^^^^^^^^
 - Bug in :class:`DataFrame` constructor unnecessarily copying non-datetimelike 2D object arrays (:issue:`39272`)
+- :func:`to_datetime` would silently swap ``MM/DD/YYYY`` and ``DD/MM/YYYY`` formats if the given ``dayfirst`` option could not be respected - now, a warning is raised in the case of delimited date strings (e.g. ``31-12-2012``) (:issue:`12585`)
 -
 
 Timedelta

environment.yml

Lines changed: 1 addition & 0 deletions
@@ -105,6 +105,7 @@ dependencies:
 
   - pytables>=3.6.1 # pandas.read_hdf, DataFrame.to_hdf
   - s3fs>=0.4.0 # file IO when using 's3://...' path
+  - aiobotocore
   - fsspec>=0.7.4, <2021.6.0 # for generic remote file operations
   - gcsfs>=0.6.0 # file IO when using 'gcs://...' path
   - sqlalchemy # pandas.read_sql, DataFrame.to_sql
