Add whatsnew for arrow (#54476)
* Add whatsnew for arrow

* Update

* Update doc/source/whatsnew/v2.1.0.rst

Co-authored-by: Matthew Roeschke <10647082+mroeschke@users.noreply.github.com>

* Update doc/source/whatsnew/v2.1.0.rst

Co-authored-by: Matthew Roeschke <10647082+mroeschke@users.noreply.github.com>

* Update doc/source/whatsnew/v2.1.0.rst

Co-authored-by: Matthew Roeschke <10647082+mroeschke@users.noreply.github.com>

* Update doc/source/whatsnew/v2.1.0.rst

Co-authored-by: Matthew Roeschke <10647082+mroeschke@users.noreply.github.com>

* Update doc/source/whatsnew/v2.1.0.rst

Co-authored-by: Matthew Roeschke <10647082+mroeschke@users.noreply.github.com>

---------

Co-authored-by: Matthew Roeschke <10647082+mroeschke@users.noreply.github.com>
phofl and mroeschke authored Aug 9, 2023
1 parent c54ceec commit 224457d
Showing 1 changed file with 40 additions and 0 deletions.
40 changes: 40 additions & 0 deletions doc/source/whatsnew/v2.1.0.rst
@@ -14,6 +14,46 @@ including other versions of pandas.
Enhancements
~~~~~~~~~~~~

.. _whatsnew_210.enhancements.pyarrow_dependency:

PyArrow will become a required dependency with pandas 3.0
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

`PyArrow <https://arrow.apache.org/docs/python/index.html>`_ will become a required
dependency of pandas starting with pandas 3.0. This decision was made based on
`PDEP 10 <https://pandas.pydata.org/pdeps/0010-required-pyarrow-dependency.html>`_.

This will enable more changes that are hugely beneficial to pandas users, including
but not limited to:

- Inferring strings as PyArrow-backed strings by default, which significantly
  reduces the memory footprint and improves performance.
- Inferring more complex dtypes with PyArrow by default, like ``Decimal``, ``lists``,
  ``bytes``, ``structured data`` and more.
- Better interoperability with other libraries that depend on Apache Arrow.

We are collecting feedback on this decision `here <https://github.com/pandas-dev/pandas/issues/54466>`_.

.. _whatsnew_210.enhancements.infer_strings:

Avoid NumPy object dtype for strings by default
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Previously, all strings were stored by default in columns with NumPy object dtype.
This release introduces an option ``future.infer_string`` that infers all
strings as PyArrow-backed strings with dtype ``pd.ArrowDtype(pa.string())`` instead.
This option only works if PyArrow is installed. PyArrow-backed strings have a
significantly reduced memory footprint and provide a big performance improvement
compared to NumPy object dtype.

The option can be enabled with:

.. code-block:: python

    pd.options.future.infer_string = True

This behavior will become the default with pandas 3.0.
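
As a rough illustration of the option described above, the following sketch
compares the dtype and deep memory usage of the same string data with and
without ``future.infer_string``. The helper names ``infer_string_available``
and ``compare_string_storage`` are hypothetical and not part of pandas. The
sketch assumes pandas 2.1 or later with PyArrow installed and falls back to a
message otherwise. The exact dtype name and byte counts depend on the installed
versions, so no output is shown.

```python
import importlib.util


def infer_string_available() -> bool:
    """Return True if pandas and PyArrow are importable and the option exists."""
    for name in ("pandas", "pyarrow"):
        if importlib.util.find_spec(name) is None:
            return False
    import pandas as pd

    try:
        pd.get_option("future.infer_string")
    except KeyError:
        # pandas raises OptionError (a KeyError subclass) for unknown options,
        # which is the case on versions that predate this option.
        return False
    return True


def compare_string_storage(values):
    """Hypothetical helper: dtype name and deep memory usage, default vs opt-in."""
    import pandas as pd

    default = pd.Series(values)
    # option_context restores the previous setting when the block exits.
    with pd.option_context("future.infer_string", True):
        opt_in = pd.Series(values)
    return {
        "default": (str(default.dtype), int(default.memory_usage(deep=True))),
        "infer_string": (str(opt_in.dtype), int(opt_in.memory_usage(deep=True))),
    }


if infer_string_available():
    print(compare_string_storage(["pandas", "pyarrow", "numpy"] * 1000))
else:
    print("future.infer_string needs pandas 2.1 or later and PyArrow")
```

Using ``pd.option_context`` rather than assigning to
``pd.options.future.infer_string`` keeps the change local to the ``with`` block,
which is convenient when trying the option out without affecting the rest of a
session.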

.. _whatsnew_210.enhancements.reduction_extension_dtypes:

DataFrame reductions preserve extension dtypes