API: Uses pd.NA in IntegerArray (#29964)
TomAugspurger authored and jreback committed Dec 30, 2019
1 parent 9c40e06 commit 844dc4a
Showing 7 changed files with 298 additions and 88 deletions.
28 changes: 28 additions & 0 deletions doc/source/user_guide/integer_na.rst
@@ -15,6 +15,10 @@ Nullable integer data type
IntegerArray is currently experimental. Its API or implementation may
change without warning.

.. versionchanged:: 1.0.0

   Now uses :attr:`pandas.NA` as the missing value rather
   than :attr:`numpy.nan`.

In :ref:`missing_data`, we saw that pandas primarily uses ``NaN`` to represent
missing data. Because ``NaN`` is a float, this forces an array of integers with
@@ -23,6 +27,9 @@ much. But if your integer column is, say, an identifier, casting to float can
be problematic. Some integers cannot even be represented as floating point
numbers.

Construction
------------

Pandas can represent integer data with possibly missing values using
:class:`arrays.IntegerArray`. This is an :ref:`extension type <extending.extension-types>`
implemented within pandas.
@@ -39,6 +46,12 @@ NumPy's ``'int64'`` dtype:
   pd.array([1, 2, np.nan], dtype="Int64")

All NA-like values are replaced with :attr:`pandas.NA`.

.. ipython:: python

   pd.array([1, 2, np.nan, None, pd.NA], dtype="Int64")

This array can be stored in a :class:`DataFrame` or :class:`Series` like any
NumPy array.

@@ -78,6 +91,9 @@ with the dtype.
In the future, we may provide an option for :class:`Series` to infer a
nullable-integer dtype.

Operations
----------

Operations involving an integer array will behave similarly to NumPy arrays.
Missing values will be propagated, and the data will be coerced to another
dtype if needed.
@@ -123,3 +139,15 @@ Reduction and groupby operations such as 'sum' work as well.
   df.sum()
   df.groupby('B').A.sum()

Scalar NA Value
---------------

:class:`arrays.IntegerArray` uses :attr:`pandas.NA` as its scalar
missing value. Indexing a single element that is missing returns
:attr:`pandas.NA`.

.. ipython:: python

   a = pd.array([1, None], dtype="Int64")
   a[1]
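
As a minimal sketch of how the scalar can be detected in user code (the variable names here are illustrative and not part of the commit): because ``pd.NA`` is a singleton, an identity check or :func:`pandas.isna` is a safer test than ``==``, which itself evaluates to ``pd.NA``.

```python
import pandas as pd

a = pd.array([1, None], dtype="Int64")

# Indexing a missing position yields the pd.NA singleton.
missing = a[1]

# Identity and pd.isna both detect it; `missing == pd.NA` would not,
# since comparisons with pd.NA propagate and return pd.NA.
is_na_singleton = missing is pd.NA
detected = pd.isna(missing)

# Element-wise mask of missing positions for the whole array.
mask = a.isna()
```

The ``mask`` above is a plain boolean NumPy array, so it can be used wherever a mask without missing values is required.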
58 changes: 58 additions & 0 deletions doc/source/whatsnew/v1.0.0.rst
@@ -365,6 +365,64 @@ The following methods now also correctly output values for unobserved categories
As a reminder, you can specify the ``dtype`` to disable all inference.

:class:`arrays.IntegerArray` now uses :attr:`pandas.NA`
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

:class:`arrays.IntegerArray` now uses :attr:`pandas.NA` rather than
:attr:`numpy.nan` as its missing value marker (:issue:`29964`).

*pandas 0.25.x*

.. code-block:: python

   >>> a = pd.array([1, 2, None], dtype="Int64")
   >>> a
   <IntegerArray>
   [1, 2, NaN]
   Length: 3, dtype: Int64

   >>> a[2]
   nan

*pandas 1.0.0*

.. ipython:: python

   a = pd.array([1, 2, None], dtype="Int64")
   a[2]

See :ref:`missing_data.NA` for more on the differences between :attr:`pandas.NA`
and :attr:`numpy.nan`.
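
The following sketch illustrates a few of those scalar-level differences. It is an illustration written for this note rather than part of the commit, and the variable names are arbitrary.

```python
import numpy as np
import pandas as pd

# NaN follows IEEE 754: it compares unequal to everything, itself included.
nan_equal = np.nan == np.nan

# pd.NA propagates through comparisons and arithmetic instead.
na_equal = pd.NA == pd.NA
na_sum = pd.NA + 1

# The truth value of pd.NA is ambiguous, so bool() raises TypeError.
try:
    bool(pd.NA)
    ambiguous = False
except TypeError:
    ambiguous = True
```

One consequence is that code written as ``if value == something:`` may need an explicit :func:`pandas.isna` check first when ``value`` can be ``pd.NA``.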

:class:`arrays.IntegerArray` comparisons return :class:`arrays.BooleanArray`
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Comparison operations on an :class:`arrays.IntegerArray` now return a
:class:`arrays.BooleanArray` rather than a NumPy array (:issue:`29964`).

*pandas 0.25.x*

.. code-block:: python

   >>> a = pd.array([1, 2, None], dtype="Int64")
   >>> a
   <IntegerArray>
   [1, 2, NaN]
   Length: 3, dtype: Int64

   >>> a > 1
   array([False, True, False])

*pandas 1.0.0*

.. ipython:: python

   a = pd.array([1, 2, None], dtype="Int64")
   a > 1

Note that missing values now propagate, rather than always comparing unequal
like :attr:`numpy.nan`. See :ref:`missing_data.NA` for more.
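
Code that relied on the old NumPy boolean result may need to decide explicitly how missing comparisons should be treated. A minimal sketch, assuming that treating missing results as ``False`` is acceptable for the use case (that choice is an assumption of this example, not something the commit prescribes):

```python
import pandas as pd

a = pd.array([1, 2, None], dtype="Int64")

# The comparison yields a nullable BooleanArray; the last entry is pd.NA.
result = a > 1

# Option 1: stay in the nullable "boolean" dtype, replacing NA with False.
filled = result.fillna(False)

# Option 2: convert to a plain NumPy bool array, mapping NA to False.
as_numpy = result.to_numpy(dtype=bool, na_value=False)
```

Either option approximates the 0.25.x behaviour shown above, where the missing element compared as ``False``.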

By default :meth:`Categorical.min` now returns the minimum instead of np.nan
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

6 changes: 4 additions & 2 deletions pandas/core/arrays/boolean.py
@@ -730,7 +730,6 @@ def all(self, skipna: bool = True, **kwargs):
     @classmethod
     def _create_logical_method(cls, op):
         def logical_method(self, other):
-
             if isinstance(other, (ABCDataFrame, ABCSeries, ABCIndexClass)):
                 # Rely on pandas to unbox and dispatch to us.
                 return NotImplemented
@@ -777,8 +776,11 @@ def logical_method(self, other):
     @classmethod
     def _create_comparison_method(cls, op):
         def cmp_method(self, other):
+            from pandas.arrays import IntegerArray

-            if isinstance(other, (ABCDataFrame, ABCSeries, ABCIndexClass)):
+            if isinstance(
+                other, (ABCDataFrame, ABCSeries, ABCIndexClass, IntegerArray)
+            ):
                 # Rely on pandas to unbox and dispatch to us.
                 return NotImplemented
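
The change above relies on Python's standard fallback for binary operators: when a comparison method returns ``NotImplemented``, the interpreter tries the reflected method on the other operand. The toy classes below are hypothetical stand-ins written for this note (they are not pandas classes) and only sketch that mechanism.

```python
class Left:
    """Hypothetical stand-in for the array that defers the comparison."""

    def __gt__(self, other):
        if isinstance(other, Right):
            # Defer: let the other operand's reflected method handle it.
            return NotImplemented
        return "handled by Left.__gt__"


class Right:
    """Hypothetical stand-in for the array that takes over."""

    def __lt__(self, other):
        # Python calls this as the reflection of `other > self`.
        return "handled by Right.__lt__"


outcome = Left() > Right()
fallback = Left() > 0
```

In this sketch ``Left`` plays the role of the array that returns ``NotImplemented`` for a type it wants to defer to, so the other operand's own comparison logic produces the result.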
