API: add infer_objects for soft conversions #16915

Merged
merged 4 commits into from
Jul 18, 2017
Changes from 3 commits
2 changes: 2 additions & 0 deletions doc/source/api.rst
@@ -270,6 +270,7 @@ Conversion
:toctree: generated/

Series.astype
Series.infer_objects
Series.copy
Series.isnull
Series.notnull
@@ -777,6 +778,7 @@ Conversion

DataFrame.astype
DataFrame.convert_objects
DataFrame.infer_objects
DataFrame.copy
DataFrame.isnull
DataFrame.notnull
17 changes: 16 additions & 1 deletion doc/source/basics.rst
@@ -2024,7 +2024,22 @@ object conversion
~~~~~~~~~~~~~~~~~

pandas offers various functions to try to force conversion of types from the ``object`` dtype to other types.
The following functions are available for one dimensional object arrays or scalars:
In cases where the data is already of the correct type, but stored in an ``object`` array, the
:meth:`~DataFrame.infer_objects` and :meth:`~Series.infer_objects` methods can be used to soft convert
to the correct type.

.. ipython:: python

   df = pd.DataFrame([[1, 2],
                      ['a', 'b'],
                      [datetime.datetime(2016, 3, 2), datetime.datetime(2016, 3, 2)]])
   df = df.T
   df
   df.dtypes
   df.infer_objects().dtypes
Contributor

make this a separate ipython block (to break up the code)


The following functions are available for one dimensional object arrays or scalars to perform
hard conversion of objects to a specified type:

- :meth:`~pandas.to_numeric` (conversion to numeric dtypes)

33 changes: 33 additions & 0 deletions doc/source/whatsnew/v0.21.0.txt
@@ -25,6 +25,39 @@ New features
- Added ``__fspath__`` method to :class:`~pandas.HDFStore`, :class:`~pandas.ExcelFile`,
and :class:`~pandas.ExcelWriter` to work properly with the file system path protocol (:issue:`13823`)


Contributor

add a sub-section reference

.. _whatsnew_0210.enhancements.infer_objects:

``infer_objects`` type conversion
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The :meth:`~DataFrame.infer_objects` and :meth:`~Series.infer_objects` methods
Contributor

add a ref to the basics section above.

have been added to perform dtype inference on object columns, replacing
some of the functionality of the deprecated ``convert_objects``
method (:issue:`11221`).

These methods perform only soft conversions on object columns: Python objects
are converted to native types, but no coercive conversions are attempted. For example:

.. ipython:: python

   df = pd.DataFrame({'A': [1, 2, 3],
                      'B': np.array([1, 2, 3], dtype='object'),
                      'C': ['1', '2', '3']})
   df.dtypes
Contributor

should have this same example in the docs (in basics.rst) and a ref to that.

   df.infer_objects().dtypes

Note that column ``'C'`` was not converted: only scalar numeric types
will be inferred to a new type. Other types of conversion should be accomplished
using the :func:`to_numeric` function (or :func:`to_datetime`, :func:`to_timedelta`). See
the documentation :ref:`here <basics.object_conversion>` for more details.

.. ipython:: python

   df = df.infer_objects()
   df['C'] = pd.to_numeric(df['C'], errors='coerce')
   df.dtypes
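
The soft versus hard conversion distinction described in this whatsnew entry can be sketched as a standalone script. This is a minimal illustration, assuming a pandas version that ships ``infer_objects`` (0.21 or later); the names ``soft`` and ``hard`` are chosen for this example and do not appear in the PR.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": [1, 2, 3],                              # already a numeric column
    "B": np.array([1, 2, 3], dtype="object"),    # Python ints boxed in an object array
    "C": ["1", "2", "3"],                        # numeric-looking strings
})

# Soft conversion: only re-infers dtypes for values that are already
# of the right Python type. The strings in "C" are left as strings.
soft = df.infer_objects()

# Hard conversion: explicitly parse the strings in "C" as numbers.
hard = soft.assign(C=pd.to_numeric(soft["C"], errors="coerce"))

print(df.dtypes)
print(soft.dtypes)
print(hard.dtypes)
```

Keeping the two steps separate makes the coercion explicit: ``infer_objects`` never parses strings, so any lossy or error-prone conversion has to be requested through ``to_numeric`` (or ``to_datetime`` / ``to_timedelta``).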

.. _whatsnew_0210.enhancements.other:

Other Enhancements
54 changes: 51 additions & 3 deletions pandas/core/generic.py
@@ -3634,16 +3634,64 @@ def convert_objects(self, convert_dates=True, convert_numeric=False,
        converted : same as input object
        """
        from warnings import warn
        warn("convert_objects is deprecated. Use the data-type specific "
             "converters pd.to_datetime, pd.to_timedelta and pd.to_numeric.",
             FutureWarning, stacklevel=2)
        msg = ("convert_objects is deprecated. To re-infer data dtypes for "
               "object columns, use {klass}.infer_objects()\nFor all "
               "other conversions use the data-type specific converters "
               "pd.to_datetime, pd.to_timedelta and pd.to_numeric."
               ).format(klass=self.__class__.__name__)
        warn(msg, FutureWarning, stacklevel=2)

        return self._constructor(
            self._data.convert(convert_dates=convert_dates,
                               convert_numeric=convert_numeric,
                               convert_timedeltas=convert_timedeltas,
                               copy=copy)).__finalize__(self)
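
The revised deprecation message interpolates the caller's class name, so a ``Series`` user is pointed at ``Series.infer_objects()`` and a ``DataFrame`` user at ``DataFrame.infer_objects()``. The pattern can be sketched with the standard library alone; the ``Frame`` class below is a hypothetical stand-in for illustration, not pandas code.

```python
import warnings


class Frame:
    """Hypothetical stand-in for a pandas container (illustration only)."""

    def convert_objects(self):
        # Same message template as in the diff; {klass} becomes "Frame" here.
        msg = ("convert_objects is deprecated. To re-infer data dtypes for "
               "object columns, use {klass}.infer_objects()\nFor all "
               "other conversions use the data-type specific converters "
               "pd.to_datetime, pd.to_timedelta and pd.to_numeric."
               ).format(klass=self.__class__.__name__)
        # stacklevel=2 attributes the warning to the caller's line,
        # not to this method body.
        warnings.warn(msg, FutureWarning, stacklevel=2)
        return self


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # make sure the warning is not filtered out
    Frame().convert_objects()

print(str(caught[0].message).splitlines()[0])
```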

    def infer_objects(self):
        """
        Attempt to infer better dtypes for object columns.

        Attempts soft conversion of object-dtyped
        columns, leaving non-object and unconvertible
        columns unchanged. The inference rules are the
        same as during normal Series/DataFrame construction.

        .. versionadded:: 0.21.0

Member

The beginning of the docstring should be a single line.

        See Also
Contributor

Maybe a note that this is the same as the inference used when a DataFrame is constructed?

        --------
        pandas.to_datetime : Convert argument to datetime.
        pandas.to_timedelta : Convert argument to timedelta.
        pandas.to_numeric : Convert argument to numeric type.

        Returns
        -------
        converted : same type as input object

        Examples
        --------
        >>> df = pd.DataFrame({"A": ["a", 1, 2, 3]})
        >>> df = df.iloc[1:]
        >>> df
           A
        1  1
        2  2
        3  3
        >>> df.dtypes
        A    object
        dtype: object
        >>> df.infer_objects().dtypes
Contributor

I like blank lines between these sections of code

        A    int64
        dtype: object
        """
        # numeric=False necessary to only soft convert;
        # python objects will still be converted to
        # native numpy numeric types
        return self._constructor(
            self._data.convert(datetime=True, numeric=False,
                               timedelta=True, coerce=False,
                               copy=True)).__finalize__(self)

# ----------------------------------------------------------------------
# Filling NA's

26 changes: 26 additions & 0 deletions pandas/tests/frame/test_block_internals.py
@@ -495,6 +495,32 @@ def test_convert_objects_no_conversion(self):
        mixed2 = mixed1._convert(datetime=True)
        assert_frame_equal(mixed1, mixed2)

    def test_infer_objects(self):
        # GH 11221
        df = DataFrame({'a': ['a', 1, 2, 3],
                        'b': ['b', 2.0, 3.0, 4.1],
                        'c': ['c', datetime(2016, 1, 1),
                              datetime(2016, 1, 2),
                              datetime(2016, 1, 3)],
                        'd': [1, 2, 3, 'd']},
                       columns=['a', 'b', 'c', 'd'])
        df = df.iloc[1:].infer_objects()

        assert df['a'].dtype == 'int64'
        assert df['b'].dtype == 'float64'
        assert df['c'].dtype == 'M8[ns]'
        assert df['d'].dtype == 'object'

        expected = DataFrame({'a': [1, 2, 3],
                              'b': [2.0, 3.0, 4.1],
                              'c': [datetime(2016, 1, 1),
                                    datetime(2016, 1, 2),
                                    datetime(2016, 1, 3)],
                              'd': [2, 3, 'd']},
                             columns=['a', 'b', 'c', 'd'])
        # reconstruct frame to verify inference is same
        tm.assert_frame_equal(df.reset_index(drop=True), expected)
Member

What is the purpose of this part of the test?

Contributor Author

This enforces that the inference rules I'm checking against are the same as DataFrame constructor

Member

@gfyoung gfyoung Jul 17, 2017

Ah, I see. Perhaps a comment above expected might be useful, since it wasn't immediately clear (at least to me) why this was done.
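
The point made in this thread, that ``infer_objects`` should land on the same dtypes the constructor would pick for the same values, can be checked in isolation. The following is a reduced sketch of the test above (datetime column omitted for brevity), assuming a pandas version with ``infer_objects`` and ``pd.testing``; the names ``raw``, ``inferred`` and ``rebuilt`` are illustrative.

```python
import pandas as pd

# Each column starts as object dtype because it mixes strings and numbers.
raw = pd.DataFrame({
    "a": ["a", 1, 2, 3],
    "b": ["b", 2.0, 3.0, 4.1],
    "d": [1, 2, 3, "d"],
})

# Dropping the first row removes the string from "a" and "b", but slicing
# alone does not re-infer dtypes; infer_objects() does.
inferred = raw.iloc[1:].infer_objects().reset_index(drop=True)

# Build the same values from scratch and let the constructor infer dtypes.
rebuilt = pd.DataFrame({
    "a": [1, 2, 3],
    "b": [2.0, 3.0, 4.1],
    "d": [2, 3, "d"],   # still mixed, so it stays object in both frames
})

# Raises AssertionError if values or dtypes differ.
pd.testing.assert_frame_equal(inferred, rebuilt)
print(inferred.dtypes)
```

Comparing against a freshly constructed frame, rather than against hard-coded dtype strings only, ties the test to the constructor's inference rules.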


    def test_stale_cached_series_bug_473(self):

        # this is chained, but ok
18 changes: 18 additions & 0 deletions pandas/tests/series/test_dtypes.py
@@ -268,3 +268,21 @@ def test_series_to_categorical(self):
        expected = Series(['a', 'b', 'c'], dtype='category')

        tm.assert_series_equal(result, expected)

    def test_infer_objects_series(self):
        # GH 11221
        actual = Series(np.array([1, 2, 3], dtype='O')).infer_objects()
        expected = Series([1, 2, 3])
        tm.assert_series_equal(actual, expected)

        actual = Series(np.array([1, 2, 3, None], dtype='O')).infer_objects()
        expected = Series([1., 2., 3., np.nan])
        tm.assert_series_equal(actual, expected)

        # only soft conversions, unconvertible values pass through unchanged
        actual = (Series(np.array([1, 2, 3, None, 'a'], dtype='O'))
                  .infer_objects())
        expected = Series([1, 2, 3, None, 'a'])

        assert actual.dtype == 'object'
Member

Perhaps a comment above that block of code explaining why this is the case would be useful.

Member

@gfyoung gfyoung Jul 13, 2017

Also, why isn't there a similar test for DataFrame (i.e. one where we actually get object) ?

Member

Nit: add newline between "expected =" and "assert " (for readability)

        tm.assert_series_equal(actual, expected)
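
The three cases this Series test exercises can be run as a standalone script. This is a sketch under the assumption of a pandas version with ``infer_objects``; the variable names are illustrative and differ from the test's ``actual`` / ``expected``.

```python
import numpy as np
import pandas as pd

# Python ints in an object array are inferred as int64.
ints = pd.Series(np.array([1, 2, 3], dtype="O")).infer_objects()

# A None among the ints is treated as missing, so the result is float64
# with NaN in place of None.
with_none = pd.Series(np.array([1, 2, 3, None], dtype="O")).infer_objects()

# A string cannot be soft-converted to a number, so the whole Series
# passes through unchanged as object dtype.
mixed = pd.Series(np.array([1, 2, 3, None, "a"], dtype="O")).infer_objects()

print(ints.dtype, with_none.dtype, mixed.dtype)
```

The last case is the one the reviewers ask to have explained: a single unconvertible value is enough to keep the entire Series as ``object``, since soft conversion never coerces or drops values.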