API: add infer_objects for soft conversions #16915
Changes from 3 commits
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -25,6 +25,39 @@ New features | |
- Added ``__fspath__`` method to :class:`~pandas.HDFStore`, :class:`~pandas.ExcelFile`, | ||
and :class:`~pandas.ExcelWriter` to work properly with the file system path protocol (:issue:`13823`) | ||
|
||
|
||
Review comment: add a sub-section reference
||
.. _whatsnew_0210.enhancements.infer_objects: | ||
|
||
``infer_objects`` type conversion | ||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | ||
|
||
The :meth:`~DataFrame.infer_objects` and :meth:`~Series.infer_objects` | ||
Review comment: add a ref to the basics section above.
||
methods have been added to perform dtype inference on object columns, replacing | ||
some of the functionality of the deprecated ``convert_objects`` | ||
method (:issue:`11221`). | ||
|
||
This function only performs soft conversions on object columns, converting Python objects | ||
to native types, but not any coercive conversions. For example: | ||
|
||
.. ipython:: python | ||
|
||
df = pd.DataFrame({'A': [1, 2, 3], | ||
'B': np.array([1, 2, 3], dtype='object'), | ||
'C': ['1', '2', '3']}) | ||
df.dtypes | ||
Review comment: should have this same example in the docs (in basics.rst) and a ref to that.
||
df.infer_objects().dtypes | ||
|
||
Note that column ``'C'`` was not converted; only scalar numeric types | ||
will be inferred to a new type. Other types of conversion should be accomplished | ||
using the :func:`to_numeric` function (or :func:`to_datetime`, :func:`to_timedelta`). See | ||
the documentation :ref:`here <basics.object_conversion>` for more details. | ||
|
||
.. ipython:: python | ||
|
||
df = df.infer_objects() | ||
df['C'] = pd.to_numeric(df['C'], errors='coerce') | ||
df.dtypes | ||
|
||
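The difference between the soft conversion described in the whatsnew entry and a coercive conversion can be sketched as a standalone script. This is an illustration only, not part of the diff: the variable names `soft` and `coerced` are made up here, and column `C` is given an explicit object dtype (an assumption added so the starting point does not depend on the pandas version's default string handling).

```python
import numpy as np
import pandas as pd

# Column C is created as object dtype on purpose (assumption for this sketch).
df = pd.DataFrame({
    "A": [1, 2, 3],
    "B": np.array([1, 2, 3], dtype="object"),
    "C": pd.Series(["1", "2", "3"], dtype="object"),
})

# Soft conversion: B holds Python ints in an object column, so it is
# re-inferred as an integer column. The strings in C are not parsed.
soft = df.infer_objects()
print(soft.dtypes)

# Coercive conversion: parsing the strings is an explicit, separate step.
coerced = soft.assign(C=pd.to_numeric(df["C"], errors="coerce"))
print(coerced.dtypes)
```

Keeping the two steps separate means string parsing only happens where the caller asks for it.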
.. _whatsnew_0210.enhancements.other: | ||
|
||
Other Enhancements | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -3634,16 +3634,64 @@ def convert_objects(self, convert_dates=True, convert_numeric=False, | |
converted : same as input object | ||
""" | ||
from warnings import warn | ||
warn("convert_objects is deprecated. Use the data-type specific " | ||
"converters pd.to_datetime, pd.to_timedelta and pd.to_numeric.", | ||
FutureWarning, stacklevel=2) | ||
msg = ("convert_objects is deprecated. To re-infer data dtypes for " | ||
"object columns, use {klass}.infer_objects()\nFor all " | ||
"other conversions use the data-type specific converters " | ||
"pd.to_datetime, pd.to_timedelta and pd.to_numeric." | ||
).format(klass=self.__class__.__name__) | ||
warn(msg, FutureWarning, stacklevel=2) | ||
|
||
return self._constructor( | ||
self._data.convert(convert_dates=convert_dates, | ||
convert_numeric=convert_numeric, | ||
convert_timedeltas=convert_timedeltas, | ||
copy=copy)).__finalize__(self) | ||
|
||
def infer_objects(self): | ||
""" | ||
Attempt to infer better dtypes for object columns. | ||
|
||
Attempts soft conversion of object-dtyped | ||
columns, leaving non-object and unconvertible | ||
columns unchanged. The inference rules are the | ||
same as during normal Series/DataFrame construction. | ||
|
||
.. versionadded:: 0.21.0 | ||
|
||
Review comment: The beginning of the docstring should be a single line.
||
See Also | ||
Review comment: Maybe a note that this is the same as the inference used when a DataFrame is constructed?
||
-------- | ||
pandas.to_datetime : Convert argument to datetime. | ||
pandas.to_timedelta : Convert argument to timedelta. | ||
pandas.to_numeric : Convert argument to numeric type. | ||
|
||
Returns | ||
------- | ||
converted : same type as input object | ||
|
||
Examples | ||
-------- | ||
>>> df = pd.DataFrame({"A": ["a", 1, 2, 3]}) | ||
>>> df = df.iloc[1:] | ||
>>> df | ||
A | ||
1 1 | ||
2 2 | ||
3 3 | ||
>>> df.dtypes | ||
A object | ||
dtype: object | ||
>>> df.infer_objects().dtypes | ||
Review comment: I like blank lines between these sections of code
||
A int64 | ||
dtype: object | ||
""" | ||
# numeric=False necessary to only soft convert; | ||
# python objects will still be converted to | ||
# native numpy numeric types | ||
return self._constructor( | ||
self._data.convert(datetime=True, numeric=False, | ||
timedelta=True, coerce=False, | ||
copy=True)).__finalize__(self) | ||
|
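The docstring's statement that the inference rules match normal construction can be checked with a short standalone sketch. It is an illustration under assumptions rather than code from the PR: the names `raw`, `trimmed`, `inferred` and `rebuilt` are hypothetical, and the datetime column from the PR's own test is left out to keep the comparison simple.

```python
import pandas as pd

# Each column starts as object dtype because it mixes strings and numbers.
raw = pd.DataFrame({
    "a": ["a", 1, 2, 3],
    "b": ["b", 2.0, 3.0, 4.1],
    "d": [1, 2, 3, "d"],
})

# Dropping the first row leaves a and b purely numeric, but slicing does not
# re-infer dtypes, so they are still object columns at this point.
trimmed = raw.iloc[1:]
inferred = trimmed.infer_objects().reset_index(drop=True)

# Building the same values from scratch should give the same dtypes.
rebuilt = pd.DataFrame({
    "a": [1, 2, 3],
    "b": [2.0, 3.0, 4.1],
    "d": [2, 3, "d"],
})
pd.testing.assert_frame_equal(inferred, rebuilt)
```

Column `d` still contains a string after slicing, so it stays an object column in both frames.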
||
# ---------------------------------------------------------------------- | ||
# Filling NA's | ||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -495,6 +495,32 @@ def test_convert_objects_no_conversion(self): | |
mixed2 = mixed1._convert(datetime=True) | ||
assert_frame_equal(mixed1, mixed2) | ||
|
||
def test_infer_objects(self): | ||
# GH 11221 | ||
df = DataFrame({'a': ['a', 1, 2, 3], | ||
'b': ['b', 2.0, 3.0, 4.1], | ||
'c': ['c', datetime(2016, 1, 1), | ||
datetime(2016, 1, 2), | ||
datetime(2016, 1, 3)], | ||
'd': [1, 2, 3, 'd']}, | ||
columns=['a', 'b', 'c', 'd']) | ||
df = df.iloc[1:].infer_objects() | ||
|
||
assert df['a'].dtype == 'int64' | ||
assert df['b'].dtype == 'float64' | ||
assert df['c'].dtype == 'M8[ns]' | ||
assert df['d'].dtype == 'object' | ||
|
||
expected = DataFrame({'a': [1, 2, 3], | ||
'b': [2.0, 3.0, 4.1], | ||
'c': [datetime(2016, 1, 1), | ||
datetime(2016, 1, 2), | ||
datetime(2016, 1, 3)], | ||
'd': [2, 3, 'd']}, | ||
columns=['a', 'b', 'c', 'd']) | ||
# reconstruct frame to verify inference is same | ||
tm.assert_frame_equal(df.reset_index(drop=True), expected) | ||
Review comment: What is the purpose of this part of the test?
Reply: This enforces that the inference rules I'm checking against are the same as
Reply: Ah, I see. Perhaps a comment above
||
|
||
def test_stale_cached_series_bug_473(self): | ||
|
||
# this is chained, but ok | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -268,3 +268,21 @@ def test_series_to_categorical(self): | |
expected = Series(['a', 'b', 'c'], dtype='category') | ||
|
||
tm.assert_series_equal(result, expected) | ||
|
||
def test_infer_objects_series(self): | ||
# GH 11221 | ||
actual = Series(np.array([1, 2, 3], dtype='O')).infer_objects() | ||
expected = Series([1, 2, 3]) | ||
tm.assert_series_equal(actual, expected) | ||
|
||
actual = Series(np.array([1, 2, 3, None], dtype='O')).infer_objects() | ||
expected = Series([1., 2., 3., np.nan]) | ||
tm.assert_series_equal(actual, expected) | ||
|
||
# only soft conversions, unconvertible values pass through unchanged | ||
actual = (Series(np.array([1, 2, 3, None, 'a'], dtype='O')) | ||
.infer_objects()) | ||
expected = Series([1, 2, 3, None, 'a']) | ||
|
||
assert actual.dtype == 'object' | ||
Review comment: Perhaps a comment above that block of code explaining why this is the case would be useful.
Review comment: Also, why isn't there a similar test for
Review comment: Nit: add newline between "expected =" and "assert " (for readability)
||
tm.assert_series_equal(actual, expected) |
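The Series cases exercised by this test can also be run outside the test suite. The sketch below is illustrative only; `with_none` and `with_str` are hypothetical names, and the expected behaviour is taken from the assertions in the test above.

```python
import numpy as np
import pandas as pd

# Integers plus None: the missing value becomes NaN, so the result is a
# float column rather than an integer one.
with_none = pd.Series(np.array([1, 2, 3, None], dtype="O")).infer_objects()
print(with_none.dtype)

# Adding a string makes the column unconvertible under soft conversion,
# so it is passed through as object dtype with its values untouched.
with_str = pd.Series(np.array([1, 2, 3, None, "a"], dtype="O")).infer_objects()
print(with_str.dtype)
```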
Review comment: make this a separate ipython block (to break up the code)