API: fix corner case of lib.infer_dtype #23422

Merged
merged 10 commits into from
Nov 4, 2018
4 changes: 4 additions & 0 deletions pandas/_libs/lib.pyx
@@ -59,6 +59,7 @@ from tslibs.timezones cimport get_timezone, tz_compare

from missing cimport (checknull,
is_null_datetime64, is_null_timedelta64, is_null_period)
from missing import isnaobj


# constants that will be compared to potentially arbitrarily large
@@ -1171,6 +1172,9 @@ def infer_dtype(object value, bint skipna=False):
values = construct_1d_object_array_from_listlike(value)

values = getattr(values, 'values', values)
if skipna:
    values = values[~isnaobj(values)]
Contributor

This is a Python import and not a cimport. Why are you not using checknull?

Contributor Author

checknull only returns a single bint, and not an array. I would have liked to cimport isnaobj, but that didn't work.
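
To make the distinction concrete, here is a minimal pure-Python sketch of a scalar check versus an element-wise check. The names `check_null_scalar` and `isna_elementwise` are hypothetical stand-ins, not the Cython `checknull` or `isnaobj`, and the null rule (None or float NaN) is a simplifying assumption.

```python
import math


def check_null_scalar(obj):
    """Hypothetical scalar check: returns one bool for one object."""
    return obj is None or (isinstance(obj, float) and math.isnan(obj))


def isna_elementwise(values):
    """Hypothetical element-wise check: returns one bool per element."""
    return [check_null_scalar(v) for v in values]


values = [1.5, None, float("nan"), "a"]
mask = isna_elementwise(values)  # one flag per element, usable as a mask
kept = [v for v, is_na in zip(values, mask) if not is_na]
```

Only the element-wise form yields something that can filter the whole container in one step, which is what the `values[~isnaobj(values)]` line relies on.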

Contributor

So this is an array, OK. Then add isnaobj to missing.pxd and make it a cpdef; then you can cimport it. (You also need to type the return value.)


Member

I wonder if we can incorporate this skipna logic into the for-loop below. Perhaps have an indicator to tell us whether we have seen an element in the values array that is non-null (when skipna is True).
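
A rough sketch of what such an indicator could look like in a single pass, for illustration only. `is_null` and `classify_floats` are hypothetical names, the function only distinguishes three labels, and this is not what the PR implements.

```python
import math


def is_null(obj):
    # Stand-in null check (None or float NaN); not the pandas implementation.
    return obj is None or (isinstance(obj, float) and math.isnan(obj))


def classify_floats(values, skipna=False):
    """Toy single-pass classifier returning 'floating', 'mixed' or 'empty'."""
    seen_non_null = False  # the indicator suggested in the comment above
    for v in values:
        if skipna and is_null(v):
            continue  # nulls are ignored, but we keep scanning
        seen_non_null = True
        if not isinstance(v, float):
            return "mixed"
    # If every element was skipped (or there were none), report 'empty'.
    return "floating" if seen_non_null else "empty"
```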

Contributor Author

Unfortunately, that's not directly possible (nor performant), because the line directly below (with _try_infer_map) will return prematurely as soon as it can grab hold of a dtype.

val = _try_infer_map(values)
if val is not None:
    return val
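
The ordering discussed in the thread (drop nulls first, then try a lookup that may return early) can be modelled with a toy function. This is a sketch under stated assumptions: `FAST_PATH`, `null_like` and `infer_kind` are hypothetical, the dtype is passed as a plain string, and only three fall-through labels are handled. It is not the body of `lib.infer_dtype`.

```python
import math

# Hypothetical lookup from a declared container dtype to a result label.
FAST_PATH = {"float64": "floating", "int64": "integer"}


def null_like(obj):
    # Stand-in null check (None or float NaN).
    return obj is None or (isinstance(obj, float) and math.isnan(obj))


def infer_kind(values, declared_dtype="object", skipna=False):
    """Toy model: filter nulls, try the early-return lookup, then inspect."""
    if skipna:
        values = [v for v in values if not null_like(v)]
    label = FAST_PATH.get(declared_dtype)
    if label is not None:
        return label  # early return: the elements are never inspected
    if not values:
        return "empty"
    if all(isinstance(v, float) for v in values):
        return "floating"
    return "mixed"
```

In this model a container with a declared float dtype reports 'floating' regardless of `skipna`, while an object container holding only nulls reports 'empty' once they are skipped.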
16 changes: 16 additions & 0 deletions pandas/tests/dtypes/test_inference.py
@@ -591,6 +591,22 @@ def test_unicode(self):
expected = 'unicode' if PY2 else 'string'
assert result == expected

@pytest.mark.parametrize('dtype, missing, skipna, expected', [
(float, np.nan, False, 'floating'),
(float, np.nan, True, 'floating'),
(object, np.nan, False, 'floating'),
(object, np.nan, True, 'empty'),
(object, None, False, 'mixed'),
(object, None, True, 'empty')
])
@pytest.mark.parametrize('box', [pd.Series, np.array])
def test_object_empty(self, box, missing, dtype, skipna, expected):
# GH 23421
arr = box([missing, missing], dtype=dtype)

result = lib.infer_dtype(arr, skipna=skipna)
assert result == expected

def test_datetime(self):

dates = [datetime(2012, 1, x) for x in range(1, 20)]