API: fix corner case of lib.infer_dtype #23422

Merged
merged 10 commits into from
Nov 4, 2018
5 changes: 4 additions & 1 deletion pandas/_libs/lib.pyx
@@ -57,7 +57,7 @@ from tslibs.conversion cimport convert_to_tsobject
from tslibs.timedeltas cimport convert_to_timedelta64
from tslibs.timezones cimport get_timezone, tz_compare

-from missing cimport (checknull,
+from missing cimport (checknull, isnaobj,
is_null_datetime64, is_null_timedelta64, is_null_period)


@@ -1177,6 +1177,9 @@ def infer_dtype(object value, bint skipna=False):
values = construct_1d_object_array_from_listlike(value)

values = getattr(values, 'values', values)
if skipna:
    values = values[~isnaobj(values)]
Contributor

This is a Python import and not a cimport; why are you not using checknull?

Contributor Author

checknull only returns a single bint, and not an array. I would have liked to cimport isnaobj, but that didn't work.
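
The scalar-versus-array distinction can be illustrated in plain Python. This is only a sketch: `is_null` and `null_mask` are hypothetical, simplified stand-ins for `checknull` and `isnaobj`. They treat only `None` and float NaN as missing, while the real Cython functions cover more cases.

```python
import math


def is_null(val):
    """Scalar check: one value in, one bool out (stand-in for checknull)."""
    return val is None or (isinstance(val, float) and math.isnan(val))


def null_mask(values):
    """Elementwise check: one bool per element (stand-in for isnaobj)."""
    return [is_null(v) for v in values]


values = [1.5, None, float("nan"), "a"]
mask = null_mask(values)

# Filtering needs the whole mask, which a single scalar result cannot provide.
kept = [v for v, m in zip(values, mask) if not m]
print(mask)
print(kept)
```

The filtering step is the plain-Python analogue of `values[~isnaobj(values)]` in the diff.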

Contributor

So this is an array, OK. Then add isnaobj to missing.pxd and make it a cpdef; then you can cimport it. (You also need to type the return value.)


Member

I wonder if we can incorporate this skipna logic into the for-loop below. Perhaps have an indicator to tell us whether we have seen an element in the values array that is non-null (when skipna is True).

Contributor Author

Unfortunately, that's not directly possible (nor performant), because the line directly below (with _try_infer_map) will return prematurely as soon as it can grab hold of a dtype.

val = _try_infer_map(values)
if val is not None:
    return val
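
A rough sketch of why the early return matters, under stated assumptions: `infer_sketch` and `_KIND_TO_NAME` are hypothetical and far simpler than `infer_dtype` and `_try_infer_map`. They only mimic the control flow described above: filter nulls first, then return as soon as the array's dtype alone settles the answer.

```python
import numpy as np

# Hypothetical, heavily reduced stand-in for the dtype lookup table.
_KIND_TO_NAME = {"f": "floating", "i": "integer", "b": "boolean"}


def infer_sketch(values, skipna=False):
    if skipna:
        # Drop None / NaN before anything else looks at the array.
        mask = np.array(
            [v is None or (isinstance(v, float) and v != v) for v in values],
            dtype=bool,
        )
        values = values[~mask]

    name = _KIND_TO_NAME.get(values.dtype.kind)
    if name is not None:
        # Early return: the elements are never inspected on this path.
        return name

    if len(values) == 0:
        return "empty"
    return "inspect elements"


print(infer_sketch(np.array([np.nan, np.nan]), skipna=True))
print(infer_sketch(np.array([np.nan, np.nan], dtype=object), skipna=True))
```

In this sketch a float array is classified by its dtype even when every element is filtered out, while an object array falls through to the emptiness check. A flag set inside a later per-element loop would not be consulted on the early-return path.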
6 changes: 6 additions & 0 deletions pandas/_libs/missing.pxd
@@ -1,8 +1,14 @@
# -*- coding: utf-8 -*-

from numpy cimport ndarray, uint8_t

from tslibs.nattype cimport is_null_datetimelike
Contributor

You added this back in the rebase; please remove it.


cpdef bint checknull(object val)
cpdef bint checknull_old(object val)

Contributor

no extra blank lines

cpdef ndarray[uint8_t] isnaobj(ndarray arr)

cdef bint is_null_datetime64(v)
cdef bint is_null_timedelta64(v)
cdef bint is_null_period(v)
2 changes: 1 addition & 1 deletion pandas/_libs/missing.pyx
@@ -124,7 +124,7 @@ cdef inline bint _check_none_nan_inf_neginf(object val):

@cython.wraparound(False)
@cython.boundscheck(False)
-def isnaobj(ndarray arr):
+cpdef ndarray[uint8_t] isnaobj(ndarray arr):
"""
Return boolean mask denoting which elements of a 1-D array are na-like,
according to the criteria defined in `_check_all_nulls`:
16 changes: 16 additions & 0 deletions pandas/tests/dtypes/test_inference.py
@@ -591,6 +591,22 @@ def test_unicode(self):
expected = 'unicode' if PY2 else 'string'
assert result == expected

@pytest.mark.parametrize('dtype, missing, skipna, expected', [
    (float, np.nan, False, 'floating'),
    (float, np.nan, True, 'floating'),
    (object, np.nan, False, 'floating'),
    (object, np.nan, True, 'empty'),
    (object, None, False, 'mixed'),
    (object, None, True, 'empty')
])
@pytest.mark.parametrize('box', [pd.Series, np.array])
def test_object_empty(self, box, missing, dtype, skipna, expected):
    # GH 23421
    arr = box([missing, missing], dtype=dtype)

    result = lib.infer_dtype(arr, skipna=skipna)
    assert result == expected

def test_datetime(self):

    dates = [datetime(2012, 1, x) for x in range(1, 20)]
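
The parametrized cases in `test_object_empty` can also be tried outside the test suite. The sketch below assumes a pandas version that includes this change and uses the public `pandas.api.types.infer_dtype` rather than the internal `lib.infer_dtype`. The `cases` list repeats the test's parameters, with `np.array` as the only box.

```python
import numpy as np
from pandas.api.types import infer_dtype

# (dtype, missing value, skipna, expected), as in test_object_empty
cases = [
    (float, np.nan, False, "floating"),
    (float, np.nan, True, "floating"),
    (object, np.nan, False, "floating"),
    (object, np.nan, True, "empty"),
    (object, None, False, "mixed"),
    (object, None, True, "empty"),
]

observed = []
for dtype, missing, skipna, expected in cases:
    arr = np.array([missing, missing], dtype=dtype)
    result = infer_dtype(arr, skipna=skipna)
    observed.append(result)
    print(dtype.__name__, repr(missing), skipna, "->", result)
```

If the printed results differ from the `expected` column, the installed pandas version likely behaves differently from the one this PR targeted.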