BUG: Fix dtype=str converts NaN to 'n' (pandas-dev#22564)
More specifically, the cases that seem to have an issue
are when:
- the series is empty
- it's a single-element series

* Closes pandas-dev#22477
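
The commit message does not spell out where the 'n' comes from. A plausible reading, consistent with the comment added in cast.py, is that NumPy allocates a one-character string array for dtype=str and truncates the stringified NaN when filling it. The sketch below illustrates that assumption with NumPy only; the variable names are illustrative and are not taken from the patch.

```python
import numpy as np

# dtype=str with no explicit width becomes a one-character unicode dtype
# once the array is allocated.
fixed_width = np.empty(1, dtype=str)
fixed_width.fill(np.nan)  # NaN is stringified, then cut down to the item size

# An object array stores the Python object itself, so NaN survives intact.
as_object = np.empty(1, dtype=object)
as_object.fill(np.nan)

print(fixed_width.dtype, repr(fixed_width[0]))
print(as_object.dtype, repr(as_object[0]))
```

The object-dtype array is the behavior the fix relies on: the missing value stays a float NaN instead of becoming a truncated string.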
Nikoleta-v3 authored and jorisvandenbossche committed Nov 20, 2018
1 parent 1520047 commit f0b2ff3
Showing 4 changed files with 23 additions and 6 deletions.
1 change: 1 addition & 0 deletions doc/source/whatsnew/v0.24.0.rst
@@ -1443,6 +1443,7 @@ Reshaping
- Bug in :func:`merge_asof` where confusing error message raised when attempting to merge with missing values (:issue:`23189`)
- Bug in :meth:`DataFrame.nsmallest` and :meth:`DataFrame.nlargest` for dataframes that have a :class:`MultiIndex` for columns (:issue:`23033`).
- Bug in :meth:`DataFrame.append` with a :class:`Series` with a dateutil timezone would raise a ``TypeError`` (:issue:`23682`)
- Bug in ``Series`` construction when passing no data and ``dtype=str`` (:issue:`22477`)

.. _whatsnew_0240.bug_fixes.sparse:

15 changes: 10 additions & 5 deletions pandas/core/dtypes/cast.py
@@ -6,7 +6,7 @@

from pandas._libs import lib, tslib, tslibs
from pandas._libs.tslibs import OutOfBoundsDatetime, Period, iNaT
from pandas.compat import PY3, string_types, text_type
from pandas.compat import PY3, string_types, text_type, to_str

from .common import (
_INT64_DTYPE, _NS_DTYPE, _POSSIBLY_CAST_DTYPES, _TD_DTYPE, _string_dtypes,
@@ -1216,11 +1216,16 @@ def construct_1d_arraylike_from_scalar(value, length, dtype):
if not isinstance(dtype, (np.dtype, type(np.dtype))):
dtype = dtype.dtype

# coerce if we have nan for an integer dtype
# GH 22858: only cast to float if an index
# (passed here as length) is specified
if length and is_integer_dtype(dtype) and isna(value):
dtype = np.float64
# coerce if we have nan for an integer dtype
dtype = np.dtype('float64')
elif isinstance(dtype, np.dtype) and dtype.kind in ("U", "S"):
# we need to coerce to object dtype
# to allow numpy to take our string as a scalar value
dtype = object
if not isna(value):
value = to_str(value)

subarr = np.empty(length, dtype=dtype)
subarr.fill(value)
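
Because the diff view has lost its indentation and +/- markers, the patched branch can be hard to follow. The following is a minimal standalone sketch of the same idea, not the pandas implementation: `fill_from_scalar` is a hypothetical name, the missing-value check is a simplified stand-in for pandas' `isna`, and Python 3's `str` stands in for `to_str`.

```python
import numpy as np


def fill_from_scalar(value, length, dtype):
    """Hypothetical stand-in for the patched branch, using NumPy only."""
    dtype = np.dtype(dtype)
    # Simplified missing-value check; pandas' isna() covers more cases.
    is_missing = isinstance(value, float) and np.isnan(value)

    if length and dtype.kind in "iu" and is_missing:
        # Integer arrays cannot hold NaN, so fall back to float64.
        dtype = np.dtype("float64")
    elif dtype.kind in ("U", "S"):
        # Use object dtype so the value is stored whole rather than
        # squeezed into a fixed-width string slot.
        dtype = np.dtype(object)
        if not is_missing:
            value = str(value)

    out = np.empty(length, dtype=dtype)
    out.fill(value)
    return out


print(fill_from_scalar(np.nan, 1, str))
print(fill_from_scalar(13, 1, str))
print(fill_from_scalar(np.nan, 2, "int64"))
```

The three calls mirror the cases the commit targets: a missing value with a string dtype stays NaN, a non-string scalar is stringified in full, and the existing integer-to-float fallback is unchanged.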

2 changes: 1 addition & 1 deletion pandas/core/dtypes/common.py
@@ -419,7 +419,7 @@ def is_datetime64_dtype(arr_or_dtype):
return False
try:
tipo = _get_dtype_type(arr_or_dtype)
except TypeError:
except (TypeError, UnicodeEncodeError):
return False
return issubclass(tipo, np.datetime64)

11 changes: 11 additions & 0 deletions pandas/tests/series/test_constructors.py
@@ -134,6 +134,17 @@ def test_constructor_no_data_index_order(self):
result = pd.Series(index=['b', 'a', 'c'])
assert result.index.tolist() == ['b', 'a', 'c']

def test_constructor_no_data_string_type(self):
# GH 22477
result = pd.Series(index=[1], dtype=str)
assert np.isnan(result.iloc[0])

@pytest.mark.parametrize('item', ['entry', 'ѐ', 13])
def test_constructor_string_element_string_type(self, item):
# GH 22477
result = pd.Series(item, index=[1], dtype=str)
assert result.iloc[0] == str(item)

def test_constructor_dtype_str_na_values(self, string_dtype):
# https://github.com/pandas-dev/pandas/issues/21083
ser = Series(['x', None], dtype=string_dtype)