Better error for str.cat with listlike of wrong dtype. #26607

Merged
merged 9 commits into from
Jun 14, 2019
Review (jreback & simonjayhawkins)
h-vetinari committed Jun 12, 2019
commit 02f6429662ea83b481a2156549bff482c3fcf24a
2 changes: 1 addition & 1 deletion doc/source/whatsnew/v0.25.0.rst
@@ -607,7 +607,7 @@ Strings
 ^^^^^^^

 - Bug in the ``__name__`` attribute of several methods of :class:`Series.str`, which were set incorrectly (:issue:`23551`)
-- Improved error message when passing ``Series`` of wrong dtype to :meth:`Series.str.cat` (:issue:`22722`)
+- Improved error message when passing :class:`Series` of wrong dtype to :meth:`Series.str.cat` (:issue:`22722`)
 -


32 changes: 23 additions & 9 deletions pandas/core/strings.py
@@ -58,19 +58,33 @@ def cat_safe(list_of_columns, sep):
     Auxiliary function for :meth:`str.cat`.

     Same signature as cat_core, but handles TypeErrors in concatenation, which
-    happen if the Series in list_of columns have the wrong dtypes or content.
+    happen if the arrays in list_of columns have the wrong dtypes or content.
+
+    Parameters
+    ----------
+    list_of_columns : list of numpy arrays
+        List of arrays to be concatenated with sep;
+        these arrays may not contain NaNs!
+    sep : string
+        The separator string for concatenating the columns
+
+    Returns
+    -------
+    nd.array
+        The concatenation of list_of_columns with sep
     """
-    # if there are any non-string values (wrong dtype or hidden behind object
-    # dtype), np.sum will fail; catch error and return with better message
     try:
         result = cat_core(list_of_columns, sep)
     except TypeError:
-        dtypes = [lib.infer_dtype(x, skipna=True) for x in list_of_columns]
-        illegal = [x not in ('string', 'empty') for x in dtypes]
-        first_offender = [x for x, y in zip(list_of_columns, illegal) if y][0]
-        raise TypeError('Concatenation requires list-likes containing only '
-                        'strings (or missing values). Offending values found '
-                        'in column {}'.format(first_offender)) from None
+        # if there are any non-string values (wrong dtype or hidden behind
+        # object dtype), np.sum will fail; catch and return with better message
+        for column in list_of_columns:
+            dtype = lib.infer_dtype(column, skipna=True)
+            if dtype not in ['string', 'empty']:
+                raise TypeError(
+                    'Concatenation requires list-likes containing only '
+                    'strings (or missing values). Offending values found in '
+                    'column {}'.format(dtype)) from None
Member

Is it worth having a raise outside the for loop to ensure we don't slip through, or is that not going to happen?

     return result
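
The reviewer's question about slipping through the for loop can be illustrated with a small, pandas-free sketch. Everything here is an assumption made for illustration: `infer_kind` is a hypothetical stand-in for `lib.infer_dtype(column, skipna=True)`, `cat_core` is a simplified list-based stand-in for the real helper, and the trailing bare `raise` is only one possible answer to the review comment, not something this commit contains. Note that in the hunk above, leaving the loop without raising would reach `return result` with `result` never assigned.

```python
def infer_kind(column):
    """Hypothetical stand-in for lib.infer_dtype(column, skipna=True)."""
    values = [v for v in column if v is not None]  # skip missing values
    if not values:
        return "empty"
    if all(isinstance(v, str) for v in values):
        return "string"
    return "mixed"


def cat_core(list_of_columns, sep):
    """Simplified stand-in: join the columns element-wise with sep."""
    # str.join raises TypeError as soon as a row holds a non-string.
    return [sep.join(row) for row in zip(*list_of_columns)]


def cat_safe(list_of_columns, sep):
    """Like cat_core, but with a clearer TypeError for bad columns."""
    try:
        return cat_core(list_of_columns, sep)
    except TypeError:
        for column in list_of_columns:
            kind = infer_kind(column)
            if kind not in ("string", "empty"):
                raise TypeError(
                    "Concatenation requires list-likes containing only "
                    "strings (or missing values). Offending values found in "
                    "column {}".format(kind)
                ) from None
        # Guard discussed in the review (an assumption, not in the diff):
        # no column was flagged, so re-raise the original TypeError
        # instead of falling out of the except block.
        raise
```

In this sketch the guard is reached when a column contains only strings plus a missing value: the per-column check skips the missing value and reports "string", yet the concatenation itself has already failed. Whether the equivalent situation can arise in the real function, whose docstring says the arrays may not contain NaNs, is exactly what the reviewer is asking.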

