Closed
Description
This is similar to #9697, which was fixed in 0.16.1. The (very) slightly modified example below shows related behavior that is at least inconsistent and should probably be handled cleanly.
It's not entirely clear to me what the desired behavior is in this case; it's possible that transform should not work here at all, since it spits out unexpected values. But at minimum it seems like it should do the same thing no matter how I invoke it below.
Example:
import numpy as np
import pandas as pd
df = pd.DataFrame({'col1': [1, 1, 2, 2], 'col2': [1, 2, 3, np.nan]})
# Let's try grouping on 'col2', which contains a NaN.
# Works and gives arguably reasonable results, with one unpredictable value
df.groupby('col2').transform(sum)['col1']
# Throws an unhelpful error
df.groupby('col2')['col1'].transform(sum)
Error is similar to the one encountered in the previous issue:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-54-2d4b83df6487> in <module>()
----> 1 df.groupby('col2')['col1'].transform(sum)
/usr/local/lib/python2.7/dist-packages/pandas/core/groupby.pyc in transform(self, func, *args, **kwargs)
2442 cyfunc = _intercept_cython(func)
2443 if cyfunc and not args and not kwargs:
-> 2444 return self._transform_fast(cyfunc)
2445
2446 # reg transform
/usr/local/lib/python2.7/dist-packages/pandas/core/groupby.pyc in _transform_fast(self, func)
2488 values = self._try_cast(values, self._selected_obj)
2489
-> 2490 return self._set_result_index_ordered(Series(values))
2491
2492 def filter(self, func, dropna=True, *args, **kwargs):
/usr/local/lib/python2.7/dist-packages/pandas/core/groupby.pyc in _set_result_index_ordered(self, result)
503 result = result.sort_index()
504
--> 505 result.index = self.obj.index
506 return result
507
/usr/local/lib/python2.7/dist-packages/pandas/core/generic.pyc in __setattr__(self, name, value)
2159 try:
2160 object.__getattribute__(self, name)
-> 2161 return object.__setattr__(self, name, value)
2162 except AttributeError:
2163 pass
/usr/local/lib/python2.7/dist-packages/pandas/lib.so in pandas.lib.AxisProperty.__set__ (pandas/lib.c:42548)()
/usr/local/lib/python2.7/dist-packages/pandas/core/series.pyc in _set_axis(self, axis, labels, fastpath)
273 object.__setattr__(self, '_index', labels)
274 if not fastpath:
--> 275 self._data.set_axis(axis, labels)
276
277 def _set_subtyp(self, is_all_dates):
/usr/local/lib/python2.7/dist-packages/pandas/core/internals.pyc in set_axis(self, axis, new_labels)
2217 if new_len != old_len:
2218 raise ValueError('Length mismatch: Expected axis has %d elements, '
-> 2219 'new values have %d elements' % (old_len, new_len))
2220
2221 self.axes[axis] = new_labels
ValueError: Length mismatch: Expected axis has 3 elements, new values have 4 elements
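For anyone hitting this in the meantime, one possible workaround is to make sure the grouping key contains no NaN before calling transform. The sketch below is not a fix for the underlying bug. It assumes a sentinel value (here the hypothetical `SENTINEL = -1`) that never occurs as a real key in 'col2', and it passes the string 'sum' rather than the builtin `sum`.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': [1, 1, 2, 2], 'col2': [1, 2, 3, np.nan]})

# Hypothetical sentinel: assumed never to appear as a real key in 'col2'.
SENTINEL = -1
key = df['col2'].fillna(SENTINEL)

# With no NaN left in the key, every row belongs to some group, so both
# invocation styles return exactly one value per input row.
via_frame = df.groupby(key).transform('sum')['col1']
via_column = df.groupby(key)['col1'].transform('sum')

print(via_frame.tolist())
print(via_column.tolist())
```

The cost is that the NaN row is treated as its own group instead of being dropped, which may or may not be what is wanted.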