BUG: Assigning extension array value to series of dtype object fails if element type is array-like #42437
Comments
Somewhere in there it is calling np.array on your EA, and that produces a 2D ndarray. If you add to your example
then the code snippet works
@jbrockmendel if I'm reading your comment correctly, there's a (currently undocumented) requirement for Pandas extension types to implement the NumPy callback
Huh, I thought that was already there as the default implementation, but apparently not. PR would be welcome.
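The name of the NumPy callback discussed in these comments is not preserved in this copy of the thread. As a rough sketch, assuming it refers to NumPy's `__array__` protocol, the following shows how defining that method changes what `np.asarray()` returns for a container of tuples. `TupleBox` and `TupleBoxWithArray` are hypothetical plain Python classes made up for this illustration; they are not pandas extension arrays and are not the code from the original comment.

```python
import numpy as np


class TupleBox:
    """Hypothetical sequence-like container of tuples (illustration only)."""

    def __init__(self, items):
        self._items = list(items)

    def __len__(self):
        return len(self._items)

    def __getitem__(self, i):
        return self._items[i]


class TupleBoxWithArray(TupleBox):
    """Same container, plus an __array__ method that returns a 1-D object array."""

    def __array__(self, dtype=None, copy=None):
        # dtype and copy are accepted for compatibility but ignored in this sketch.
        out = np.empty(len(self._items), dtype=object)
        for i, item in enumerate(self._items):
            out[i] = item  # store each tuple as a single element
        return out


plain = np.asarray(TupleBox([(1, 2, 3)] * 3))
with_array = np.asarray(TupleBoxWithArray([(1, 2, 3)] * 3))

print(plain.shape)       # (3, 3): the tuples were unpacked into a second dimension
print(with_array.shape)  # (3,): one tuple per element
```

Whether pandas itself should provide such a method by default is the question raised in the comments; this sketch only shows the NumPy side of the behaviour.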
The default implementation of
to the base class would raise
moving to 1.3.4
changing milestone to 1.3.5
this is good to fix but not necessary for 1.3.x
moving off 1.3.x milestone
This works on main, could use a test
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
(optional) I have confirmed this bug exists on the master branch of pandas.
Code Sample, a copy-pastable example
Output:
Problem description
If the user creates a Series of dtype `object` and attempts to set the value of that Series with an extension array, the block manager will first pass the extension array through `np.asarray()` and then assign the block's values to the ndarray returned by `np.asarray()` (see code here).
This logic assumes that `np.asarray()` will always return a 1D array. However, `np.asarray()` is not guaranteed to return a 1D array; if the items of the argument to `np.asarray()` are array-like, `np.asarray()` will iterate over them and generate an array with 2 or more dimensions. This 2-or-more-dimensional array can't be assigned to the series, and Pandas throws the error "ValueError: shape mismatch...".
This problem is affecting the TensorArray extension type in Text Extensions for Pandas, because the elements of a TensorArray are tensors. The example code above shows a simpler case where the items of the extension array are Python tuples. In general, any item type that `np.asarray()` converts to an array of one or more dimensions will have this problem.
Expected Output
The above code should fill `series2` with the individual objects at each of the positions in the extension array. In the case of the example code above, that means that each element of `series2` should contain the Python tuple `(1, 2, 3)`.
Output of `pd.show_versions()`
pandas : 1.3.0
numpy : 1.21.0
pytz : 2021.1
dateutil : 2.8.1
pip : 21.1.3
setuptools : 52.0.0.post20210125
Cython : None
pytest : 6.2.4
hypothesis : None
sphinx : 4.0.2
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.0.1
IPython : 7.25.0
pandas_datareader: None
bs4 : None
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : 3.4.2
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 4.0.1
pyxlsb : None
s3fs : None
scipy : 1.7.0
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
numba : None
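The `np.asarray()` behaviour described under Problem description can be seen with NumPy alone. The snippet below is a minimal sketch and not the reporter's original code sample; the variable names are chosen here for illustration. It shows that a list of equal-length tuples becomes a 2-D array, and that a 1-D object array holding the tuples themselves has to be built explicitly.

```python
import numpy as np

# Three equal-length tuples, standing in for the items of an extension array.
items = [(1, 2, 3), (1, 2, 3), (1, 2, 3)]

# np.asarray() descends into the tuples and builds a 2-D array.
as_2d = np.asarray(items)
print(as_2d.shape)  # (3, 3)

# Requesting dtype=object does not prevent this; the result is still 2-D.
as_2d_obj = np.asarray(items, dtype=object)
print(as_2d_obj.shape)  # (3, 3)

# To keep one tuple per position, preallocate a 1-D object array
# and assign the items one at a time.
as_1d = np.empty(len(items), dtype=object)
for i, item in enumerate(items):
    as_1d[i] = item
print(as_1d.shape)  # (3,)
print(as_1d[0])     # (1, 2, 3)
```

The 1-D object array at the end has the shape that the expected output calls for: one element per position, each holding the whole tuple.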