ENH: DataFrame.from_dict doesn't work with collections.UserDict objects #59737

Open
2 of 3 tasks
mesvam opened this issue Sep 6, 2024 · 6 comments
Labels: Constructors (Series/DataFrame/Index/pd.array Constructors), Enhancement, IO Data (IO issues that don't fit into a more specific label), Needs Discussion (Requires discussion from core team before further action)

Comments

@mesvam

mesvam commented Sep 6, 2024

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

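from collections import UserDict

import pandas as pd

class CustomDict(UserDict):
    pass

# Build a dict-like object that is not a dict subclass, then pass it to from_dict
cd = CustomDict(col_1=[1, 2], col_2=[3, 4])
pd.DataFrame.from_dict(cd)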

output:

       0
0  col_1
1  col_2

Issue Description

Pandas will not accept UserDict and other custom dict-like objects for DataFrame creation. Subclassing dict instead of UserDict is a workaround for this example, but for a variety of complicated reasons (examples: 1, 2, 3) it is sometimes undesirable to subclass dict.

Expected Behavior

The output should be the same as

pd.DataFrame.from_dict(dict(cd))

output:

   col_1  col_2
0      1      3
1      2      4

Installed Versions

INSTALLED VERSIONS

commit : d9cdd2e
python : 3.11.9.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.22000
machine : AMD64
processor : Intel64 Family 6 Model 151 Stepping 5, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : English_United States.1252

pandas : 2.2.2
numpy : 1.26.4
pytz : 2024.1
dateutil : 2.9.0.post0
setuptools : 72.1.0
pip : 24.2
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 5.2.1
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.1.4
IPython : 8.25.0
pandas_datareader : None
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : 4.12.3
bottleneck : 1.3.7
dataframe-api-compat : None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : 3.8.4
numba : 0.60.0
numexpr : 2.8.7
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyreadstat : None
python-calamine : None
pyxlsb : None
s3fs : None
scipy : 1.13.1
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
zstandard : 0.22.0
tzdata : 2023.3
qtpy : None
pyqt5 : None

mesvam added the Bug and Needs Triage (Issue that has not been reviewed by a pandas team member) labels Sep 6, 2024
@rhshadrach
Member

Thanks for the report. I agree that this is unintuitive, but that is unfortunately because UserDict is not a subclass of dict. Because of this, it appears to me that pandas is behaving as documented: cd is not a dict, so pandas treats it as the iterable that it is, creating a single-column DataFrame. So I've reworked this issue as a feature request.

I'm supportive of treating UserDict as a dict as proposed here.
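The type relationships behind this explanation can be checked with the standard library alone. This is a minimal sketch that reuses the CustomDict class from the report; it does not call pandas, so it only illustrates why a check against dict fails while iteration yields the keys.

```python
from collections import UserDict
from collections.abc import Mapping


class CustomDict(UserDict):
    pass


cd = CustomDict(col_1=[1, 2], col_2=[3, 4])

# UserDict wraps a dict rather than inheriting from it.
is_dict = isinstance(cd, dict)        # False

# It does implement the Mapping interface.
is_mapping = isinstance(cd, Mapping)  # True

# Treated as a plain iterable, it yields only its keys.
iterated = list(cd)                   # ['col_1', 'col_2']
```

The keys produced by iteration are the values that show up in the single column of the reported output.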

rhshadrach added the Enhancement, Needs Discussion (Requires discussion from core team before further action), and Constructors (Series/DataFrame/Index/pd.array Constructors) labels and removed the Needs Triage (Issue that has not been reviewed by a pandas team member) and Bug labels Sep 7, 2024
rhshadrach changed the title from "BUG: DataFrame.from_dict doesn't work with collections.UserDict objects" to "ENH: DataFrame.from_dict doesn't work with collections.UserDict objects" Sep 7, 2024
@rhshadrach
Member

As a workaround, you can always pass cd.data instead.
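A short sketch of this workaround, reusing the CustomDict class from the report. It relies on UserDict storing its contents in a real dict exposed as the data attribute, and it also shows the dict(cd) copy mentioned under Expected Behavior for comparison.

```python
from collections import UserDict

import pandas as pd


class CustomDict(UserDict):
    pass


cd = CustomDict(col_1=[1, 2], col_2=[3, 4])

# cd.data is the underlying plain dict, so pandas takes the dict code path.
from_data = pd.DataFrame.from_dict(cd.data)

# Copying into a new dict gives the same frame, at the cost of a shallow copy.
from_copy = pd.DataFrame.from_dict(dict(cd))
```

Note that cd.data only exists on UserDict subclasses; other dict-like classes would need the dict(...) copy instead.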

@mesvam
Author

mesvam commented Sep 7, 2024

Thanks.

I searched around and found a similar issue, #34257, so I don't know whether the two are related and this is a more general issue with pandas, or whether it is specific to pandas.DataFrame.from_dict.

Also, I can kind of understand if pd.DataFrame(dict_like) didn't work, but pd.DataFrame.from_dict(dict_like) feels like it should work with anything that inherits from collections.abc.Mapping

rhshadrach added the IO Data (IO issues that don't fit into a more specific label) label Sep 7, 2024
@rhshadrach
Member

> Also, I can kind of understand if pd.DataFrame(dict_like) didn't work...

Why is that?

> but pd.DataFrame.from_dict(dict_like) feels like it should work with anything that inherits from collections.abc.Mapping

Thanks - I did initially miss that the OP was about from_dict and not the constructor. Agreed here as well.
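One way the Mapping-based behavior discussed here could look from user code is sketched below. The helper name frame_from_mapping is hypothetical and is not part of pandas; it only wraps the existing from_dict call and says nothing about how pandas itself would implement the change.

```python
from collections import UserDict
from collections.abc import Mapping

import pandas as pd


def frame_from_mapping(data, **kwargs):
    """Hypothetical helper (not a pandas API): accept any Mapping."""
    if not isinstance(data, Mapping):
        raise TypeError(f"expected a Mapping, got {type(data).__name__}")
    # Copy into a plain dict so from_dict takes its normal dict code path.
    return pd.DataFrame.from_dict(dict(data), **kwargs)


class CustomDict(UserDict):
    pass


cd = CustomDict(col_1=[1, 2], col_2=[3, 4])
df = frame_from_mapping(cd)
```

Checking against collections.abc.Mapping rather than dict covers UserDict subclasses as well as other mapping implementations, while non-mapping iterables such as lists are rejected explicitly.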

@mesvam
Author

mesvam commented Sep 7, 2024

> > Also, I can kind of understand if pd.DataFrame(dict_like) didn't work...
>
> Why is that?

I think it would be ideal if it worked with pd.DataFrame as well as pd.DataFrame.from_dict, but since pd.DataFrame takes in a variety of formats, my intuition was that it would be easier for it to get confused by a non-standard input type. For example, I could probably create a weird class that can be accessed both like an array and like a dict, and I wouldn't really expect pd.DataFrame to correctly guess how I want it to access the object.

Whereas from_dict should sort of implicitly assume that the input is a dict-like object and parse it as such, even if it's not the exact type it was expecting.
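The kind of ambiguous object described above can be sketched without pandas. The Ambiguous class below is hypothetical and exists only for illustration: it can be read as columns by string key or as rows by integer position, so a generic consumer has no single obvious interpretation.

```python
class Ambiguous:
    """Hypothetical container readable as columns (str key) or rows (int key)."""

    def __init__(self, columns):
        self._columns = dict(columns)

    def keys(self):
        # Mapping-style access: column names.
        return self._columns.keys()

    def __len__(self):
        # Sequence-style length: number of rows.
        return len(next(iter(self._columns.values())))

    def __getitem__(self, key):
        if isinstance(key, str):
            return self._columns[key]
        if isinstance(key, int):
            if not 0 <= key < len(self):
                raise IndexError(key)
            return [col[key] for col in self._columns.values()]
        raise TypeError(f"unsupported key type: {type(key).__name__}")

    def __iter__(self):
        # Sequence-style iteration: one row at a time.
        return (self[i] for i in range(len(self)))


obj = Ambiguous({"col_1": [1, 2], "col_2": [3, 4]})

as_columns = dict(obj)  # uses keys() and string lookups
as_rows = list(obj)     # uses __iter__ and integer lookups
```

Here dict(obj) and list(obj) give two different but equally valid views of the same data, which is the guess a general constructor would have to make and a from_dict-style entry point would not.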

@thyripian

Could I take a go at this?
