
ENH: way to opt-in to StringDtype instead of object dtype in read_csv() #35576

Closed
chrish42 opened this issue Aug 5, 2020 · 2 comments
Labels: Duplicate Report, Enhancement

Comments

chrish42 (Contributor) commented Aug 5, 2020

I was bitten again recently by the mixed-types DtypeWarning while processing a CSV file. I assume that at some point, once StringDtype is no longer experimental, read_csv() will use it and won't need object dtype anymore, and this source of problems will go away.

In the meantime, though, would it be possible to have an option in read_csv() to use StringDtype instead of object dtype? It would help early adopters and people who want to try it out, and it would also be a nice migration path for when StringDtype is ready: at that point it would only be a matter of flipping the default for this switch. People who need to revert to object dtype for some reason would then have a way to do that too. Thoughts? Or is it still too soon for even experimental use of StringDtype in read_csv()?

I'd be willing to create a pull request for this (at least for the Python version of the CSV parser).
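For comparison, a per-column opt-in can already be expressed through the existing `dtype` argument: passing the `"string"` alias asks `read_csv()` to build a StringDtype column instead of an object one. This is a minimal sketch, assuming a reasonably recent pandas version; the CSV contents and column names are made up for illustration, and it is not the global switch being requested.

```python
import io

import pandas as pd

# Hypothetical CSV content; the second row has an empty "name" field.
csv_data = "id,name\n1,alice\n2,\n3,carol\n"

# Request StringDtype for the "name" column via the "string" alias.
df = pd.read_csv(io.StringIO(csv_data), dtype={"name": "string"})

print(df.dtypes)
# The empty field is stored as pd.NA inside a StringDtype column.
print(df["name"].isna().tolist())
```

The drawback is that every text column has to be listed explicitly, which is exactly what a single opt-in flag would avoid.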

@chrish42 chrish42 added Enhancement Needs Triage Issue that has not been reviewed by a pandas team member labels Aug 5, 2020
TomAugspurger (Contributor) commented:

Thanks @chrish42, that's being discussed in #29752.

I think we've agreed on an API (use_nullable_dtypes=True) and are looking for someone to implement it.
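At this point `use_nullable_dtypes=True` was only an agreed-upon API, so the sketch below does not call it. A post-hoc approximation is `DataFrame.convert_dtypes()`, which converts already-parsed columns to the NA-supporting extension dtypes (`string`, `Int64`, and so on). The sample data is hypothetical.

```python
import io

import pandas as pd

# Hypothetical CSV with a missing string and a missing integer in row 2.
csv_data = "id,name,score\n1,alice,10\n2,,\n3,carol,30\n"

# Default parse: "score" becomes float64 (because of the NaN) and "name"
# is typically stored with object dtype.
raw = pd.read_csv(io.StringIO(csv_data))

# Convert each column to the best NA-capable extension dtype:
# strings -> "string", integer-valued columns -> "Int64".
converted = raw.convert_dtypes()

print(raw.dtypes)
print(converted.dtypes)
```

Because the conversion runs after parsing, it does not prevent the mixed-types DtypeWarning; that needs the parser itself to produce nullable dtypes, which is what the proposed option is about.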

@TomAugspurger TomAugspurger added the Duplicate Report Duplicate issue or pull request label Aug 5, 2020
chrish42 (Contributor, Author) commented Aug 5, 2020

Ugh. Apologies, @TomAugspurger. I did try to search for a duplicate issue, but I didn't find that one. Maybe because I'm focussed more on "don't use object dtype" instead of "use types that support NA well". But yes, this discussion should go there. I'll read through the (long-ish) existing issue, and see if I have something to add. Thanks!

@bashtage bashtage removed the Needs Triage Issue that has not been reviewed by a pandas team member label Aug 21, 2020