ENH: way to opt-in to StringDtype instead of object dtype in read_csv() #35576

chrish42 · 2020-08-05T19:38:07Z

I got bit recently again by the mixed types DtypeWarning while processing a CSV file. I assume that at some point, when StringDtype is not experimental anymore, read_csv() will use that and won't need object dtype anymore, and so this potential problem source will go away.

In the meantime though, would it be possible to have an option in read_csv() to use StringDtype instead of the object dtype? Both for early adopters and people who want to try it out... and it would also be a nice migration path for when StringDtype is ready. Then, it would only be a matter of flipping the default for this switch. And for people who need to revert to object dtype for some reason, that would provide them a way to do that too at that time. Thoughts? Or is it still too soon for even experimental usage of StringDtype in read_csv()?

I'd be willing to create a pull request for this (at least for the Python version of the CSV parser).
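For context, a per-column opt-in is already possible through the existing `dtype` argument of `read_csv()`, which accepts the `"string"` alias for `StringDtype`. The sketch below is only an illustration of that workaround, not the switch requested above; the CSV content and column names are made up for the example.

```python
import io

import pandas as pd

# Hypothetical sample data: 'code' holds text that looks partly numeric,
# and one row has a missing value.
csv_text = "id,code\n1,A12\n2,007\n3,\n"

# Default inference. On the pandas versions current when this issue was
# filed, a text column like 'code' comes back as object dtype.
default = pd.read_csv(io.StringIO(csv_text))

# Explicit opt-in for one column via the "string" alias (StringDtype).
opted = pd.read_csv(io.StringIO(csv_text), dtype={"code": "string"})

print(default.dtypes)
print(opted.dtypes)
```

Because the column is declared up front, the parser does not have to guess its type, so values such as `007` stay text and the missing entry becomes `pd.NA`. The cost is that every text column has to be listed by hand, which is the gap the request is about.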

TomAugspurger · 2020-08-05T19:42:43Z

Thanks @chrish42, that's being discussed on #29752.

I think we've agreed on an API (use_nullable_dtypes=True) and are looking for someone to implement it.
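Until such a keyword is implemented, a rough approximation is to convert after reading with `DataFrame.convert_dtypes()`. This is a sketch under that assumption only: it does not use the `use_nullable_dtypes` keyword named above, it runs after parsing rather than during it, and the sample data is invented for illustration.

```python
import io

import pandas as pd

# Hypothetical sample data with a missing value in two columns.
csv_text = "n,name,flag\n1,a,True\n,b,False\n3,,True\n"

# Plain read: 'n' becomes float64 because of the missing value.
raw = pd.read_csv(io.StringIO(csv_text))

# Post-hoc conversion to the dtypes that support pd.NA.
converted = raw.convert_dtypes()

print(raw.dtypes)
print(converted.dtypes)
```

Since the conversion happens after type inference, it does not prevent a mixed-types `DtypeWarning` raised while parsing; it only changes the dtypes of the resulting frame.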

chrish42 · 2020-08-05T20:01:23Z

Ugh. Apologies, @TomAugspurger. I did try to search for a duplicate issue, but I didn't find that one. Maybe because I'm focussed more on "don't use object dtype" instead of "use types that support NA well". But yes, this discussion should go there. I'll read through the (long-ish) existing issue, and see if I have something to add. Thanks!

chrish42 added the Enhancement and Needs Triage labels Aug 5, 2020

TomAugspurger closed this as completed Aug 5, 2020

TomAugspurger added the Duplicate Report label Aug 5, 2020

chrish42 mentioned this issue Aug 5, 2020

ENH: Allow opting in to new dtypes on I/O routines via keyword to I/O routines #29752

Closed

bashtage removed the Needs Triage label Aug 21, 2020
