ARROW-11247: [C++] Infer date32 columns in CSV#9203

Closed
nealrichardson wants to merge 4 commits into apache:master from nealrichardson:csv-infer-date

Conversation

@nealrichardson
Member

nealrichardson commented Jan 13, 2021

No description provided.

@nealrichardson
Member Author

My editor runs clang-format and IDK why it moved this

@nealrichardson
Member Author

Some Python tests are failing with this change. I think the tests should be updated where they assume dates will be parsed (suboptimally, IMO) as timestamps, but @jorisvandenbossche, maybe you can review and weigh in.

@jorisvandenbossche
Member

I added a commit that updates the tests for the new behaviour, in case we decide we are OK with that.

Generally, I think we should do the best inference from Arrow's point of view, which for a date string is the date type.

The only reason I can think of not to do it is that, for people converting the data to pandas afterwards, dates are not that well supported in pandas (at this point). That said, there is a to_pandas(..., date_as_object=False) keyword that a user can specify to still get a datetime64 dtype in pandas instead of datetime.date objects.
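
For readers following the inference question, here is a rough sketch in plain Python (not Arrow's C++ reader) of what "try a date type before falling back to timestamp" could look like. The function name `infer_column_type`, the `infer_dates` flag, and the two accepted formats are all hypothetical and chosen for illustration only; the real reader supports more types and formats.

```python
from datetime import datetime


def _parses_as(text, formats):
    """Return True if `text` matches any of the given strptime formats."""
    for fmt in formats:
        try:
            datetime.strptime(text, fmt)
            return True
        except ValueError:
            continue
    return False


def infer_column_type(values, infer_dates=True):
    """Hypothetical helper: pick 'date32', 'timestamp', or 'string' for a column.

    Candidates are tried from narrowest to widest. With infer_dates=False the
    date candidate is skipped, so bare dates fall through to 'timestamp',
    mimicking the behaviour the discussion calls suboptimal.
    """
    if not values:
        return "string"
    # Narrowest candidate: every cell is a bare calendar date.
    if infer_dates and all(_parses_as(v, ["%Y-%m-%d"]) for v in values):
        return "date32"
    # Assumption: the timestamp candidate also accepts date-only strings.
    timestamp_formats = ["%Y-%m-%d %H:%M:%S", "%Y-%m-%d"]
    if all(_parses_as(v, timestamp_formats) for v in values):
        return "timestamp"
    return "string"
```

Under these assumptions, a column of bare dates is reported as `date32` when `infer_dates` is on and as `timestamp` when it is off, which is the behaviour change the updated tests would need to reflect.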

@pitrou
Member

pitrou commented Jan 14, 2021
