xref PR #17077
Now that a `cache` keyword has been added to `to_datetime`, ideally the default should be `cache='infer'`, which would inspect the input data to determine whether converting with a cache would be more efficient.
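For context, "conversion with a cache" means parsing each distinct value only once and reusing the result for duplicates. The snippet below is a minimal standard-library sketch of that idea, not pandas' internal implementation. The helper name `convert_with_cache` and the fixed `fmt` argument are assumptions for illustration.

```python
from datetime import datetime


def convert_with_cache(values, fmt):
    """Hypothetical helper: parse each distinct string once, then reuse the result."""
    cache = {}
    for v in values:
        # Only unseen strings pay the parsing cost; duplicates hit the dict.
        if v not in cache:
            cache[v] = datetime.strptime(v, fmt)
    # Map every input back through the cache, preserving the original order.
    return [cache[v] for v in values]


# Repeated strings with a timezone offset: only one strptime call is made.
dates = convert_with_cache(["2017-01-01 00:00:00+01:00"] * 3, "%Y-%m-%d %H:%M:%S%z")
```

The more duplicates the input has, the more parsing work the dictionary lookup saves. With all-unique input, the cache only adds overhead.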
From some research (here and here), date strings, especially ones with timezone offsets, can benefit from conversion with a cache of dates. The rules of thumb for whether to convert with a cache should be based on a combination of the input data type, the proportion of duplicate values, and the number of dates to convert.
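As a rough illustration, the three rules of thumb could be combined into a single yes/no check along the lines below. This is a hypothetical sketch: the function name `should_cache` and both thresholds (`min_size=50`, `max_unique_ratio=0.5`) are placeholders, not benchmarked values and not pandas' actual inference logic.

```python
def should_cache(values, min_size=50, max_unique_ratio=0.5):
    """Hypothetical cache='infer' heuristic; thresholds are illustrative only."""
    n = len(values)
    # Number of dates: small inputs are unlikely to repay the cost of building a cache.
    if n < min_size:
        return False
    # Input data type: assume only strings benefit, since other types need no parsing.
    if not all(isinstance(v, str) for v in values):
        return False
    # Proportion of duplicates: cache only when few values are unique.
    unique_ratio = len(set(values)) / n
    return unique_ratio <= max_unique_ratio
```

Real thresholds would have to come from benchmarks, which is why existing `to_datetime` performance issues matter here.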
Additionally, it'd be nice to resolve existing `to_datetime` performance issues (e.g. #17410) so that the rules of thumb informing the inference step are not skewed by those issues.