
ENH/PERF: Add cache='infer' to to_datetime #18255

Closed
@mroeschke

Description


xref PR #17077

Now that a cache keyword has been added to to_datetime, the default should ideally be cache='infer', which would inspect the input data to determine whether converting with a cache would be more efficient.
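To make the idea concrete, here is a minimal sketch of what "converting with a cache" means, written with the standard library only. It is not pandas' implementation. The function names convert_with_cache and convert_without_cache are hypothetical, and datetime.strptime stands in for whatever parser to_datetime actually uses. Each distinct string is parsed once and every element then reuses the parsed result, so the savings grow with the number of duplicates.

```python
from datetime import datetime
from typing import Dict, List


def convert_without_cache(values: List[str], fmt: str) -> List[datetime]:
    # Baseline: parse every element, including repeated strings.
    return [datetime.strptime(v, fmt) for v in values]


def convert_with_cache(values: List[str], fmt: str) -> List[datetime]:
    # Hypothetical helper: parse each distinct string exactly once...
    cache: Dict[str, datetime] = {}
    for v in values:
        if v not in cache:
            cache[v] = datetime.strptime(v, fmt)
    # ...then look up the parsed value for every element, preserving order.
    return [cache[v] for v in values]
```

Both functions return the same values. The cached version only trades one parse per duplicate for a dictionary lookup, which is why it pays off for repetitive, expensive-to-parse strings (such as ones with UTC offsets parsed via %z) and not for short or mostly unique inputs.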

From some research (here and here), date strings, especially ones with timezone offsets, can benefit from conversion with a cache of dates. The rules of thumb for deciding whether to convert with a cache should be based on a combination of the input data type, the proportion of duplicate values, and the number of dates to convert.
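One way an inference step could combine those three rules of thumb is sketched below. This is a hypothetical heuristic, not what pandas does. The function name should_use_cache and both thresholds (MIN_LENGTH, MAX_UNIQUE_RATIO) are placeholders that would have to be tuned with benchmarks. Treating only strings as cache candidates is also an assumption.

```python
from typing import Sequence

# Hypothetical thresholds; real values would need to come from benchmarks.
MIN_LENGTH = 50
MAX_UNIQUE_RATIO = 0.5


def should_use_cache(values: Sequence[object],
                     min_length: int = MIN_LENGTH,
                     max_unique_ratio: float = MAX_UNIQUE_RATIO) -> bool:
    """Hypothetical 'infer' heuristic: True if a cache looks worthwhile."""
    # Rule 1, number of dates: for small inputs, building the cache
    # costs more than it saves (this also avoids dividing by zero).
    if len(values) < min_length:
        return False
    # Rule 2, input data type (assumption): only strings need expensive
    # parsing, so any non-string input skips the cache.
    if not all(isinstance(v, str) for v in values):
        return False
    # Rule 3, proportion of duplicates: few unique values mean many cache hits.
    unique_ratio = len(set(values)) / len(values)
    return unique_ratio <= max_unique_ratio
```

A highly repetitive list of date strings would be converted with a cache, while a short list, a list of unique strings, or non-string input would fall back to direct conversion.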

Additionally, it would be nice to resolve existing to_datetime performance issues (e.g. #17410) first, so that the rules of thumb informing the inference step are not skewed by them.

Labels: Datetime (Datetime data dtype), Performance (Memory or execution speed performance)