
to_datetime 1000x slower for timezone-aware strings vs timezone-agnostic #9714

Closed
@wetchler

Description


When converting a string date column to datetime, parsing takes roughly 1000x longer if the string carries a UTC offset suffix (e.g. "-0800"):

import pandas as pd

# 2000 timestamp strings, plus a second set with a "-0800" offset appended
dates = pd.Series(pd.date_range('1/1/2000', periods=2000))
string_dates = dates.apply(str)
tz_string_dates = string_dates.apply(lambda dt: dt + ' -0800')

%timeit pd.to_datetime(string_dates)
> 1000 loops, best of 3: 579 µs per loop
%timeit pd.to_datetime(tz_string_dates)
> 1 loops, best of 3: 562 ms per loop

Note the units: microseconds vs. milliseconds, about three orders of magnitude. That overhead seems unnecessary, and it can make loading CSVs into correctly-typed DataFrames extremely slow for large datasets.
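For anyone hitting this, a common workaround (not part of the original report) is to give to_datetime an explicit format so it does not fall back to per-element dateutil parsing. A minimal sketch, assuming a pandas version that supports %z in format strings:

import pandas as pd

dates = pd.Series(pd.date_range('1/1/2000', periods=2000))
tz_string_dates = dates.apply(str) + ' -0800'

# An explicit format (with %z for the offset) lets pandas parse the column
# in one vectorized pass; utc=True collapses the fixed offsets to a single
# tz-aware dtype instead of an object column.
parsed = pd.to_datetime(tz_string_dates, format='%Y-%m-%d %H:%M:%S %z', utc=True)
print(parsed.dtype)  # datetime64[ns, UTC]

Whether this fully closes the gap depends on the pandas version, but avoiding the element-wise fallback is the key point.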
