Description
When converting a column of date strings to datetime, parsing is roughly 1000x slower if the strings carry a UTC offset suffix (e.g. "-0800"):
import pandas as pd
dates = pd.Series(pd.date_range('1/1/2000', periods=2000))
string_dates = dates.apply(lambda s: str(s))
tz_string_dates = string_dates.apply(lambda dt: dt + ' -0800')
%timeit pd.to_datetime(string_dates)
> 1000 loops, best of 3: 579 µs per loop
%timeit pd.to_datetime(tz_string_dates)
> 1 loops, best of 3: 562 ms per loop
Note the units: microseconds vs. milliseconds. A slowdown of three orders of magnitude seems unnecessary, and it can make loading CSVs into correctly typed DataFrames extremely slow for large datasets.
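A minimal sketch of two possible workarounds, not a fix in pandas itself. The first passes an explicit `format` containing `%z` to `pd.to_datetime`, assuming a pandas version that accepts `%z` in `format` (0.24 or later), so that pandas does not have to guess the layout of each string. The second strips the offset, parses the naive part on its own, and applies the offset arithmetically. It assumes every string ends in a space followed by a `+HHMM` or `-HHMM` offset, exactly as in the benchmark above. The helper names `parse_with_format` and `parse_with_offset_split` are hypothetical. Both helpers return UTC timestamps, since a real column may mix offsets.

```python
import pandas as pd

# Rebuild the benchmark input from the issue: naive timestamps plus a " -0800" suffix.
dates = pd.Series(pd.date_range('1/1/2000', periods=2000))
tz_string_dates = dates.apply(lambda ts: str(ts) + ' -0800')


def parse_with_format(s: pd.Series) -> pd.Series:
    """Workaround 1 (hypothetical helper): spell out the layout, including %z."""
    # utc=True converts each value to UTC using its own parsed offset.
    return pd.to_datetime(s, format='%Y-%m-%d %H:%M:%S %z', utc=True)


def parse_with_offset_split(s: pd.Series) -> pd.Series:
    """Workaround 2 (hypothetical helper): parse the naive part, then apply the offset.

    Assumes every string ends in ' +HHMM' or ' -HHMM' (6 characters).
    """
    naive = pd.to_datetime(s.str[:-6])           # e.g. '2000-01-01 00:00:00'
    offset = s.str[-5:]                          # e.g. '-0800'
    sign = offset.str[0].map({'+': 1, '-': -1})
    minutes = offset.str[1:3].astype(int) * 60 + offset.str[3:5].astype(int)
    delta = pd.to_timedelta(sign * minutes, unit='m')
    # Local time = UTC + offset, so UTC = local time - offset.
    return (naive - delta).dt.tz_localize('UTC')


utc_a = parse_with_format(tz_string_dates)
utc_b = parse_with_offset_split(tz_string_dates)
print(utc_a.head(3))
print((utc_a == utc_b).all())
```

Both helpers can be dropped into the `%timeit` comparison above to check whether they close the gap on a given pandas version; no timings are claimed here.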