Description
Feature Type
- Adding new functionality to pandas
- Changing existing functionality in pandas
- Removing existing functionality in pandas
Problem Description
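I wish I could use the newly added support for non-nanosecond resolutions such as datetime64[ms] directly in pandas when reading a file, so that dates outside the range of datetime64[ns] can be parsed.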
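For context, datetime64[ns] stores nanoseconds since 1970 in a signed 64-bit integer, which is why it only covers roughly the years 1677 to 2262, while datetime64[ms] covers a vastly wider span. The sketch below uses plain numpy to compute the first and last representable year for a given unit. The helper names year_of and representable_years are hypothetical, and it relies on numpy reserving the most negative int64 value for NaT.

```python
from typing import Tuple

import numpy as np


def year_of(value: np.datetime64) -> int:
    """Calendar year of a datetime64 value (hypothetical helper)."""
    # Truncating to year resolution gives an integer count of years since 1970.
    return int(value.astype("datetime64[Y]").astype(np.int64)) + 1970


def representable_years(unit: str) -> Tuple[int, int]:
    """First and last calendar year a datetime64[unit] can hold (hypothetical helper)."""
    info = np.iinfo(np.int64)
    # numpy uses the most negative int64 as NaT, so the earliest real
    # timestamp is one tick above it.
    earliest = np.datetime64(int(info.min) + 1, unit)
    latest = np.datetime64(int(info.max), unit)
    return year_of(earliest), year_of(latest)


print(representable_years("ns"))  # (1677, 2262)
print(representable_years("ms"))  # about 292 million years either side of 1970
```

Years 4 and 3004 from the example below fall outside the first range but comfortably inside the second.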
Feature Description
import pandas as pd
import io
data = """\
index,date
A,"0004-04-04T12:30"
B,"2004-04-04T12:30"
C,"3004-04-04T12:30"
D,""
"""
df = pd.read_csv(io.StringIO(data), parse_dates=['date'])
df
which would return:
   index                 date
0      A     4-04-04 12:30:00
1      B  2004-04-04 12:30:00
2      C  3004-04-04 12:30:00
3      D                  NaT
Alternatively, the resolution could be requested explicitly, for example with a new reso argument to to_datetime:
df = pd.read_csv(io.StringIO(data))
df['date'] = pd.to_datetime(df['date'], reso='ms')
Alternative Solutions
Currently, the best way I have found to read a CSV with dates outside the 1677-2262 range of datetime64[ns] is:
df = pd.read_csv(io.StringIO(data))
df['date'] = df['date'].fillna("").to_numpy().astype('datetime64[ms]')
Thus, I am letting numpy do the actual date parsing. The fillna is needed because read_csv turns the empty cell in data into np.nan, which numpy can't cast to datetime64. (I expected a NaT, but I guess that's another issue anyway.)
This solution requires the dates to be in ISO 8601 format, which is much stricter than what to_datetime accepts.
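Wrapped into a reusable function, the workaround above might look like the following sketch. The helper name parse_iso_ms is hypothetical; it assumes pandas 2.x, where assigning a datetime64[ms] numpy array to a column keeps the millisecond resolution instead of converting to nanoseconds, and it assumes every non-empty cell is an ISO 8601 string.

```python
import io

import numpy as np
import pandas as pd

data = """\
index,date
A,"0004-04-04T12:30"
B,"2004-04-04T12:30"
C,"3004-04-04T12:30"
D,""
"""


def parse_iso_ms(column: pd.Series) -> np.ndarray:
    """Parse ISO 8601 strings into datetime64[ms] via numpy (hypothetical helper)."""
    # read_csv stores the empty cell as NaN; replacing it with "" lets numpy
    # handle it, because numpy parses an empty string as NaT.
    strings = column.fillna("").to_numpy(dtype=str)
    # numpy only understands ISO 8601, so other formats raise ValueError here.
    return strings.astype("datetime64[ms]")


df = pd.read_csv(io.StringIO(data))
df["date"] = parse_iso_ms(df["date"])
print(df.dtypes)
print(df)
```

Because the cast happens entirely in numpy, a single non-ISO value such as "04/04/2004" makes the whole conversion fail with a ValueError instead of being inferred the way to_datetime would.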
Additional Context
See my S/O question for more alternative solutions: https://stackoverflow.com/questions/76608166/how-do-i-parse-a-list-of-datetimes-with-a-s-resolution-in-pandas-2
Thanks to ignoring-gravity for the answer, which I reused here.