PERF: Parse certain dates in Cython instead of falling back to dateutil.parse #25922

Merged
merged 32 commits into from
Apr 20, 2019
Commits
4222fd5
Add new benchmarks for parsing datetime strings
anmyachev Mar 26, 2019
78254a4
Implement parsing dd/mm/yyyy and mm/dd/yyyy in Cython
vnlitvinov Mar 28, 2019
1608090
fix code style
anmyachev Mar 29, 2019
eec3beb
using DEF statement for compile-time constant
anmyachev Mar 29, 2019
d322b8d
parse_slashed_date simplification
anmyachev Mar 29, 2019
0546e0a
removed micro-bench
anmyachev Mar 29, 2019
4a673ff
Support mm-yyyy along with mm-dd-yyyy
vnlitvinov Mar 29, 2019
23df426
Rename parse_slashed_date to parse_delimited_date
vnlitvinov Mar 29, 2019
3538566
Speed up parse_datetime_string_with_reso
vnlitvinov Mar 29, 2019
4d4df11
fix code style
anmyachev Mar 29, 2019
504de84
Move to datetime_new, add docstring to _parse_delimited_date
vnlitvinov Apr 1, 2019
0613e66
Add whatsnew entry
vnlitvinov Apr 1, 2019
b985e37
fix parsing MM/YYYY for MM > 12
anmyachev Apr 2, 2019
f2843e1
added tests for parse_delimited_date
anmyachev Apr 2, 2019
4f66004
fix flake8 bugs in test_parse_dates.py
anmyachev Apr 2, 2019
ac6e348
Fix date parsing for Python <= 3.6.0
vnlitvinov Apr 3, 2019
5384ebe
removed parsing MM.YYYY format, because, for example, 10.2019 interpr…
anmyachev Apr 3, 2019
889ef7a
Remove whatsnew entry for the change
vnlitvinov Apr 4, 2019
a6926e7
Remove duplicate parsing of MM-YYYY in _parse_dateabbr_string
vnlitvinov Apr 4, 2019
b7cd6b1
added some comments in _parse_delimited_date
anmyachev Apr 5, 2019
4a2929d
fix docstring in _parse_delimited_date
anmyachev Apr 5, 2019
4bc1821
fix bug when parsing 01/12/2019 with dayfirst==True
anmyachev Apr 8, 2019
a43fa7b
first attemp to use hypothesis in tests
anmyachev Apr 8, 2019
710a287
apply isort on pandas/tests/io/parser/test_parse_dates.py
anmyachev Apr 8, 2019
859e312
added new '%Y %m %d' format and 2 @pytest.mark.parametrize for test_h…
anmyachev Apr 8, 2019
b41ea63
removed test_parse_delimited_date; added next formats: '%y %m %d', '%…
anmyachev Apr 9, 2019
6fad4f4
added message for pytest.skip(); more complete docstring in _parse_de…
anmyachev Apr 9, 2019
7113c75
removed \ delimiter
anmyachev Apr 9, 2019
d0bfd91
using is_platform_windows() in date_strategy definition; changed date…
anmyachev Apr 9, 2019
da845ed
fixed import order; using @settings(deadline=None) now; dates with ye…
anmyachev Apr 9, 2019
13717ec
removed extra 'parse' import
anmyachev Apr 18, 2019
2cd971a
_is_not_delimiter is inline now
anmyachev Apr 19, 2019
Speed up parse_datetime_string_with_reso
vnlitvinov authored and anmyachev committed Apr 19, 2019
commit 3538566cbd1eed778290f9ea11d900ca7468d23c
18 changes: 12 additions & 6 deletions pandas/_libs/tslibs/parsing.pyx
@@ -81,28 +81,30 @@ cdef inline object parse_delimited_date(object date_string, bint dayfirst,
     buf = get_c_string_buf_and_size(date_string, &length)
     if length == 10:
         if _is_not_delimiter(buf[2]) or _is_not_delimiter(buf[5]):
-            return None
+            return None, None
         month = _parse_2digit(buf)
         day = _parse_2digit(buf + 3)
         year = _parse_4digit(buf + 6)
+        reso = 'day'
     elif length == 7:
         if _is_not_delimiter(buf[2]):
-            return None
+            return None, None
         month = _parse_2digit(buf)
         year = _parse_4digit(buf + 3)
+        reso = 'month'
     else:
-        return None
+        return None, None

     if month < 0 or day < 0 or year < 0:
         # some part is not an integer, so it's not a mm/dd/yyyy date
-        return None
+        return None, None

     if 1 <= month <= MAX_DAYS_IN_MONTH and 1 <= day <= MAX_DAYS_IN_MONTH \
             and (month <= MAX_MONTH or day <= MAX_MONTH):
         if month > MAX_MONTH or (day < MAX_MONTH and dayfirst):
             day, month = month, day
         return PyDateTimeAPI.DateTime_FromDateAndTime(year, month, day,
-            0, 0, 0, 0, tzinfo, PyDateTimeAPI.DateTimeType)
+            0, 0, 0, 0, tzinfo, PyDateTimeAPI.DateTimeType), reso

     raise DateParseError("Invalid date specified (%d/%d)" %
                          (month, day))
@@ -131,7 +133,7 @@ def parse_datetime_string(date_string, freq=None, dayfirst=False,
                              yearfirst=yearfirst, **kwargs)
         return dt

-    dt = parse_delimited_date(date_string, dayfirst, _DEFAULT_TZINFO)
+    dt, _ = parse_delimited_date(date_string, dayfirst, _DEFAULT_TZINFO)
     if dt is not None:
         return dt

@@ -215,6 +217,10 @@ cdef parse_datetime_string_with_reso(date_string, freq=None, dayfirst=False,
     if not _does_string_look_like_datetime(date_string):
         raise ValueError('Given date string not likely a datetime.')

+    parsed, reso = parse_delimited_date(date_string, dayfirst, _DEFAULT_TZINFO)
+    if parsed is not None:
+        return parsed, parsed, reso
+
     try:
         return _parse_dateabbr_string(date_string, _DEFAULT_DATETIME, freq)
     except DateParseError:
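
For readers who want to try the control flow without building the Cython extension, here is a rough pure-Python sketch of what this commit appears to do: the delimited-date fast path now returns a `(datetime, resolution)` pair, one caller discards the resolution, and the other returns early with it. All function names ending in `_sketch`, the `_to_int` helper, and the `DELIMITERS` set are assumptions made for illustration and are not part of pandas. The sketch also departs from the hunk in two places: it swaps when `day <= MAX_MONTH` (the hunk uses `<`), and it rejects month-only strings whose month exceeds 12. Later commits in this PR ("fix bug when parsing 01/12/2019 with dayfirst==True", "fix parsing MM/YYYY for MM > 12") may address the same cases, but that is not confirmed by this diff.

```python
from datetime import datetime

MAX_DAYS_IN_MONTH = 31
MAX_MONTH = 12
DELIMITERS = "/-"  # assumption: the real delimiter set is not visible in this hunk


def _to_int(text):
    """Return the integer in `text`, or -1 if it is not all ASCII digits."""
    if text and all(ch in "0123456789" for ch in text):
        return int(text)
    return -1


def parse_delimited_date_sketch(date_string, dayfirst=False):
    """Hypothetical stand-in: (datetime, reso), or (None, None) on no match."""
    length = len(date_string)
    if length == 10:  # e.g. mm/dd/yyyy or dd/mm/yyyy
        if date_string[2] not in DELIMITERS or date_string[5] not in DELIMITERS:
            return None, None
        month = _to_int(date_string[0:2])
        day = _to_int(date_string[3:5])
        year = _to_int(date_string[6:10])
        reso = "day"
    elif length == 7:  # e.g. mm/yyyy
        if date_string[2] not in DELIMITERS:
            return None, None
        month = _to_int(date_string[0:2])
        day = 1
        year = _to_int(date_string[3:7])
        reso = "month"
    else:
        return None, None

    if month < 0 or day < 0 or year < 0:
        # some part is not an integer, so let a slower parser handle it
        return None, None

    if reso == "month":
        # sketch-only choice: never swap day and month for month-only input
        if 1 <= month <= MAX_MONTH:
            return datetime(year, month, 1), reso
    elif (1 <= month <= MAX_DAYS_IN_MONTH and 1 <= day <= MAX_DAYS_IN_MONTH
            and (month <= MAX_MONTH or day <= MAX_MONTH)):
        # the first field cannot be a month, or the caller asked for day-first
        if month > MAX_MONTH or (day <= MAX_MONTH and dayfirst):
            day, month = month, day
        return datetime(year, month, day), reso

    raise ValueError("Invalid date specified (%d/%d)" % (month, day))


def parse_datetime_string_sketch(date_string, dayfirst=False):
    """Caller that only needs the datetime and discards the resolution."""
    dt, _ = parse_delimited_date_sketch(date_string, dayfirst)
    return dt  # None here means: fall back to the slower parsers


def parse_with_reso_sketch(date_string, dayfirst=False):
    """Caller that returns early with (parsed, parsed, reso) on the fast path."""
    parsed, reso = parse_delimited_date_sketch(date_string, dayfirst)
    if parsed is not None:
        return parsed, parsed, reso
    return None  # the real function continues with other parsers instead
```

Returning a `(None, None)` pair rather than a bare `None` keeps tuple unpacking valid at both call sites, which seems to be why every early `return None` in the hunk changed.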