Open
Labels: extension-modules (C modules in the Modules dir), performance (Performance or resource usage), stdlib (Python modules in the Lib dir), type-feature (A feature request or enhancement)
Description
Bug report
Bug description:
The state machine function (line 726 in bbcb75c):

    parse_process_char(ReaderObj *self, _csvstate *module_state, Py_UCS4 c)

is called for every character processed by csv.reader (lines 969 to 974 in bbcb75c):

    while (linelen--) {
        c = PyUnicode_READ(kind, data, pos);
        if (parse_process_char(self, module_state, c) < 0) {
            Py_DECREF(lineobj);
            goto err;
        }
Even setting aside sophisticated SIMD or branching optimizations, this per-character dispatch could be more efficient.
Most of the time is likely to be spent inside a field (the IN_FIELD and IN_QUOTED_FIELD states). It would be more efficient to find the next interesting character (such as an escape or a quote) and copy the whole slice in between in one step.
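To make the idea concrete, here is a minimal sketch of the scan-then-copy approach. It is not the code in the CPython tree: it works on plain `char` buffers rather than `Py_UCS4` and the `PyUnicode_READ` kinds, and `dialect_sketch`, `find_special` and `copy_plain_run` are hypothetical names invented for illustration. Treating the delimiter and line terminators as interesting characters is also an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified dialect; not the structure used by the csv module. */
typedef struct {
    char delimiter;
    char quotechar;
    char escapechar;
} dialect_sketch;

/* Return the index of the first "interesting" byte in buf[0..len),
 * or len if the whole buffer is ordinary field data. */
static size_t
find_special(const char *buf, size_t len, const dialect_sketch *d)
{
    size_t i;
    for (i = 0; i < len; i++) {
        char c = buf[i];
        if (c == d->delimiter || c == d->quotechar || c == d->escapechar ||
            c == '\n' || c == '\r') {
            break;
        }
    }
    return i;
}

/* Append the run of ordinary bytes at the start of src to the field buffer
 * with a single memcpy, instead of one state-machine call per character.
 * Returns the number of bytes consumed; the caller would then hand the
 * next (interesting) byte to the state machine as before. */
static size_t
copy_plain_run(char *field, size_t *field_len, size_t field_cap,
               const char *src, size_t len, const dialect_sketch *d)
{
    size_t n = find_special(src, len, d);
    assert(*field_len + n <= field_cap);  /* sketch: no buffer growth */
    memcpy(field + *field_len, src, n);
    *field_len += n;
    return n;
}
```

In this sketch the state machine would only be entered for the characters that can actually change state. A real patch would also have to handle buffer growth, the field size limit, and the different Unicode storage kinds.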
CPython versions tested on:
CPython main branch
Operating systems tested on:
Linux