-
-
Notifications
You must be signed in to change notification settings - Fork 31.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
bpo-42123: Run the parser two times and only enable invalid rules on the second run #22111
Conversation
The first parser run is only responsible for detecting whether there is a `SyntaxError` or not. If there isn't the AST gets returned. Otherwise, the parser is run a second time with all the `invalid_*` rules enabled so that all the customized error messages get produced.
Two more things:
|
I think one of the requirements for Guido's approach to work is that the memo cache is preserved between restarts of the parser.
I can benchmark this tomorrow as I am currently repairing my main PC (watercooling leak 😱 ) and is not operational. |
Oooh, that's correct. Ok, let me have another look at this then. |
@pablogsal Do you have the bandwidth for another review on this? If you feel it's okay, I'm going to add an issue and a NEWS blurb. |
Will check this if I can today or tomorrow |
After some benchmarking, it results that the parser with this PR is up to 10% (in xxl benchmark) and ~3% faster in parsing the stdlib. Let's merge it :) |
@ambv Do we backport this to 3.9? It's very low-risk and gives a sizable performance boost. |
"Do we backport this to 3.9?" We almost never backport performance patches. Low risk is not no risk. With the yearly releases, there is less than a year to wait. With the early and repeated alphas, bleeding edge fans can get this soon enough. |
OTOH, for the PEG parser we've been backporting everything relentlessly, otherwise when we fix a bug in 3.10 the backport is a pain in the neck. |
Ok, since there's no consensus, let's not backport this. |
I intentionally added 'almost' to my previous comment to suggest that carefully considered exceptions are possible. I forgot about the 3.9 escape option to run the old parser. Guido's comment, when merge conflicts really happen, is real to me from my experience with IDLE. We (I) 'relentlessly' backport everything, but have not always. Manual backports are an extra risk. So is not backporting a fix because it is too much trouble. I believe PP was once tested on a large number of other python codebases other than the stdlib. Is that done routinely, and in particular for this patch (before and after)? That would make a difference to me. |
I think @pablogsal benchmarked this against something like the 10 most downloaded PyPI packages, although I'm not sure. |
Yup, I did the benchmark again the 15 most downloaded packages. |
FWIW I think it's worth it to backport, especially now that #23006 was merged into 3.9. |
I would be supportive of the backport for the reasons previously mentioned |
Agreed. |
…es on the second run (pythonGH-22111) * Implement running the parser a second time for the errors messages The first parser run is only responsible for detecting whether there is a `SyntaxError` or not. If there isn't the AST gets returned. Otherwise, the parser is run a second time with all the `invalid_*` rules enabled so that all the customized error messages get produced. (cherry picked from commit bca7014)
…es on the second run (GH-22111) (GH-23011) * Implement running the parser a second time for the errors messages The first parser run is only responsible for detecting whether there is a `SyntaxError` or not. If there isn't the AST gets returned. Otherwise, the parser is run a second time with all the `invalid_*` rules enabled so that all the customized error messages get produced. (cherry picked from commit bca7014)
…lots1 * origin/master: (365 commits) bpo-42029: Remove IRIX code (pythonGH-23023) bpo-42143: Ensure PyFunction_NewWithQualName() can't fail after creating the func object (pythonGH-22953) bpo-34204: Use pickle.DEFAULT_PROTOCOL in shelve (pythonGH-19639) bpo-41805: Documentation for PEP 585 (pythonGH-22615) bpo-42161: Micro-optimize _collections._count_elements() (pythonGH-23008) bpo-42161: Remove private _PyLong_Zero and _PyLong_One (pythonGH-23003) bpo-42099: Fix reference to ob_type in unionobject.c and ceval (pythonGH-22829) bpo-41659: Disallow curly brace directly after primary (pythonGH-22996) bpo-6761: Enhance __call__ documentation (pythonGH-7987) bpo-42161: Modules/ uses _PyLong_GetZero() and _PyLong_GetOne() (pythonGH-22998) bpo-41474, Makefile: Add dependency on cpython/frameobject.h (pythonGH-22999) bpo-42157: Rename unicodedata.ucnhash_CAPI (pythonGH-22994) bpo-42161: Use _PyLong_GetZero() and _PyLong_GetOne() (pythonGH-22995) bpo-30681: Support invalid date format or value in email Date header (pythonGH-22090) bpo-42161: Add _PyLong_GetZero() and _PyLong_GetOne() (pythonGH-22993) bpo-42123: Run the parser two times and only enable invalid rules on the second run (pythonGH-22111) bpo-42157: Convert unicodedata.UCD to heap type (pythonGH-22991) bpo-42157: unicodedata avoids references to UCD_Type (pythonGH-22990) bpo-39101: Fixes BaseException hang in IsolatedAsyncioTestCase. (pythonGH-22654) bpo-1635741: _PyUnicode_Name_CAPI moves to internal C API (pythonGH-22713) ...
…the second run (pythonGH-22111) * Implement running the parser a second time for the errors messages The first parser run is only responsible for detecting whether there is a `SyntaxError` or not. If there isn't the AST gets returned. Otherwise, the parser is run a second time with all the `invalid_*` rules enabled so that all the customized error messages get produced.
Here's the easiest way I could think of to do this. Are we okay with something
as simple as this or would we like to investigate further whether we could
actually use the memo cache wherever possible?
The first parser run is only responsible for detecting whether
there is a
SyntaxError
or not. If there isn't, the AST gets returned.Otherwise, the parser is run a second time with all the
invalid_*
rules enabled so that all the customized error messages get produced.
Regarding bpo-41659, we can now implement a very simple fix:
With this patch, I'm now getting:
Thoughts?
https://bugs.python.org/issue42123