-
-
Notifications
You must be signed in to change notification settings - Fork 31.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
gh-114058: The Tier2 Optimizer #114059
gh-114058: The Tier2 Optimizer #114059
Conversation
Please give me some time to write out the proper docs explaining the abstract IR this uses. |
All tests run with uops on now passes except for the following two:
|
Maybe we should just fix the test? This sort of seems to be kicking the can down the road, since eventually tier 2 will be on by default. There's a |
1% slower on macOS (other platforms aren't building right now). 8% reduction in traces executed, but 3% increase in uops executed. PGO failure on Windows:
PGO failure on Linux:
Both look like recursion-related issues. The Windows one may be fixed on |
Thanks Brandt. Seems like the slowdown is due to Llike Mark said though, benchmark results aren't too important at the moment. Function inlining would be the most important optimization and that's missing from this PR, to be added in a future PR. |
Nevertheless it would be wise to dig deeper into what goes on with bm_nbody. It may be an important canary. :-) |
This reverts commit 2096a8a.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here are some comments on the latest version which I (mostly) wrote before hearing about plans to merge with Mark's version. (I wonder if it might make sense to start a new PR for that, rather than carry the history of over 100 commits along?)
if var.name != MANGLED_NULL: | ||
out.emit(f"{var.name} = sym_init_unknown(ctx);\n") | ||
out.emit(f"if({var.name} == NULL) goto error;\n") | ||
if var.type_prop: | ||
out.emit(f"sym_set_type({var.name}, {var.type_prop[0]}, 0);\n") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an example of something that I think can be streamlined further. Instead of calling sym_init_unknown(ctx)
followed by sym_set_type(...)
there ought to be a single function you can call to create a symbol with a given type. Basically a bunch of convenience constructors.
// This the above + additional working space we need. | ||
#define UOP_MAX_TRACE_WORKING_LENGTH (UOP_MAX_TRACE_LENGTH * 2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see this used in the code any more.
if (tp->tp_version_tag != 0) { | ||
sym_set_type(sym, GUARD_TYPE_VERSION_TYPE, tp->tp_version_tag); | ||
} | ||
if (tp->tp_flags & Py_TPFLAGS_MANAGED_DICT) { | ||
PyDictOrValues dorv = *_PyObject_DictOrValuesPointer(obj); | ||
if(_PyDictOrValues_IsValues(dorv)) { | ||
sym_set_type(sym, GUARD_DORV_VALUES_TYPE, 0); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could also use a longer explanation. What special cases are these two if
blocks covering?
Makes sense. I will close this PR and create a new one with the Mark's new DSL changes. Thanks for all your reviews! |
The successor PR with cleaned up history is at #115085. |
Closes #114058
This PR turns on the optimizer for all uops. The tier 2 uops optimizer contains a few parts: the abstract interpreter, the IR, and the codegen.
The abstract interpreter does the following:
Function inlining is left out for a future PR, as it's the most complex.
After analysis of the bytecode and doing all of the above, it emits optimized uops, and passes that to the executor.
When uops is enabled, **this passes the entire CPython test suite **. The significant milestone is that this is able to analyse and abstract interpret all CPython uops that we currently support. The other significant milestone is that this generates code that passes CPython's test suite.
Refleak tests will fail as well, as they need a design overhaul.
The design of this PR is here https://github.com/Fidget-Spinner/cpython_optimization_notes/blob/main/3.13/uops_optimizer.md
High level discussion here faster-cpython/ideas#648.
0-2% faster on Linux, 3% faster on macOS ARM64