-
-
Notifications
You must be signed in to change notification settings - Fork 32.2k
GH-115506: Improve handling of constants in tier two #124809
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
eb54546
to
cc0d30f
Compare
I think this belongs in the (yet to be created) partial evaluation pass. Any optimization that does evaluation, or transforms representation belongs in that pass. In that pass, the op(_BINARY_OP_MULTIPLY_FLOAT, (left, right -- res)) {
if (is_const(left) && is_const(right)) {
assert(PyFloat_CheckExact(get_const(left)));
assert(PyFloat_CheckExact(get_const(right)));
PyObject *temp = PyFloat_FromDouble(
PyFloat_AS_DOUBLE(sym_get_const(left)) *
PyFloat_AS_DOUBLE(sym_get_const(right)));
res = virtual_const(temp, ctx);
}
else {
materialize(left, ctx);
materialize(right, ctx);
emit(_BINARY_OP_MULTIPLY_FLOAT, ctx);
res = concrete_tos(ctx);
}
} |
I think we could extract the useful part (the API to maintain a constant pool in the executor) though, and make the PR just that. |
Well, I think the rest of the PR is useful too... 🙂 I'm not sure I understand the desire to have an entirely new pass when the same work could be easily done as part of this one. It seems like a lot of extra code (and computation), for little gain. I also suspect that we would be repeating a ton of the same work in the second pass. Is there a paper or something that explains why two passes are better than one? My intuition would be to reduce the number of passes, rather than increasing it (for example, by doing abstract interpretation during trace projection). |
Actually the opposite, the literature suggests combining passes produces better code, not the latter 😉. I think we just want a clean separation of concerns though. |
This can wait until the mythical second pass is implemented. |
This adds a
refs
tuple to executor objects, which contains constants created during optimization of tier two traces. This isn't deduplicated at all, since not all of our "constants" are actually "constant", and arbitrary hashes/comparisons can open up a whole can of worms. If we want to go that route, we can probably re-use_PyCode_ConstantKey
for known safe, immutable types, and compare everything else by identity.This also updates some parts of the optimizer to improve the handling of known constants (such as adding peepholing for
_POP_TOP_LOAD_CONST_INLINE_BORROW
,_REPLACE_WITH_TRUE
, and_COPY
/_LOAD_FAST
with known constant values).Performance and memory are in the noise... perhaps a bit faster if you squint hard enough. But it's working: the stats show lots of instructions like
_REPLACE_WITH_TRUE
,_POP_TOP_LOAD_CONST_INLINE_BORROW
,_LOAD_FAST
,_COPY
, and_BINARY_OP_ADD_INT
being replaced with simpler instructions like_LOAD_CONST_INLINE_BORROW
,_POP_TOP
, and_LOAD_CONST_INLINE
.My next step will be experimenting with no-refcount variants of
_COPY
,_LOAD_FAST
,_STORE_FAST
, and_POP_TOP
with known immortal values.