GH-98686: Get rid of "adaptive" and "quick" instructions #99182
Conversation
Python/bytecodes.c (Outdated)

@@ -465,6 +465,20 @@ dummy_func(
        // stack effect: (__0 -- )
        inst(BINARY_SUBSCR) {
            if (!cframe.use_tracing) {
This bothers me. First of all, the extra branch might slow things down, but DO_TRACING was reasonably self-contained and cframe.use_tracing didn't have to be checked in too many other places.
Would incrementing the counter in DO_TRACING work, so that ADAPTIVE_COUNTER_IS_ZERO(cache->counter) is guaranteed to be false and the DECREMENT_ADAPTIVE_COUNTER is cancelled out?
In DO_TRACING, add something like:

if (is_adaptive(opcode)) {
    INCREMENT_ADAPTIVE_COUNTER(next_instr);
}
It's a tiny bit trickier (we have to make sure we don't overflow the counter to zero), but sure, I'll try that!
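For illustration, one way to avoid that wraparound is a saturating increment along these lines (a minimal sketch, assuming a 16-bit counter whose count sits above a low backoff field; ADAPTIVE_BACKOFF_BITS and MAX_ADAPTIVE_COUNTER_VALUE are names assumed here for the sketch, not code from this PR):

/* Hypothetical sketch (not the PR's code): only bump the counter when doing
 * so cannot wrap it around to zero, so ADAPTIVE_COUNTER_IS_ZERO() stays
 * false after DO_TRACING increments it. */
#define ADAPTIVE_BACKOFF_BITS 4                              /* assumed layout */
#define MAX_ADAPTIVE_COUNTER_VALUE \
    ((1 << (16 - ADAPTIVE_BACKOFF_BITS)) - 1)

#define INCREMENT_ADAPTIVE_COUNTER(COUNTER)                           \
    do {                                                              \
        if (((COUNTER) >> ADAPTIVE_BACKOFF_BITS) <                    \
                MAX_ADAPTIVE_COUNTER_VALUE) {                         \
            (COUNTER) += (1 << ADAPTIVE_BACKOFF_BITS);                \
        }                                                             \
    } while (0)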
Python/bytecodes.c (Outdated)

@@ -3977,11 +3930,14 @@ dummy_func(
        // stack effect: ( -- )
        inst(EXTENDED_ARG) {
            assert(oparg);
            opcode = _Py_OPCODE(*next_instr);
            if (cframe.use_tracing) {
We could handle EXTENDED_ARG in DO_TRACING. It makes DO_TRACING even slower, but we can then assert cframe.use_tracing == 0 here.
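Roughly, the tracing path could fold EXTENDED_ARG prefixes into oparg itself, something like the following (a hedged sketch using the existing _Py_OPCODE/_Py_OPARG accessors; this is not the code that ended up in the PR):

/* Hypothetical sketch for DO_TRACING: consume EXTENDED_ARG prefixes here,
 * so the EXTENDED_ARG instruction body only ever runs untraced and can
 * assert(cframe.use_tracing == 0). */
while (opcode == EXTENDED_ARG) {
    _Py_CODEUNIT word = *next_instr++;       /* the prefixed instruction */
    oparg = (oparg << 8) | _Py_OPARG(word);
    opcode = _Py_OPCODE(word);
}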
        STAT_INC(opcode, miss); \
        STAT_INC(INSTNAME, miss); \
        /* The counter is always the first cache entry: */ \
        if (ADAPTIVE_COUNTER_IS_ZERO(*next_instr)) { \
Wrap this in an #ifdef Py_STATS just to be sure that the comment 5 lines up is true?
I don't think you can put an #ifdef inside of a macro definition... or are you suggesting to define two different versions of DEOPT_IF based on #ifdef Py_STATS?
Something like:

#ifdef Py_STATS
#define MISS_STATS(opcode, INSTNAME) \
    ...
#else
#define MISS_STATS(opcode, INSTNAME) ((void)0)
#endif

#define DEOPT_IF(COND, INSTNAME)                         \
    if (COND) {                                          \
        MISS_STATS(opcode, INSTNAME);                    \
        assert(_PyOpcode_Deopt[opcode] == INSTNAME);     \
        GO_TO_INSTRUCTION(INSTNAME);                     \
    }
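A note on the shape of this suggestion: because the non-stats branch expands MISS_STATS to ((void)0), DEOPT_IF itself keeps a single definition, and the stats bookkeeping compiles away entirely when Py_STATS is not defined.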
Looks good, but will need benchmarks run
"1.00x" faster:
This gets us one step closer to skipping the quickening step entirely for new code objects... with this change, quickening only involves inserting superinstructions and initializing warmup counters. We do this by getting rid of the EXTENDED_ARG_QUICK instruction and making all specializable opcodes contain their own adaptive logic.

Getting this right is a bit tricky, since there are four cases where we want to execute an unquickened instruction:

The key insight here is that the logic is identical for the first three cases: all that we need to do is change the miss counters for specialized instructions to use the same format as the adaptive backoff counter, and then the same code paths can be shared. We skip all of this in the fourth case (tracing) with a simple if (!cframe.use_tracing) { ... } guard around the adaptive code (maybe there's a clever way of avoiding this branch, but I doubt it's actually very expensive in practice).

Finally, as an added bonus, merging these code paths allows specialization misses to jump directly into the unquickened instructions, rather than using an indirect jump through a shared miss block.
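To make the shared counter format concrete, here is a small self-contained toy model of the idea (the field widths, the cap on the backoff exponent, and the helper names are assumptions for illustration; they are not the actual definitions from CPython):

#include <stdint.h>
#include <stdio.h>

/* Toy model of a backoff counter: the low bits hold a backoff exponent and
 * the high bits hold the remaining count.  Adaptive instructions and
 * specialization-miss counters can then share the same zero check and
 * decrement. */
#define BACKOFF_BITS 4
#define COUNTER_VALUE(c)   ((c) >> BACKOFF_BITS)
#define COUNTER_IS_ZERO(c) (COUNTER_VALUE(c) == 0)
#define DECREMENT(c)       ((uint16_t)((c) - (1 << BACKOFF_BITS)))

/* After a failed specialization attempt, restart the counter with an
 * exponentially longer wait before trying again. */
static uint16_t counter_backoff(uint16_t c) {
    uint16_t exponent = c & ((1 << BACKOFF_BITS) - 1);
    if (exponent < BACKOFF_BITS + 8) {   /* arbitrary cap for the toy model */
        exponent++;
    }
    return (uint16_t)((((1u << exponent) - 1) << BACKOFF_BITS) | exponent);
}

int main(void) {
    uint16_t counter = counter_backoff(0);
    int executions = 0;
    /* Run the generic (unquickened) path until the counter hits zero,
     * then a real interpreter would attempt to specialize again. */
    while (!COUNTER_IS_ZERO(counter)) {
        counter = DECREMENT(counter);
        executions++;
    }
    printf("specialize again after %d executions\n", executions);
    return 0;
}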