From 4603470bb6d73e6207b405203492c36b275eb7ce Mon Sep 17 00:00:00 2001
From: Nelson Elhage <nelhage@nelhage.com>
Date: Mon, 10 Feb 2025 18:10:11 -0800
Subject: [PATCH] Prevent the compiler from merging computed-goto dispatches

When compiling the computed-goto interpreter, every opcode
implementation ends with an identical chunk of code, generated by the
`DISPATCH()` macro. In some cases the compiler notices this and
replaces that code in one or more opcodes with a jump into the tail
of a different opcode.

However, we specifically **don't** want that to happen; the entire
premise of using computed gotos is to lift more information into the
instruction pointer, so that the hardware branch-target predictor has
more to work with! In my preliminary tests, this tail-merging of
opcode implementations explains most of the performance improvement
of the new tail-call interpreter (#128718) -- compilers are much less
willing to merge code across functions, and so the tail-call
interpreter preserves all (or at least more) of the individual
`DISPATCH` sites.

This change attempts to prevent the merging of `DISPATCH` sites by
adding an empty `__asm__ volatile` statement to each one. It acts as
an opaque barrier to the optimizer, preventing it from treating all
of these sequences as identical.
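
For illustration, here is a minimal sketch of the same pattern outside
of CPython. The three-opcode bytecode, the `run` function, and the
`targets` table are hypothetical names invented for this example; only
the shape of the `DISPATCH()` macro mirrors the change below. It
assumes a compiler with the GNU C labels-as-values extension (GCC or
Clang).

```c
/* Hypothetical three-opcode bytecode, for illustration only. */
enum { OP_INC, OP_DEC, OP_HALT };

/* Same shape as the patched DISPATCH_GOTO(): the empty asm statement is
 * opaque to the optimizer, which is meant to keep each expansion of the
 * macro as its own indirect jump rather than a jump to a shared tail. */
#define DISPATCH() \
    do { __asm__ volatile (""); goto *targets[*pc++]; } while (0)

static int run(const unsigned char *pc)
{
    /* Jump table built with the labels-as-values extension. */
    static void *const targets[] = { &&op_inc, &&op_dec, &&op_halt };
    int acc = 0;

    DISPATCH();          /* jump to the first opcode */
op_inc:
    acc++;
    DISPATCH();          /* dispatch site owned by OP_INC */
op_dec:
    acc--;
    DISPATCH();          /* dispatch site owned by OP_DEC */
op_halt:
    return acc;
}
```

Running the program INC, INC, DEC, INC, HALT through `run` returns 2.
Whether the dispatch sites actually stay separate depends on the
compiler and optimization level, so it is worth checking the generated
assembly rather than assuming it.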
---
 Python/ceval_macros.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Python/ceval_macros.h b/Python/ceval_macros.h
index 0a4f65feb3b512..882d3950c8d826 100644
--- a/Python/ceval_macros.h
+++ b/Python/ceval_macros.h
@@ -95,7 +95,7 @@
 #    define LABEL(name) TARGET(name)
 #elif USE_COMPUTED_GOTOS
 #  define TARGET(op) TARGET_##op:
-#  define DISPATCH_GOTO() goto *opcode_targets[opcode]
+#  define DISPATCH_GOTO() do { __asm__ volatile (""); goto *opcode_targets[opcode]; } while (0)
 #  define JUMP_TO_LABEL(name) goto name;
 #  define JUMP_TO_PREDICTED(name) goto PREDICTED_##name;
 #  define LABEL(name) name: