Description
Hi
I noticed that the Ajla programming language ( https://www.ajla-lang.cz/ ) runs slower when being compiled by clang-19 compared to clang-18 or gcc-14. I looked at the assembler, and it turns out that the bytecode interpreter (the file "ipret.c") is not compiled as efficiently as it could be. In particular, clang-19 joins all the "goto *next_label" statements in a function into just one "jmp *" instruction. That reduces code size, but it also makes branch prediction inefficient, because the CPU cannot learn that a single instruction jumps to multiple targets.
I created this example that shows the issue:
http://www.jikos.cz/~mikulas/testcases/clang/computed-goto.c
(use: clang-19 -O2 computed-goto.c && time ./a.out)
The results (in seconds) are here:
http://www.jikos.cz/~mikulas/testcases/clang/computed-goto.txt
We can see that the worst slowdown happens on Sandy Bridge. Zen 4 shows no slowdown, so it seems that it has smart indirect branch predictor that can predict multiple jump targets from a single instruction.
Metadata
Metadata
Assignees
Labels
Type
Projects
Status
Done