We ran into this issue with LLVM 20.x that seems similar to #71822.
Unfortunately, I'm not able to come up with a minimal reproducible example, but the most basic set of BOLT options that triggers this in our case is:
llvm-bolt "$INPUT_PATH" -o "$OUTPUT_PATH" -data="$PROFILE_PATH" --use-gnu-stack --reorder-functions=hfsort --split-functions --split-all-cold --reorder-blocks=ext-tsp
As you can imagine, this is quite sensitive to the profile and the pre-BOLT code layout, but with those held constant it reproduces every time.
More logs:
BOLT-INFO: Finished pass: print dyno-stats after optimizations
BOLT-INFO: Starting pass: simplify-conditional-tail-calls
BOLT-INFO: Finished pass: simplify-conditional-tail-calls
BOLT-INFO: Starting pass: peepholes
BOLT-INFO: Finished pass: peepholes
BOLT-INFO: Starting pass: aligner
BOLT-INFO: Finished pass: aligner
BOLT-INFO: Starting pass: reorder-data
BOLT-INFO: Finished pass: reorder-data
BOLT-INFO: Starting pass: patch-entries
BOLT-INFO: Finished pass: patch-entries
BOLT-INFO: Starting pass: adr-relaxation
BOLT-INFO: Finished pass: adr-relaxation
BOLT-INFO: Starting pass: long-jmp
BOLT-INFO: Starting stub-insertion pass
BOLT-INFO: Inserted 70 stubs in the hot area and 38 stubs in the cold area. Shared 0 times, iterated 3 times.
BOLT-INFO: Finished pass: long-jmp
BOLT-INFO: Starting pass: finalize-functions
BOLT-INFO: Finished pass: finalize-functions
BOLT-INFO: Starting pass: frame-optimizer
BOLT-INFO: Finished pass: frame-optimizer
BOLT-INFO: Starting pass: alloc-combiner
BOLT-INFO: Finished pass: alloc-combiner
BOLT-INFO: Starting pass: retpoline-insertion
BOLT-INFO: Finished pass: retpoline-insertion
BOLT-INFO: Starting pass: assign-sections
BOLT-INFO: Finished pass: assign-sections
BOLT-INFO: Starting pass: inst-lowering
BOLT-INFO: Finished pass: inst-lowering
BOLT-INFO: Starting pass: lower-annotations
BOLT-INFO: Finished pass: lower-annotations
BOLT-INFO: Starting pass: clean-mc-state
BOLT-INFO: Finished pass: clean-mc-state
BOLT-INFO: using original .text for new code with 0x10000 alignment
BOLT-ERROR: JITLink failed: In graph in-memory object file, section .text.cold: relocation target .text + 0x2664 at address 0x40b0000 is out of range of TestAndBranch14PCRel fixup at 0x40b95a4 ($x, 0x40b6f40 + 0x2664)
Sorry I'm not able to provide the profile or a repro, but this looks to me like a case where BOLT places the split-off cold code too far from the hot code for the tbz/tbnz instruction it's trying to use (tbz/tbnz can only reach roughly ±32 KiB from the branch), so hopefully this is enough to start taking a look.
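For reference, here is a rough sanity check of the numbers in the error message. It assumes the fixup and target addresses printed by JITLink are the ones used for the range check and that the addend is zero (the message does not print it), so treat it as an approximation rather than an exact reconstruction of what the linker computed. The helper names are made up for illustration.

```python
# Rough range check for an AArch64 TBZ/TBNZ branch, using the two addresses
# printed in the JITLink error. Assumption: the relocation addend is zero,
# since the error message does not show it.

TBZ_IMM_BITS = 14  # width of the signed imm14 field in TBZ/TBNZ
INSN_SIZE = 4      # the immediate is scaled by the 4-byte instruction size


def tbz_range():
    """Return the (min, max) byte displacement a TBZ/TBNZ can encode."""
    lo = -(1 << (TBZ_IMM_BITS - 1)) * INSN_SIZE
    hi = ((1 << (TBZ_IMM_BITS - 1)) - 1) * INSN_SIZE
    return lo, hi


def in_tbz_range(fixup_addr, target_addr):
    """True if target_addr is reachable from a TBZ/TBNZ at fixup_addr."""
    lo, hi = tbz_range()
    return lo <= target_addr - fixup_addr <= hi


fixup = 0x40B95A4   # "fixup at 0x40b95a4", located in .text.cold
target = 0x40B0000  # "at address 0x40b0000", located in .text

print(tbz_range())                  # (-32768, 32764)
print(target - fixup)               # -38308
print(in_tbz_range(fixup, target))  # False
```

Under those assumptions the branch would need to jump back about 38 KB, which is past the 32 KiB backward limit, and that matches the out-of-range error.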
Please let me know if there's more information you need or testing you want us to do on our side.