Description
Summary
The numeric values produced by llvm-xray stack --stack-format=flame --aggregation-type=time --all-stacks
are incorrect because they are the total amount of time spent in each function, but flamegraph.pl
needs these values to instead be only the amount of time spent directly in each function excluding any time spent in functions called by a function, because flamegraph.pl
does its own summing of each's total time.
Description
For the reproducible example program given farther below:
The incorrect output is like:
main; 17797754976
main;f2; 5925051360
main;f2;busy; 5925047168
main;f1; 5941978880
main;f1;busy; 5941971904
main;busy; 5930714592
Which renders as an incorrect graph:
But correct output would be like:
main; 10144
main;f2; 4192
main;f2;busy; 5925047168
main;f1; 6976
main;f1;busy; 5941971904
main;busy; 5930714592
Which renders as the correct graph:
I.e., only leafs should have their total amount of time, and non-leafs should have the difference between their total and the sum of all their children's totals, because flamegraph.pl
itself will sum each non-leaf's value with its descendents' values for showing each non-leaf's total amount of time.
More precisely: (click to expand)
bad-val(STACK) = total(STACK)
total(STACK) = exit(STACK) - enter(STACK)
bad-val(main) = 17797754976 = exit(main) - enter(main)
enter(main) = 69839874297664
exit(main) = 69857672052640
bad-val(main;f1) = 5941978880 = exit(main;f1) - enter(main;f1)
enter(main;f1) = 69845805019744
exit(main;f1) = 69851746998624
...
good-val(STACK) = total(STACK) - child-sum(STACK)
child-sum(STACK) = total(STACK;CHILD-1) + ... + total(STACK;CHILD-N)
good-val(main) = 10144 = total(main) - child-sum(main)
child-sum(main) = total(main;busy) + total(main;f1) + total(main;f2)
= 5930714592 + 5941978880 + 5925051360
= good-val(main;busy)
+ good-val(main;f1) + good-val(main;f1;busy)
+ good-val(main;f2) + good-val(main;f2;busy)
child-sum(main;busy) = 0
child-sum(main;f1) = total(main;f1;busy)
child-sum(main;f2) = total(main;f2;busy)
child-sum(main;f1;busy) = 0
child-sum(main;f2;busy) = 0
...
(Hopefully, I didn't screw that up. I think you'll get the idea.)
Reproducible Example
main.c
file
static void busy(void) { for (unsigned i = 4000000000; i > 0; i--) ; }
static void f1(void) { busy(); }
static void f2(void) { busy(); }
int main(void) { busy(); f1(); f2(); }
Build, run, and generate
clang -O0 -fxray-instrument -fxray-instruction-threshold=1 main.c -o main
XRAY_OPTIONS="patch_premain=true xray_mode=xray-basic verbosity=1" ./main
X=xray-log.main.$YOURS
llvm-xray stack --stack-format=flame --aggregation-type=time --all-stacks \
--instr_map=./main $X > $X.flame
flamegraph.pl $X.flame > $X.svg
Note: I've also had some other similarly incorrect flame-graphs from my much more complicated programs, that were compiled without -fxray-instruction-threshold=$T
(i.e. where that threshold was the default larger one), but I didn't analyze their (much more complicated) flame output to prove the incorrectness for those.
Versions
$ llvm-xray --version
LLVM (http://llvm.org/):
LLVM version 16.0.6
Optimized build.
$ clang --version
clang version 16.0.6
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /nix/store/b3qg3lglka8s298plxfkhlaifw5njv5c-clang-16.0.6/bin
(Both installed via the NixOS packages llvmPackages_16.libllvm
and clang_16
(from the unstable
channel, to have the latest version 16.0.6
).)
Version of flamegraph.pl
: d9fcc272b6a08c3e3e5b7919040f0ab5f8952d65
(Git commit of repo).