Skip to content

FlameGraphs incorrect from llvm-xray stack --stack-format=flame #63676

Open
@DerickEddington

Description

@DerickEddington

Summary

The numeric values produced by llvm-xray stack --stack-format=flame --aggregation-type=time --all-stacks are incorrect because they are the total amount of time spent in each function, but flamegraph.pl needs these values to instead be only the amount of time spent directly in each function excluding any time spent in functions called by a function, because flamegraph.pl does its own summing of each's total time.

Description

For the reproducible example program given farther below:

The incorrect output is like:

main; 17797754976
main;f2; 5925051360
main;f2;busy; 5925047168
main;f1; 5941978880
main;f1;busy; 5941971904
main;busy; 5930714592

Which renders as an incorrect graph:
xray-log main jFn5JW

But correct output would be like:

main; 10144
main;f2; 4192
main;f2;busy; 5925047168
main;f1; 6976
main;f1;busy; 5941971904
main;busy; 5930714592

Which renders as the correct graph:
xray-log main jFn5JW corrected

I.e., only leafs should have their total amount of time, and non-leafs should have the difference between their total and the sum of all their children's totals, because flamegraph.pl itself will sum each non-leaf's value with its descendents' values for showing each non-leaf's total amount of time.

More precisely: (click to expand)
bad-val(STACK) = total(STACK)
total(STACK) = exit(STACK) - enter(STACK)

bad-val(main) =  17797754976 = exit(main) - enter(main)
enter(main) = 69839874297664
exit(main)  = 69857672052640

bad-val(main;f1) =   5941978880 = exit(main;f1) - enter(main;f1)
enter(main;f1) = 69845805019744
exit(main;f1)  = 69851746998624

...
good-val(STACK) = total(STACK) - child-sum(STACK)
child-sum(STACK) = total(STACK;CHILD-1) + ... + total(STACK;CHILD-N)

good-val(main) = 10144 = total(main) - child-sum(main)
child-sum(main) = total(main;busy) + total(main;f1) + total(main;f2)
                =       5930714592 +     5941978880 +     5925051360
                =   good-val(main;busy)
                  + good-val(main;f1) + good-val(main;f1;busy)
                  + good-val(main;f2) + good-val(main;f2;busy)

child-sum(main;busy) = 0
child-sum(main;f1) = total(main;f1;busy)
child-sum(main;f2) = total(main;f2;busy)
child-sum(main;f1;busy) = 0
child-sum(main;f2;busy) = 0

...

(Hopefully, I didn't screw that up. I think you'll get the idea.)

Reproducible Example

main.c file

static void busy(void) { for (unsigned i = 4000000000; i > 0; i--) ; }
static void f1(void) { busy(); }
static void f2(void) { busy(); }
int main(void) { busy(); f1(); f2(); }

Build, run, and generate

clang -O0 -fxray-instrument -fxray-instruction-threshold=1 main.c -o main
XRAY_OPTIONS="patch_premain=true xray_mode=xray-basic verbosity=1" ./main
X=xray-log.main.$YOURS

llvm-xray stack --stack-format=flame --aggregation-type=time --all-stacks \
                --instr_map=./main $X > $X.flame

flamegraph.pl $X.flame > $X.svg

Note: I've also had some other similarly incorrect flame-graphs from my much more complicated programs, that were compiled without -fxray-instruction-threshold=$T (i.e. where that threshold was the default larger one), but I didn't analyze their (much more complicated) flame output to prove the incorrectness for those.

Versions

$ llvm-xray --version
LLVM (http://llvm.org/):
  LLVM version 16.0.6
  Optimized build.
$ clang --version
clang version 16.0.6
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /nix/store/b3qg3lglka8s298plxfkhlaifw5njv5c-clang-16.0.6/bin

(Both installed via the NixOS packages llvmPackages_16.libllvm and clang_16 (from the unstable channel, to have the latest version 16.0.6).)

Version of flamegraph.pl: d9fcc272b6a08c3e3e5b7919040f0ab5f8952d65 (Git commit of repo).

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions