@@ -18,6 +18,8 @@ The resulting image will contain debug records in a format the GNU Debugger (GDB
1818Additionally, you can pass ` -O0 ` to the builder which specifies that no compiler optimizations should be performed.
1919Disabling all optimizations is not required, but in general it makes the debugging experience better.
2020
21+ Debug information is not just useful to the debugger. It can also be used by the Linux performance profiling tools ` perf ` and ` valgrind ` to correlate execution statistics such as CPU utilization or cache misses with specific, named Java methods and even link them to individual lines of Java code in the original Java source file.
22+
2123By default, debug info will only include details of some of the values of parameters and local variables.
2224This means that the debugger will report many parameters and local variables as being undefined. If you pass ` -O0 ` to the builder then full debug information will be included.
2325If you
@@ -35,8 +37,22 @@ However, it can significantly increase the size of the generated image on disk.
3537and local variable information by passing flag ` -H:+SourceLevelDebug ` can cause a program to be compiled
3638slightly differently and for some applications this can slow down execution.
3739
40+ The basic ` perf report ` command, which displays a histogram showing percentage execution time in each Java method, only requires passing flags ` -g ` and ` -H:+SourceLevelDebug ` to the ` native-image ` command.
41+ However, more sophisticated uses of ` perf ` (i.e. ` perf annotate ` ) and use of
42+ ` valgrind ` requires debug info to be supplemented with linkage symbols identifying compiled Java methods.
43+ Java method symbols are omitted from the generated native image by default but they can be retained achieved by passing one extra flag to the ` native-image ` command
44+
45+ ``` shell
46+ native-image -g -H:+SourceLevelDebug -H:-DeleteLocalSymbols Hello
47+ ```
48+
49+ Use of this flag will result in a small increase in the size of the
50+ resulting image file.
51+
3852> Note: Native Image debugging currently works on Linux with initial support for macOS. The feature is experimental.
3953
54+ > Note: Debug info support for ` perf ` and ` valgrind ` on Linux is an experimental feature.
55+
4056### Table of Contents
4157
4258- [ Source File Caching] ( #source-file-caching )
@@ -46,6 +62,7 @@ slightly differently and for some applications this can slow down execution.
4662- [ Checking Debug Info on Linux] ( #checking-debug-info-on-linux )
4763- [ Debugging with Isolates] ( #debugging-with-isolates )
4864- [ Debugging Helper Methods] ( #debugging-helper-methods )
65+ - [ Special Considerations for using perf and valgrind] (#using perf and valgrind)
4966
5067## Source File Caching
5168
@@ -772,4 +789,167 @@ For example, calling the method below prints high-level information about the Na
772789
773790### Further Reading
774791
775- - [ Debugging Native Image in VS Code] ( Debugging.md )
792+ - [ Debugging Native Image in VS Code] ( Debugging.md )
793+
794+ ## Special Considerations for using perf and valgrind
795+
796+ Debug info includes details of address ranges for top level and
797+ inlined compiled method code as well as mappings from code addresses
798+ to the corresponding source files and lines.
799+ ` perf ` and ` valgrind ` are able to use this information for some of
800+ their recording and reporting operations.
801+ For example, ` perf report ` is able to associate code adresses sampled
802+ during a ` perf record ` session with Java methods and print the
803+ DWARF-derived method name for the method in its output histogram.
804+
805+ ```
806+ . . .
807+ 68.18% 0.00% dirtest dirtest [.] _start
808+ |
809+ ---_start
810+ __libc_start_main_alias_2 (inlined)
811+ |
812+ |--65.21%--__libc_start_call_main
813+ | com.oracle.svm.core.code.IsolateEnterStub::JavaMainWrapper_run_5087f5482cc9a6abc971913ece43acb471d2631b (inlined)
814+ | com.oracle.svm.core.JavaMainWrapper::run (inlined)
815+ | |
816+ | |--55.84%--com.oracle.svm.core.JavaMainWrapper::runCore (inlined)
817+ | | com.oracle.svm.core.JavaMainWrapper::runCore0 (inlined)
818+ | | |
819+ | | |--55.25%--DirTest::main (inlined)
820+ | | | |
821+ | | | --54.91%--DirTest::listAll (inlined)
822+ . . .
823+ ```
824+
825+ Unfortunately, other operations require Java methods to be identified
826+ by an ELF (local) function symbol table entry locating the start of
827+ the compiled method code.
828+ In particular, assembly code dumps provided by both tools identify
829+ branch and call targets using an offset from the nearest symbol.
830+ Omitting Java method symbols means that offsets are generally
831+ displayed relative to some unrelated global symbol, usually the entry
832+ point for a method exported for invocation by C code.
833+
834+ As an illustration of the problem, the following excerpted output from
835+ ` perf annotate ` displays the first few annotated instructions of the
836+ compiled code for method ` java.lang.String::String() ` .
837+
838+ ```
839+ . . .
840+ : 501 java.lang.String::String():
841+ : 521 public String(byte[] bytes, int offset, int length, Charset charset) {
842+ 0.00 : 519d50: sub $0x68,%rsp
843+ 0.00 : 519d54: mov %rdi,0x38(%rsp)
844+ 0.00 : 519d59: mov %rsi,0x30(%rsp)
845+ 0.00 : 519d5e: mov %edx,0x64(%rsp)
846+ 0.00 : 519d62: mov %ecx,0x60(%rsp)
847+ 0.00 : 519d66: mov %r8,0x28(%rsp)
848+ 0.00 : 519d6b: cmp 0x8(%r15),%rsp
849+ 0.00 : 519d6f: jbe 51ae1a <graal_vm_locator_symbol+0xe26ba>
850+ 0.00 : 519d75: nop
851+ 0.00 : 519d76: nop
852+ : 522 Objects.requireNonNull(charset);
853+ 0.00 : 519d77: nop
854+ : 524 java.util.Objects::requireNonNull():
855+ : 207 if (obj == null)
856+ 0.00 : 519d78: nop
857+ 0.00 : 519d79: nop
858+ : 209 return obj;
859+ . . .
860+ ```
861+
862+ The leftmost column shows percentages for the amount of time recorded
863+ at each instruction in samples obtained during the ` perf record ` run.
864+ Each instruction is prefaced with it's address in the program's code
865+ section.
866+ The disassembly interleaves the source lines from which the code is
867+ derived, 521-524 for the top level code and 207-209 for the code
868+ inlined from from ` Objects.requireNonNull() ` .
869+ Also, the start of the method is labelled with the name defined in the
870+ DWARF debug info, ` java.lang.String::String() ` .
871+ However, the branch instruction ` jbe ` at address ` 0x519d6f ` uses a
872+ very large offset from ` graal_vm_locator_symbol ` .
873+ The printed offset does identify the correct address relative to the
874+ location of the symbol.
875+ However, this fails to make clear that the target address actually
876+ lies within the compiled code range for method ` String::String() ` i.e. that thsi is a method-local branch.
877+
878+ Readability of the tool output is significantly improved if
879+ option ` -H-DeleteLocalSymbols ` is passed to the ` native-image `
880+ command.
881+ The equivalent ` perf annotate ` output with this option enabled is as
882+ follows:
883+
884+ ```
885+ . . .
886+ : 5 000000000051aac0 <String_constructor_f60263d569497f1facccd5467ef60532e990f75d>:
887+ : 6 java.lang.String::String():
888+ : 521 * {@code offset} is greater than {@code bytes.length - length}
889+ : 522 *
890+ : 523 * @since 1.6
891+ : 524 */
892+ : 525 @SuppressWarnings("removal")
893+ : 526 public String(byte[] bytes, int offset, int length, Charset charset) {
894+ 0.00 : 51aac0: sub $0x68,%rsp
895+ 0.00 : 51aac4: mov %rdi,0x38(%rsp)
896+ 0.00 : 51aac9: mov %rsi,0x30(%rsp)
897+ 0.00 : 51aace: mov %edx,0x64(%rsp)
898+ 0.00 : 51aad2: mov %ecx,0x60(%rsp)
899+ 0.00 : 51aad6: mov %r8,0x28(%rsp)
900+ 0.00 : 51aadb: cmp 0x8(%r15),%rsp
901+ 0.00 : 51aadf: jbe 51bbc1 <String_constructor_f60263d569497f1facccd5467ef60532e990f75d+0x1101>
902+ 0.00 : 51aae5: nop
903+ 0.00 : 51aae6: nop
904+ : 522 Objects.requireNonNull(charset);
905+ 0.00 : 51aae7: nop
906+ : 524 java.util.Objects::requireNonNull():
907+ : 207 * @param <T> the type of the reference
908+ : 208 * @return {@code obj} if not {@code null}
909+ : 209 * @throws NullPointerException if {@code obj} is {@code null}
910+ : 210 */
911+ : 211 public static <T> T requireNonNull(T obj) {
912+ : 212 if (obj == null)
913+ 0.00 : 51aae8: nop
914+ 0.00 : 51aae9: nop
915+ : 209 throw new NullPointerException();
916+ : 210 return obj;
917+ . . .
918+ ```
919+
920+ In this version the start address of the method is now labelled with
921+ the mangled symbol name ` String_constructor_f60263d569497f1facccd5467ef60532e990f75d `
922+ as well as the DWARF name.
923+ The branch target is now printed using an offset from that start
924+ symbol.
925+
926+ Unfortunately, ` perf ` and ` valgrind ` do not correctly understand the
927+ mangling algorithm employed by GraalVM, nor are they currently able to
928+ replace the mangled name with the DWARF name in the disassembly even
929+ though both symbol and DWARF function data are known to identify code
930+ starting at the same address.
931+ So, the branch instruction still prints its target using a symbol plus
932+ offset but it is at least using the method symbol this time.
933+
934+ Also, because address ` 51aac0 ` is now recognized as a method start,
935+ ` perf ` has preceded the first line of the method with 5 context lines,
936+ which list the tail end of the method's javadoc comment.
937+ Unfortunately, perf has numbered these lines incorrectly, labelling
938+ the first comment with 521 rather than 516.
939+
940+ Executing command ` perf annotate ` will provide a disassembly listing
941+ for all methods and C functions in the image.
942+ It is possible to annotate a specific method by passing it's name as
943+ an argument to the perf annotate command.
944+ Note, however, that ` perf ` requries the mangled symbol name as
945+ argument rather than the DWARF name.
946+ So, in order to annotate method ` java.lang.String::String() ` it is
947+ necessary to run command `perf annotate
948+ String_constructor_f60263d569497f1facccd5467ef60532e990f75d`.
949+
950+ The ` valgrind ` tool ` callgrind ` also requires local symbols to be
951+ retained in order to provide high quality output.
952+ When ` callgrind ` is used in combination with a viewer like
953+ ` kcachegrind ` it is possible to identify a great deal of valuable
954+ information about native image execution aand relate it back to
955+ specific source code lines.
0 commit comments