-
Notifications
You must be signed in to change notification settings - Fork 1.7k
[GR-40874] Perf support #4811
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[GR-40874] Perf support #4811
Conversation
@stooke It would be good if you could review this patch even though it is mostly incidental to the Windows code. There is one semantic change which I believe does not affect the Windows code but I'd very much like you to validate that assumption. I have repurposed fields The associated collection instances were originally populated with That meant that some of these so-called primary classes did not actually have any top level compiled methods. Luckily code that processes entries in these collections appears to always check whether the entry has any compiled methods, by calling In consequence I have repurposed this list so that it is populated with all instance classes that get presented via the |
@adinn I'll take a look. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @adinn. It looks good to me other than some typos and a few comments I added for consideration.
substratevm/src/com.oracle.objectfile/src/com/oracle/objectfile/debugentry/ClassEntry.java
Outdated
Show resolved
Hide resolved
...m/src/com.oracle.objectfile/src/com/oracle/objectfile/elf/dwarf/DwarfARangesSectionImpl.java
Outdated
Show resolved
Hide resolved
...m/src/com.oracle.objectfile/src/com/oracle/objectfile/elf/dwarf/DwarfARangesSectionImpl.java
Outdated
Show resolved
Hide resolved
...m/src/com.oracle.objectfile/src/com/oracle/objectfile/elf/dwarf/DwarfARangesSectionImpl.java
Outdated
Show resolved
Hide resolved
substratevm/src/com.oracle.objectfile/src/com/oracle/objectfile/elf/dwarf/DwarfDebugInfo.java
Outdated
Show resolved
Hide resolved
.../src/com.oracle.svm.hosted/src/com/oracle/svm/hosted/image/NativeImageDebugInfoProvider.java
Show resolved
Hide resolved
.../src/com.oracle.svm.hosted/src/com/oracle/svm/hosted/image/NativeImageDebugInfoProvider.java
Show resolved
Hide resolved
...tevm/src/com.oracle.objectfile/src/com/oracle/objectfile/elf/dwarf/DwarfInfoSectionImpl.java
Show resolved
Hide resolved
...tevm/src/com.oracle.objectfile/src/com/oracle/objectfile/elf/dwarf/DwarfInfoSectionImpl.java
Outdated
Show resolved
Hide resolved
...tevm/src/com.oracle.objectfile/src/com/oracle/objectfile/elf/dwarf/DwarfInfoSectionImpl.java
Show resolved
Hide resolved
@adinn, regarding perfs behavior you could probably ask the maintainer Arnaldo (Red Hat). This is his handle on Twitter https://twitter.com/acmel?t=Dt5BWUsXmY27r7REYbSCuw&s=09 |
@SergejIsbrecht I resolved the problem with display of hex addresses for some methods in the most recent push. The problem is that perf only starts translating sampled code addresses from some given code section ( So, if you don't preserve method symbols (i.e. pass The fix is to add a dummy code symbol at the start of the .text section when |
@SergejIsbrecht Also, assuming you do retain local symbols, the latest version of this patch will provide a pretty reliable perf annotate output for compiled Java methods. You can run a bare call
to see stats for the whole program. Alternatively, you can annotate a single method by passing its symbol name
Unfortunately, at present you have to provide the symbolic name for the method, not the DWARF name I'd be interested to hear your feedback on the current state of play with this feature. |
@adinn , I am currently compiling for x64. I will attach some FlameGraphs, when everything is running with your branch. |
@adinn , first of all it look way better in comparison to before, but I still see some "unknown" addresses
(perf report output) I compiled a JMH benchmark to a native-image and ran
If you would like I could create some FlameGraphs with async-profiler with an actual JVM for comparisons. Also there are still some SHA1 hashes around
|
@SergejIsbrecht Thanks very much for doing that! It certainly looks very interesting.
I do not know where those sample addresses come from but they most definitely look like they are not from method code compiled into the
Yes, that might be interesting to see how easy it is to compare what is going on under the actual benchmark run methods. Of course, your graph will include a slab of profile data for both JVM startup and benchmark warmup (including a load of execution bundled as 'interpreter') all of which will be missing in the Graal case.
That's not necessarily an indication that the name is being derived from a symbol rather than from the debuginfo. In this case the outermost pthread start method whose name includes an SHA1 hash is a synthetic stub method. Methods like this one only ever get presented to the debug info generator with a method name that includes an SHA1 hash (there is no 'real' Java name). In this case I am certain that the printed name has been derived from debug info because the class name is package qualified. If a symbol had been used to name the sample then there would be no package qualifier (the symbol naming algorithm used by GraalVM constructs symbols as BareClass_Method_sha1hash). |
@SergejIsbrecht Thinking about it those weird addresses may be the result of improper stack unwinding. That could indicate an issue with the generation of the |
Don't bother. I will give you one later that day. |
@adinn , I am currently working on an example. Here are some FlameGraphs with Java11 Hotspot C2. FlameGraphs are captured with async-profiler 2.8 on Linux (5.4) JMH Gradle Plugin:
https://gist.github.com/SergejIsbrecht/dda8abca7c3c1004f03f1507bc1e9240 (Just download the HTML files and open them) https://gist.github.com/SergejIsbrecht/6a87bd9a5cf4ef89b7ec9ff2f8f04d95 This FlameGraph was captured with your branch and flamegraph-rs.
The Hotspot C2 FlameGraph kind of looks different. There are a lot of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot for a adding perf-support!
See individual review comments. In addition:
-
Please also test if this works with images that are shared libraries (i.e. built with
--shared
). -
Please also add documentation on how to use perf with native-images.
Either indocs/reference-manual/native-image/DebugInfo.md
of you create an newdocs/reference-manual/native-image/Profiling.md
. It should contain info how the image needs to be built so that it can be profiled with perf, what is currently working, current limitations, plans/ideas for future improvements.
substratevm/src/com.oracle.objectfile/src/com/oracle/objectfile/debugentry/ClassEntry.java
Outdated
Show resolved
Hide resolved
substratevm/src/com.oracle.objectfile/src/com/oracle/objectfile/debugentry/ClassEntry.java
Outdated
Show resolved
Hide resolved
if (classEntry == null) { | ||
/* We must have a type entry -- TODO prove we never reach here. */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can add a non-API HostedOption (false
by default) and use it here to throw a VMError if it helps you with finding if there are still cases where you reach this code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am confident we can now reduce this call to a lookup, with an assert that the result is found, and dispense with any need for proof. So, I'll push an extra change to that effect.
Why so? Well, the continued existence of this method is an artefact of creeping change in the meaning of the fields now renamed to instancesClasses
and instanceClassesIndex
. Originally these fields tracked instance classes with top level compiled methods. The methods and associated classes were presented via the DebugCodeInfo API and ultimately derived by iterating over the NativeImageCodeCache
. The expectation was that all of these classes would already have been presented via the DebugTypeInfo API, ultimately derived by iterating over the Universe type. However, the list was originally only meant to include the proper (i.e. strict) subset of the notified instance classes which needed to be qualified with method location and range info.
When we started reporting frame data via the DebugCodeInfo
API we found classes that were i) not already identified via the DebugTypeInfo
API or ii) had inlined methods but no top level methods. These also required generation of inline method location and range info. So, we threw them into the list and index, creating a ClassEntry
if none existed and then sorted them out case by case in code that used the list of classes.
The first problem was an input hygiene problem. We ran into 'missing' types because frames were actually referencing analysis types instead of hosted types. That has been fixed on the notification side (NativeImageDebugInfoProvider
) by remapping classes from analysis to hosted domain before passing them through the DebugCodeInfo
API.
The second problem reflect the fact that we really need to iterate over all instance classes at certain points in the code and only over some instance classes at others. A ClassEntry
now identifies whether or not it has top level compiled methods and/or inlined methods and all code that now consumes the instanceClasses
list tests for the relevant cases.
So, both the problems we originally faced appear to be resolved. The current code populates this list and index with all expected instance classes during traversal of universe types set. So, the checks and updtes that used to be done in ensureClassEntry
really ought to be redundant.
n.b. There is another follow-up task needed to consolidate the current position beyond redefining ensureClassEntry
. From one point of view fields instanceClasses
and instanceClassesIndex
are redundant. We now only consume the instaceClasses list via a Stream API. The stream content manifested as a list in instanceClasses
can be derived virtually by filtering field types
with a type test for ClassEntry
and cast before returning. Similarly, lookups in the index map instanceClassesIndex
can be replaced with a lookup in the index map typesIndex
that also employs an instanceof
check. So, we have the option to virtualize the instance class list and index, trading off space against a small amount of stream processing time.
Whichever way we decide to go with this a code change is needed. There are a couple of cases left where iteration over all instance classes was and still is performed by streaming and filtering field types
(this made sense when not all instance classes were listed in field instanceClasses
). I'm happy to fix this as part of this patch or in a follow-up. Do you have a preference for retaining two lists or virtualizing one of them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed. I retained the separate list of instance classes but everything is now using streams derived either from the types list or from the classes list.
substratevm/src/com.oracle.objectfile/src/com/oracle/objectfile/debugentry/ClassEntry.java
Show resolved
Hide resolved
Thanks for providing the flame graphs -- they are very interesting. I am not sure why you are seeing frames omitted in the native graph but I have a suspicion what is going on. Before mving on to that I'll note that both Hotspot and GraalVM should be able to allocate sample addresses to inlined methods rather than simply to the top level compiled method. However, this will rarely be 100% accurate. Because of the way code gets reorganized during optimization machine code that derives from one method often gets accounted for as belonging to another method. That said, I don't think this explains the disparity seen in your example. So, here's my guess as to what is happenign. When you built your image did you pass option n.b. the reason this argument is not the default is that including extra frame info in the compiler graph may cause the register allocator to think that values in registers are alive for slightly longer than would otherwise be assumed. That's not necessarily going to be a problem for all compiled methods but it can easily cause a noticeable drop in performance for benchmarks, especially micro-benchmarks. So, be aware that if you do switch this flag on you might see your benchmark run a bit slower. |
@adinn , it does not seem to matter whether I use When I run
While importing following issue is displayed: The hex address seems quite unusual.
(perf report) Later today I will run some native rust / c++ application with perf in order to see whether I see such unusual addresses. |
Ok, thanks very much for testing it.
Well it looks like the debuginfo is incorrect. Can you provide me with the benchmark source and detailed instructions on how to generate the binary? I'd really like to have a look at the debuginfo it contains. I'd also like to run addr2line under gdb to see what is causing it to print those errors. |
@adinn, sure. I will provide a working example in some hours. Anyways, this jmh bench is quite trivial, therefor any code should suffer from the same issue. |
substratevm/src/com.oracle.objectfile/src/com/oracle/objectfile/debugentry/ClassEntry.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for addressing all the comments Andrew. It looks good to me!
...m/src/com.oracle.objectfile/src/com/oracle/objectfile/elf/dwarf/DwarfARangesSectionImpl.java
Outdated
Show resolved
Hide resolved
...m/src/com.oracle.objectfile/src/com/oracle/objectfile/elf/dwarf/DwarfARangesSectionImpl.java
Outdated
Show resolved
Hide resolved
...m/src/com.oracle.objectfile/src/com/oracle/objectfile/elf/dwarf/DwarfARangesSectionImpl.java
Outdated
Show resolved
Hide resolved
...evm/src/com.oracle.objectfile/src/com/oracle/objectfile/elf/dwarf/DwarfFrameSectionImpl.java
Outdated
Show resolved
Hide resolved
...evm/src/com.oracle.objectfile/src/com/oracle/objectfile/elf/dwarf/DwarfFrameSectionImpl.java
Outdated
Show resolved
Hide resolved
...tevm/src/com.oracle.objectfile/src/com/oracle/objectfile/elf/dwarf/DwarfInfoSectionImpl.java
Outdated
Show resolved
Hide resolved
...atevm/src/com.oracle.objectfile/src/com/oracle/objectfile/elf/dwarf/DwarfLocSectionImpl.java
Show resolved
Hide resolved
|
||
Note that the start address of the method is now labelled with the | ||
mangled symbol name `String_constructor_f60263d569497f1facccd5467ef60532e990f75d` as well as the DWARF name. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be cool if c++filt
would support our mangling.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes it would. I have had a quick look into what perf does to unmangle names. By default it calls bfd_demangle
which has its own slightly weird expectations about how to handle names e.g. arrays of TYPE
are modelled as JArray<TYPE>
. If (when) that fails it should punt to a routine in the perf code called java_demangle_sym
. This knows how to demangle bytecode type and member signatures into Java syntax. For example it will turn Ljava/lang/StringLatin1;equals([B[B)Z
into boolean java.lang.StringLatin1.equals(byte[], byte[])"
. We might be able to use that for some cases but I guess it won't do for methods exposed as C entry points which actually need a linker-friendly name.
Note that anyway I am not currently using the symbol name as the linkage_name
atttribute in the method.definition. At present I generate that attribute as <fully_qualified_class_name>::<bare_method_name>
in order to ensure that 1) perf report
prints something sensible when there are no local symbols 2) perf annotate
includes this name at the start of the listing for a given method. We can probably do better than this (but maybe in a follow-up patch).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but maybe in a follow-up patch
Definitely
I will run the PR in our CI to see what comes up. |
|
Ok, I think that's from the latest rebase thanks to commit 1b6c1bf "Add support for running a native-image main entry point in a new thread" It introduces a new call into the runCore hierarchy by delegating to This should have been caught by an mx debuginfotest gate test when the patch was merged. Any idea why not? A patch is on its way. |
Ah, that patch did update the test script. It looks like I broke it again when I did the merge. |
btw, apart from that everything looks good 🤞 |
Ok, great. I just pushed a patch that fixes the debug info test. All that is left is to squash the commits. |
And there's the squashed version |
Restarted internal CI |
@adinn the PR is in the merge queue. |
This PR implements changes to Linux debug info generation specified in issue #4810. These enable the Linux tool
perf
to display profile information for GraalVM native images.This patch 'fixes'
perf
modulo from a few small remaining details:perf
expects many references from definitions to associated declarations (specification
orabstract_origin
attribute values) to be implemented as compile unit (CU) local offsets because it assumes that a single CU owns both the referring and referenced records. There are still a few cases in Java debug info where definitions and declarations need to be in different CUs and so must use an info section global offset. This is no problem forgdb
butperf
cannot cope with it.perf
does display generated machine code for sampled code addresses. However, it appears to be adding an extra 4K offset to the code address before decompiling, meaning that the wrong code is displayed. I am not clear why this is happening.perf
occasionally lists sampled method code addresses as hex numbers (e.g. something like0x4061ba
) instead of using the fully qualified method name. I am not clear why this is happening. The odd thing is thatperf
delegates to binaryaddr2line
to do name, file and line lookup and this program prints the correct name, file, line and even inline frame info for these addresses.