-
Notifications
You must be signed in to change notification settings - Fork 5k
Improve some cases of bad debug info propagation in optimized code #61846
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve some cases of bad debug info propagation in optimized code #61846
Conversation
* Propagate debug info in loop cloning * Do not consume debug info on the standalone call statement created for inline candidates. This would lead to the loss of debugging information for failed inline candidates, since those statements are dropped and expanded in the upcoming GT_RET_EXPR node instead. In some cases it would also lead to the loss of debugging information for successful inlines. In the new logic we allow the same debugging information to be attached to the upcoming statement using the GT_RET_EXPR. This change adds around 40 KB (~0.5%) to SPC.
…->IL mappings Currently we have some pretty convoluted logic that tries to ensure we only generate multiple IP mappings mapping to the same native offset in rare cases. Otherwise we try to remove one of the mappings so that we end up with only one IL offset for each native offset. In optimized code it is common that we have multiple IL offsets mapping to the same native code address, for example when a statement does not end up generating any native code (e.g. GT_ASG between two locals in the same register). The loss of one of the mappings leads to a poor debugging experience in those cases, where breakpoints fail to bind completely depending on which statement is breakpointed. This changes the debug mappings logic so that we only discard mappings for the same native offset when one of the mappings does not map to any IL offset, or when both mappings map to literally the same IL offset. This should not affect code when compiled without optimizations: there we emit nops when we encounter subsequent IL offsets without any intermediate generated native code.
Tagging subscribers to this area: @JulieLeeMSFT Issue DetailsThis addresses some cases where we lost track of debugging information and also propagates this back to the EE more aggressively for optimized code. In particular:
Example with inline candidate that fails to inlinehttps://user-images.githubusercontent.com/7887810/142654263-c44b6d2a-0b75-4426-814f-17c9efadbc87.mp4 Example with inline candidate that succeedsinlined.mp4Previously one could not breakpoint the + ; INLRT @ 0x00A[E-]
; INLRT @ 0x012[E-]
4881FEF0D8FFFF cmp rsi, -0x2710
7412 je SHORT G_M30057_IG04 Due to bullet 3 above we now also report both of these back to the EE: IL offs PROLOG : 0x00000000 ( STACK_EMPTY )
IL offs 0x0000 : 0x00000008 ( STACK_EMPTY )
+IL offs 0x000A : 0x0000001A ( STACK_EMPTY )
IL offs 0x0012 : 0x0000001A ( STACK_EMPTY )
IL offs 0x001B : 0x00000023 ( STACK_EMPTY )
IL offs 0x0030 : 0x00000035 ( STACK_EMPTY )
IL offs EPILOG : 0x00000035 ( STACK_EMPTY )
IL offs 0x002A : 0x0000003B ( STACK_EMPTY ) The debugger seems to handle it ok -- we can breakpoint both lines as seen in the video, and it always shows the latest line mapping in VS (slightly confusing, but better than previous behavior I think). Overall this adds ~0.5% to SPC (11.850 KB -> 11.898 KB). The vast majority of this comes from bullet 2 above. Any concerns about this? cc @dotnet/jit-contrib @jkotas
|
Should this be rather fixed in the debugger? |
I'm guessing that would be hard or impossible as the jit is not conveying enough information.
All this location info gets somewhat compressed (delta encoded / nibble stream). Seems like it could be compressed down a lot further but not sure what the right tradeoffs are or what all would need to change to compensate. |
Can the debugger snap to the first mapping it finds by going back in the code stream? |
I think if we go this way we definitely need to fix what the JIT is reporting as well. Currently the JIT will prefer to report the highest IL offset, except when the lowest offset is the beginning of an instruction group, in which case it will report the lowest offset. I suppose we would want it to always emit the lowest IL offset if we did this. If we have to choose between the mappings it is not clear to me which one is "semantically" the best to report here. The highest IL offset is the one that was attached to IR nodes that ended up generating actual machine code. The lowest IL should logically be before most of the code that might be 'folded in' or optimized to be unnecessary. Both the IL offsets were requested to be reported by the EE (so in an ideal world, maybe the filtering logic ought to be in the EE). Maybe we should do both things (report multiple mappings, change the debugger to snap breakpoints back)? I checked and it looks like reporting the multiple mappings is not all that expensive, it accounts for ~5 KB of the size increase seen in this PR. Otherwise I experimented a little with it by not bailing in the first case here: runtime/src/coreclr/debug/ee/controller.cpp Lines 1894 to 1914 in 4e2d3a6
It seems to work ok, at least when JIT picks picks the earliest IL offset to report. I suppose there may be situations where we could end up breakpointing an entirely different basic block (or even worse, different funclet) if the data reported by the JIT is too sparse, however. |
Also looks like there are a few issues to iron out in any case, the failing tests involve line numbers in exceptions, so seems caused by this change. |
The test failure looks to be in debug mode. The new logic eliminates a mapping that the old logic did not: - IP mapping count : 24
+ IP mapping count : 23
IL offs PROLOG : 0x00000000 ( STACK_EMPTY )
IL offs NO_MAP : 0x0000002F ( STACK_EMPTY )
IL offs 0x0000 : 0x0000003D ( STACK_EMPTY )
IL offs 0x0001 : 0x0000003E ( STACK_EMPTY )
IL offs 0x0002 : 0x0000003F ( STACK_EMPTY )
IL offs 0x0010 : 0x00000072 ( CALL_INSTRUCTION )
IL offs 0x0015 : 0x00000077
-IL offs 0x001A : 0x0000008B ( STACK_EMPTY )
IL offs 0x001A : 0x0000008B ( STACK_EMPTY CALL_INSTRUCTION )
IL offs 0x001F : 0x00000090 ( STACK_EMPTY )
IL offs 0x0020 : 0x00000091 ( STACK_EMPTY )
IL offs 0x0025 : 0x0000009E ( CALL_INSTRUCTION )
IL offs 0x002A : 0x000000A3 ( STACK_EMPTY )
IL offs 0x002B : 0x000000A4 ( STACK_EMPTY )
IL offs 0x002C : 0x000000A5 ( STACK_EMPTY )
IL offs 0x0032 : 0x000000A8 ( STACK_EMPTY )
IL offs 0x0036 : 0x000000C2 ( CALL_INSTRUCTION )
IL offs 0x003B : 0x000000C7
IL offs 0x0040 : 0x000000DB ( STACK_EMPTY )
IL offs EPILOG : 0x000000DC ( STACK_EMPTY )
IL offs PROLOG : 0x000000E4 ( STACK_EMPTY )
IL offs 0x002E : 0x000000FB
IL offs 0x002F : 0x000000FF ( STACK_EMPTY )
IL offs 0x0030 : 0x00000100 ( STACK_EMPTY ) Not sure yet why this should have any effect, might be that the correlation logic for stack traces does not check the flags as actual flags. |
|
Maybe @noahfalk @dotnet/dotnet-diag might have input here. |
Draft Pull Request was automatically closed for inactivity. Please let us know if you'd like to reopen it. |
This addresses some cases where we lost track of debugging information and also propagates this back to the EE more aggressively for optimized code. In particular:
inline candidates. This would lead to the complete loss of debugging
information for failed inline candidates, since those statements are
dropped and expanded in the upcoming GT_RET_EXPR node instead. In some
cases it would also lead to the loss of debugging information for
successful inlines.
In the new logic we allow the same debugging information to be
attached to the upcoming statement using the GT_RET_EXPR.
Currently we have some pretty convoluted logic that tries to ensure we
only generate multiple IP mappings mapping to the same native offset in
rare cases. Otherwise we try to remove one of the mappings so that we
end up with only one IL offset for each native offset.
In optimized code it is common that we have multiple IL offsets mapping
to the same native code address, for example when a statement does not
end up generating any native code (e.g. GT_ASG between two locals in the
same register). The loss of one of the mappings leads to a poor
debugging experience in those cases, where breakpoints fail to bind
completely depending on which statement is breakpointed.
This changes the debug mappings logic so that we only discard mappings
for the same native offset when one of the mappings does not map to any
IL offset, or when both mappings map to literally the same IL offset.
This should not affect code when compiled without optimizations: there
we emit nops when we encounter subsequent IL offsets without any
intermediate generated native code.
I have also removed some dead code in the debugger as part of this.
Example with inline candidate that fails to inline
https://user-images.githubusercontent.com/7887810/142654263-c44b6d2a-0b75-4426-814f-17c9efadbc87.mp4
Previously we lost all debug information here due to bullet 2 above.
Example with inline candidate that succeeds
inlined.mp4
Previously one could not breakpoint the
long matchTimeoutTicks = matchTimeout.Ticks;
statement because it is inlined and the assignment ends up generating no code (matchTimeoutTicks
is allocated intorsi
wherematchTimeout
already is after the prolog). We now properly get the debug information after fixing bullet 2 above:+ ; INLRT @ 0x00A[E-] ; INLRT @ 0x012[E-] 4881FEF0D8FFFF cmp rsi, -0x2710 7412 je SHORT G_M30057_IG04
Due to bullet 3 above we now also report both of these back to the EE:
IL offs PROLOG : 0x00000000 ( STACK_EMPTY ) IL offs 0x0000 : 0x00000008 ( STACK_EMPTY ) +IL offs 0x000A : 0x0000001A ( STACK_EMPTY ) IL offs 0x0012 : 0x0000001A ( STACK_EMPTY ) IL offs 0x001B : 0x00000023 ( STACK_EMPTY ) IL offs 0x0030 : 0x00000035 ( STACK_EMPTY ) IL offs EPILOG : 0x00000035 ( STACK_EMPTY ) IL offs 0x002A : 0x0000003B ( STACK_EMPTY )
The debugger seems to handle it ok -- we can breakpoint both lines as seen in the video, and it always shows the latest line mapping in VS (slightly confusing, but better than previous behavior I think).
Overall this adds ~0.5% to SPC (11.850 KB -> 11.898 KB). The vast majority of this comes from bullet 2 above. Any concerns about this? cc @dotnet/jit-contrib @jkotas