Improve some cases of bad debug info propagation in optimized code #61846

jakobbotsch · 2021-11-19T16:29:05Z

This addresses some cases where we lost track of debugging information and also propagates this back to the EE more aggressively for optimized code. In particular:

Propagate debug info in loop cloning
Do not consume debug info on the standalone call statement created for
inline candidates. This would lead to the complete loss of debugging
information for failed inline candidates, since those statements are
dropped and expanded in the upcoming GT_RET_EXPR node instead. In some
cases it would also lead to the loss of debugging information for
successful inlines.
In the new logic we allow the same debugging information to be
attached to the upcoming statement using the GT_RET_EXPR.
Allow generating more native<->IL mappings mapping to native offsets.
Currently we have some pretty convoluted logic that tries to ensure we
only generate multiple IP mappings mapping to the same native offset in
rare cases. Otherwise we try to remove one of the mappings so that we
end up with only one IL offset for each native offset.

In optimized code it is common that we have multiple IL offsets mapping
to the same native code address, for example when a statement does not
end up generating any native code (e.g. GT_ASG between two locals in the
same register). The loss of one of the mappings leads to a poor
debugging experience in those cases, where breakpoints fail to bind
completely depending on which statement is breakpointed.

This changes the debug mappings logic so that we only discard mappings
for the same native offset when one of the mappings does not map to any
IL offset, or when both mappings map to literally the same IL offset.
This should not affect code when compiled without optimizations: there
we emit nops when we encounter subsequent IL offsets without any
intermediate generated native code.

I have also removed some dead code in the debugger as part of this.

Example with inline candidate that fails to inline

https://user-images.githubusercontent.com/7887810/142654263-c44b6d2a-0b75-4426-814f-17c9efadbc87.mp4
Previously we lost all debug information here due to bullet 2 above.

Example with inline candidate that succeeds

inlined.mp4

Previously one could not breakpoint the long matchTimeoutTicks = matchTimeout.Ticks; statement because it is inlined and the assignment ends up generating no code (matchTimeoutTicks is allocated into rsi where matchTimeout already is after the prolog). We now properly get the debug information after fixing bullet 2 above:

+                            ; INLRT @ 0x00A[E-]
                             ; INLRT @ 0x012[E-]
        4881FEF0D8FFFF       cmp      rsi, -0x2710
        7412                 je       SHORT G_M30057_IG04

Due to bullet 3 above we now also report both of these back to the EE:

 IL offs PROLOG : 0x00000000 ( STACK_EMPTY )
 IL offs 0x0000 : 0x00000008 ( STACK_EMPTY )
+IL offs 0x000A : 0x0000001A ( STACK_EMPTY )
 IL offs 0x0012 : 0x0000001A ( STACK_EMPTY )
 IL offs 0x001B : 0x00000023 ( STACK_EMPTY )
 IL offs 0x0030 : 0x00000035 ( STACK_EMPTY )
 IL offs EPILOG : 0x00000035 ( STACK_EMPTY )
 IL offs 0x002A : 0x0000003B ( STACK_EMPTY )

The debugger seems to handle it ok -- we can breakpoint both lines as seen in the video, and it always shows the latest line mapping in VS (slightly confusing, but better than previous behavior I think).

Overall this adds ~0.5% to SPC (11.850 KB -> 11.898 KB). The vast majority of this comes from bullet 2 above. Any concerns about this? cc @dotnet/jit-contrib @jkotas

* Propagate debug info in loop cloning * Do not consume debug info on the standalone call statement created for inline candidates. This would lead to the loss of debugging information for failed inline candidates, since those statements are dropped and expanded in the upcoming GT_RET_EXPR node instead. In some cases it would also lead to the loss of debugging information for successful inlines. In the new logic we allow the same debugging information to be attached to the upcoming statement using the GT_RET_EXPR. This change adds around 40 KB (~0.5%) to SPC.

…->IL mappings Currently we have some pretty convoluted logic that tries to ensure we only generate multiple IP mappings mapping to the same native offset in rare cases. Otherwise we try to remove one of the mappings so that we end up with only one IL offset for each native offset. In optimized code it is common that we have multiple IL offsets mapping to the same native code address, for example when a statement does not end up generating any native code (e.g. GT_ASG between two locals in the same register). The loss of one of the mappings leads to a poor debugging experience in those cases, where breakpoints fail to bind completely depending on which statement is breakpointed. This changes the debug mappings logic so that we only discard mappings for the same native offset when one of the mappings does not map to any IL offset, or when both mappings map to literally the same IL offset. This should not affect code when compiled without optimizations: there we emit nops when we encounter subsequent IL offsets without any intermediate generated native code.

ghost · 2021-11-19T16:29:13Z

Tagging subscribers to this area: @JulieLeeMSFT
See info in area-owners.md if you want to be subscribed.

Issue Details

This addresses some cases where we lost track of debugging information and also propagates this back to the EE more aggressively for optimized code. In particular:

Propagate debug info in loop cloning
Do not consume debug info on the standalone call statement created for
inline candidates. This would lead to the complete loss of debugging
information for failed inline candidates, since those statements are
dropped and expanded in the upcoming GT_RET_EXPR node instead. In some
cases it would also lead to the loss of debugging information for
successful inlines.
In the new logic we allow the same debugging information to be
attached to the upcoming statement using the GT_RET_EXPR.
Allow generating more native<->IL mappings mapping to native offsets.
Currently we have some pretty convoluted logic that tries to ensure we
only generate multiple IP mappings mapping to the same native offset in
rare cases. Otherwise we try to remove one of the mappings so that we
end up with only one IL offset for each native offset.

In optimized code it is common that we have multiple IL offsets mapping
to the same native code address, for example when a statement does not
end up generating any native code (e.g. GT_ASG between two locals in the
same register). The loss of one of the mappings leads to a poor
debugging experience in those cases, where breakpoints fail to bind
completely depending on which statement is breakpointed.

This changes the debug mappings logic so that we only discard mappings
for the same native offset when one of the mappings does not map to any
IL offset, or when both mappings map to literally the same IL offset.
This should not affect code when compiled without optimizations: there
we emit nops when we encounter subsequent IL offsets without any
intermediate generated native code.

I have also removed some dead code in the debugger as part of this.

Example with inline candidate that fails to inline

https://user-images.githubusercontent.com/7887810/142654263-c44b6d2a-0b75-4426-814f-17c9efadbc87.mp4
Previously we lost all debug information here due to bullet 2 above.

Example with inline candidate that succeeds

inlined.mp4

Previously one could not breakpoint the long matchTimeoutTicks = matchTimeout.Ticks; statement because it is inlined and the assignment ends up generating no code (matchTimeoutTicks is allocated into rsi where matchTimeout already is after the prolog). We now properly get the debug information after fixing bullet 2 above:

+                            ; INLRT @ 0x00A[E-]
                             ; INLRT @ 0x012[E-]
        4881FEF0D8FFFF       cmp      rsi, -0x2710
        7412                 je       SHORT G_M30057_IG04

Due to bullet 3 above we now also report both of these back to the EE:

 IL offs PROLOG : 0x00000000 ( STACK_EMPTY )
 IL offs 0x0000 : 0x00000008 ( STACK_EMPTY )
+IL offs 0x000A : 0x0000001A ( STACK_EMPTY )
 IL offs 0x0012 : 0x0000001A ( STACK_EMPTY )
 IL offs 0x001B : 0x00000023 ( STACK_EMPTY )
 IL offs 0x0030 : 0x00000035 ( STACK_EMPTY )
 IL offs EPILOG : 0x00000035 ( STACK_EMPTY )
 IL offs 0x002A : 0x0000003B ( STACK_EMPTY )

The debugger seems to handle it ok -- we can breakpoint both lines as seen in the video, and it always shows the latest line mapping in VS (slightly confusing, but better than previous behavior I think).

Overall this adds ~0.5% to SPC (11.850 KB -> 11.898 KB). The vast majority of this comes from bullet 2 above. Any concerns about this? cc @dotnet/jit-contrib @jkotas

Author:	jakobbotsch
Assignees:	-
Labels:	`area-CodeGen-coreclr`
Milestone:	-

jkotas · 2021-11-19T16:38:01Z

The loss of one of the mappings leads to a poor debugging experience in those cases, where breakpoints fail to bind completely depending on which statement is breakpointed.

Should this be rather fixed in the debugger?

AndyAyersMS · 2021-11-19T17:41:56Z

Should this be rather fixed in the debugger?

I'm guessing that would be hard or impossible as the jit is not conveying enough information.

Overall this adds ~0.5% to SPC (11.850 KB -> 11.898 KB).

All this location info gets somewhat compressed (delta encoded / nibble stream). Seems like it could be compressed down a lot further but not sure what the right tradeoffs are or what all would need to change to compensate.

jkotas · 2021-11-19T18:06:02Z

I'm guessing that would be hard or impossible as the jit is not conveying enough information.

Can the debugger snap to the first mapping it finds by going back in the code stream?

jakobbotsch · 2021-11-19T21:08:15Z

Can the debugger snap to the first mapping it finds by going back in the code stream?

I think if we go this way we definitely need to fix what the JIT is reporting as well. Currently the JIT will prefer to report the highest IL offset, except when the lowest offset is the beginning of an instruction group, in which case it will report the lowest offset. I suppose we would want it to always emit the lowest IL offset if we did this.

If we have to choose between the mappings it is not clear to me which one is "semantically" the best to report here. The highest IL offset is the one that was attached to IR nodes that ended up generating actual machine code. The lowest IL should logically be before most of the code that might be 'folded in' or optimized to be unnecessary. Both the IL offsets were requested to be reported by the EE (so in an ideal world, maybe the filtering logic ought to be in the EE).

Maybe we should do both things (report multiple mappings, change the debugger to snap breakpoints back)? I checked and it looks like reporting the multiple mappings is not all that expensive, it accounts for ~5 KB of the size increase seen in this PR.

Otherwise I experimented a little with it by not bailing in the first case here:

runtime/src/coreclr/debug/ee/controller.cpp

Lines 1894 to 1914 in 4e2d3a6

    
           // We special case offset 0, which is when a breakpoint is set 
        
           // at the beginning of a method that hasn't been jitted yet.  In 
        
           // that case it's possible that offset 0 has been optimized out, 
        
           // but we still want to set the closest breakpoint to that. 
        
           if (!fExact && (masterILOffset != 0)) 
        
           { 
        
               LOG((LF_CORDB, LL_INFO10000, "DC::BP:Failed to bind patch at IL offset 0x%p in %s::%s\n", 
        
                   masterILOffset, dji->m_nativeCodeVersion.GetMethodDesc()->m_pszDebugClassName, dji->m_nativeCodeVersion.GetMethodDesc()->m_pszDebugMethodName)); 
        
               continue; 
        
           } 
        
           else 
        
           { 
        
               result = TRUE; 
        
           } 
        
           INDEBUG(BOOL fOk = ) 
        
               AddBindAndActivatePatchForMethodDesc(dji->m_nativeCodeVersion.GetMethodDesc(), dji, 
        
                   offsetNative, PATCH_KIND_IL_SLAVE, 
        
                   LEAF_MOST_FRAME, m_pAppDomain); 
        
           _ASSERTE(fOk);

It seems to work ok, at least when JIT picks picks the earliest IL offset to report. I suppose there may be situations where we could end up breakpointing an entirely different basic block (or even worse, different funclet) if the data reported by the JIT is too sparse, however.

jakobbotsch · 2021-11-19T21:09:20Z

Also looks like there are a few issues to iron out in any case, the failing tests involve line numbers in exceptions, so seems caused by this change.

jakobbotsch · 2021-11-20T16:45:19Z

The test failure looks to be in debug mode. The new logic eliminates a mapping that the old logic did not:

-	IP mapping count : 24
+	IP mapping count : 23
 IL offs PROLOG : 0x00000000 ( STACK_EMPTY )
 IL offs NO_MAP : 0x0000002F ( STACK_EMPTY )
 IL offs 0x0000 : 0x0000003D ( STACK_EMPTY )
 IL offs 0x0001 : 0x0000003E ( STACK_EMPTY )
 IL offs 0x0002 : 0x0000003F ( STACK_EMPTY )
 IL offs 0x0010 : 0x00000072 ( CALL_INSTRUCTION )
 IL offs 0x0015 : 0x00000077
-IL offs 0x001A : 0x0000008B ( STACK_EMPTY )
 IL offs 0x001A : 0x0000008B ( STACK_EMPTY CALL_INSTRUCTION )
 IL offs 0x001F : 0x00000090 ( STACK_EMPTY )
 IL offs 0x0020 : 0x00000091 ( STACK_EMPTY )
 IL offs 0x0025 : 0x0000009E ( CALL_INSTRUCTION )
 IL offs 0x002A : 0x000000A3 ( STACK_EMPTY )
 IL offs 0x002B : 0x000000A4 ( STACK_EMPTY )
 IL offs 0x002C : 0x000000A5 ( STACK_EMPTY )
 IL offs 0x0032 : 0x000000A8 ( STACK_EMPTY )
 IL offs 0x0036 : 0x000000C2 ( CALL_INSTRUCTION )
 IL offs 0x003B : 0x000000C7
 IL offs 0x0040 : 0x000000DB ( STACK_EMPTY )
 IL offs EPILOG : 0x000000DC ( STACK_EMPTY )
 IL offs PROLOG : 0x000000E4 ( STACK_EMPTY )
 IL offs 0x002E : 0x000000FB
 IL offs 0x002F : 0x000000FF ( STACK_EMPTY )
 IL offs 0x0030 : 0x00000100 ( STACK_EMPTY )

Not sure yet why this should have any effect, might be that the correlation logic for stack traces does not check the flags as actual flags.

jakobbotsch · 2021-11-20T17:23:49Z

DebuggerInfo::SetBoundaries seems to intentionally put all mappings with a CALL_INSTRUCTION flag into its own table (I can't seem to find any uses of this table, but I suppose it is used for the "function returned X" feature?). Unclear to me why these mappings would not also be useful in other operations too.

BruceForstall · 2021-11-23T23:43:45Z

Maybe @noahfalk @dotnet/dotnet-diag might have input here.

jakobbotsch · 2021-11-24T15:28:07Z

I've extracted the first two items into a separate PR #62018 since it seems the third item will be harder to change (I've spoken a bit to @noahfalk offline about it).

ghost · 2022-01-12T17:01:01Z

Draft Pull Request was automatically closed for inactivity. Please let us know if you'd like to reopen it.

jakobbotsch added 3 commits November 19, 2021 16:47

Run jit-format

b407873

ghost added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Nov 19, 2021

jakobbotsch mentioned this pull request Nov 24, 2021

Fix some missing debug info #62018

Merged

jakobbotsch marked this pull request as draft December 13, 2021 12:59

ghost closed this Jan 12, 2022

ghost locked as resolved and limited conversation to collaborators Feb 11, 2022

This pull request was closed.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Improve some cases of bad debug info propagation in optimized code #61846

Improve some cases of bad debug info propagation in optimized code #61846

Uh oh!

jakobbotsch commented Nov 19, 2021

Uh oh!

ghost commented Nov 19, 2021

Example with inline candidate that fails to inline

Example with inline candidate that succeeds

Uh oh!

jkotas commented Nov 19, 2021

Uh oh!

AndyAyersMS commented Nov 19, 2021

Uh oh!

jkotas commented Nov 19, 2021 •

edited

Loading

Uh oh!

jakobbotsch commented Nov 19, 2021

Uh oh!

jakobbotsch commented Nov 19, 2021

Uh oh!

jakobbotsch commented Nov 20, 2021 •

edited

Loading

Uh oh!

jakobbotsch commented Nov 20, 2021

Uh oh!

BruceForstall commented Nov 23, 2021

Uh oh!

jakobbotsch commented Nov 24, 2021

Uh oh!

ghost commented Jan 12, 2022

Uh oh!

Uh oh!

Improve some cases of bad debug info propagation in optimized code #61846

Improve some cases of bad debug info propagation in optimized code #61846

Uh oh!

Conversation

jakobbotsch commented Nov 19, 2021

Example with inline candidate that fails to inline

Example with inline candidate that succeeds

Uh oh!

ghost commented Nov 19, 2021

Example with inline candidate that fails to inline

Example with inline candidate that succeeds

Uh oh!

jkotas commented Nov 19, 2021

Uh oh!

AndyAyersMS commented Nov 19, 2021

Uh oh!

jkotas commented Nov 19, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jakobbotsch commented Nov 19, 2021

Uh oh!

jakobbotsch commented Nov 19, 2021

Uh oh!

jakobbotsch commented Nov 20, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jakobbotsch commented Nov 20, 2021

Uh oh!

BruceForstall commented Nov 23, 2021

Uh oh!

jakobbotsch commented Nov 24, 2021

Uh oh!

ghost commented Jan 12, 2022

Uh oh!

Uh oh!

jkotas commented Nov 19, 2021 •

edited

Loading

jakobbotsch commented Nov 20, 2021 •

edited

Loading