Skip to content

Improve some cases of bad debug info propagation in optimized code #61846

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

jakobbotsch
Copy link
Member

This addresses some cases where we lost track of debugging information and also propagates this back to the EE more aggressively for optimized code. In particular:

  1. Propagate debug info in loop cloning
  2. Do not consume debug info on the standalone call statement created for
    inline candidates. This would lead to the complete loss of debugging
    information for failed inline candidates, since those statements are
    dropped and expanded in the upcoming GT_RET_EXPR node instead. In some
    cases it would also lead to the loss of debugging information for
    successful inlines.
    In the new logic we allow the same debugging information to be
    attached to the upcoming statement using the GT_RET_EXPR.
  3. Allow generating more native<->IL mappings mapping to native offsets.
    Currently we have some pretty convoluted logic that tries to ensure we
    only generate multiple IP mappings mapping to the same native offset in
    rare cases. Otherwise we try to remove one of the mappings so that we
    end up with only one IL offset for each native offset.

    In optimized code it is common that we have multiple IL offsets mapping
    to the same native code address, for example when a statement does not
    end up generating any native code (e.g. GT_ASG between two locals in the
    same register). The loss of one of the mappings leads to a poor
    debugging experience in those cases, where breakpoints fail to bind
    completely depending on which statement is breakpointed.

    This changes the debug mappings logic so that we only discard mappings
    for the same native offset when one of the mappings does not map to any
    IL offset, or when both mappings map to literally the same IL offset.
    This should not affect code when compiled without optimizations: there
    we emit nops when we encounter subsequent IL offsets without any
    intermediate generated native code.

    I have also removed some dead code in the debugger as part of this.

Example with inline candidate that fails to inline

https://user-images.githubusercontent.com/7887810/142654263-c44b6d2a-0b75-4426-814f-17c9efadbc87.mp4
Previously we lost all debug information here due to bullet 2 above.

Example with inline candidate that succeeds

inlined.mp4

Previously one could not breakpoint the long matchTimeoutTicks = matchTimeout.Ticks; statement because it is inlined and the assignment ends up generating no code (matchTimeoutTicks is allocated into rsi where matchTimeout already is after the prolog). We now properly get the debug information after fixing bullet 2 above:

+                            ; INLRT @ 0x00A[E-]
                             ; INLRT @ 0x012[E-]
        4881FEF0D8FFFF       cmp      rsi, -0x2710
        7412                 je       SHORT G_M30057_IG04

Due to bullet 3 above we now also report both of these back to the EE:

 IL offs PROLOG : 0x00000000 ( STACK_EMPTY )
 IL offs 0x0000 : 0x00000008 ( STACK_EMPTY )
+IL offs 0x000A : 0x0000001A ( STACK_EMPTY )
 IL offs 0x0012 : 0x0000001A ( STACK_EMPTY )
 IL offs 0x001B : 0x00000023 ( STACK_EMPTY )
 IL offs 0x0030 : 0x00000035 ( STACK_EMPTY )
 IL offs EPILOG : 0x00000035 ( STACK_EMPTY )
 IL offs 0x002A : 0x0000003B ( STACK_EMPTY )

The debugger seems to handle it ok -- we can breakpoint both lines as seen in the video, and it always shows the latest line mapping in VS (slightly confusing, but better than previous behavior I think).

Overall this adds ~0.5% to SPC (11.850 KB -> 11.898 KB). The vast majority of this comes from bullet 2 above. Any concerns about this? cc @dotnet/jit-contrib @jkotas

* Propagate debug info in loop cloning

* Do not consume debug info on the standalone call statement created for
  inline candidates. This would lead to the loss of debugging
  information for failed inline candidates, since those statements are
  dropped and expanded in the upcoming GT_RET_EXPR node instead. In some
  cases it would also lead to the loss of debugging information for
  successful inlines.
  In the new logic we allow the same debugging information to be
  attached to the upcoming statement using the GT_RET_EXPR.

This change adds around 40 KB (~0.5%) to SPC.
…->IL mappings

Currently we have some pretty convoluted logic that tries to ensure we
only generate multiple IP mappings mapping to the same native offset in
rare cases. Otherwise we try to remove one of the mappings so that we
end up with only one IL offset for each native offset.

In optimized code it is common that we have multiple IL offsets mapping
to the same native code address, for example when a statement does not
end up generating any native code (e.g. GT_ASG between two locals in the
same register). The loss of one of the mappings leads to a poor
debugging experience in those cases, where breakpoints fail to bind
completely depending on which statement is breakpointed.

This changes the debug mappings logic so that we only discard mappings
for the same native offset when one of the mappings does not map to any
IL offset, or when both mappings map to literally the same IL offset.

This should not affect code when compiled without optimizations: there
we emit nops when we encounter subsequent IL offsets without any
intermediate generated native code.
@ghost ghost added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Nov 19, 2021
@ghost
Copy link

ghost commented Nov 19, 2021

Tagging subscribers to this area: @JulieLeeMSFT
See info in area-owners.md if you want to be subscribed.

Issue Details

This addresses some cases where we lost track of debugging information and also propagates this back to the EE more aggressively for optimized code. In particular:

  1. Propagate debug info in loop cloning
  2. Do not consume debug info on the standalone call statement created for
    inline candidates. This would lead to the complete loss of debugging
    information for failed inline candidates, since those statements are
    dropped and expanded in the upcoming GT_RET_EXPR node instead. In some
    cases it would also lead to the loss of debugging information for
    successful inlines.
    In the new logic we allow the same debugging information to be
    attached to the upcoming statement using the GT_RET_EXPR.
  3. Allow generating more native<->IL mappings mapping to native offsets.
    Currently we have some pretty convoluted logic that tries to ensure we
    only generate multiple IP mappings mapping to the same native offset in
    rare cases. Otherwise we try to remove one of the mappings so that we
    end up with only one IL offset for each native offset.

    In optimized code it is common that we have multiple IL offsets mapping
    to the same native code address, for example when a statement does not
    end up generating any native code (e.g. GT_ASG between two locals in the
    same register). The loss of one of the mappings leads to a poor
    debugging experience in those cases, where breakpoints fail to bind
    completely depending on which statement is breakpointed.

    This changes the debug mappings logic so that we only discard mappings
    for the same native offset when one of the mappings does not map to any
    IL offset, or when both mappings map to literally the same IL offset.
    This should not affect code when compiled without optimizations: there
    we emit nops when we encounter subsequent IL offsets without any
    intermediate generated native code.

    I have also removed some dead code in the debugger as part of this.

Example with inline candidate that fails to inline

https://user-images.githubusercontent.com/7887810/142654263-c44b6d2a-0b75-4426-814f-17c9efadbc87.mp4
Previously we lost all debug information here due to bullet 2 above.

Example with inline candidate that succeeds

inlined.mp4

Previously one could not breakpoint the long matchTimeoutTicks = matchTimeout.Ticks; statement because it is inlined and the assignment ends up generating no code (matchTimeoutTicks is allocated into rsi where matchTimeout already is after the prolog). We now properly get the debug information after fixing bullet 2 above:

+                            ; INLRT @ 0x00A[E-]
                             ; INLRT @ 0x012[E-]
        4881FEF0D8FFFF       cmp      rsi, -0x2710
        7412                 je       SHORT G_M30057_IG04

Due to bullet 3 above we now also report both of these back to the EE:

 IL offs PROLOG : 0x00000000 ( STACK_EMPTY )
 IL offs 0x0000 : 0x00000008 ( STACK_EMPTY )
+IL offs 0x000A : 0x0000001A ( STACK_EMPTY )
 IL offs 0x0012 : 0x0000001A ( STACK_EMPTY )
 IL offs 0x001B : 0x00000023 ( STACK_EMPTY )
 IL offs 0x0030 : 0x00000035 ( STACK_EMPTY )
 IL offs EPILOG : 0x00000035 ( STACK_EMPTY )
 IL offs 0x002A : 0x0000003B ( STACK_EMPTY )

The debugger seems to handle it ok -- we can breakpoint both lines as seen in the video, and it always shows the latest line mapping in VS (slightly confusing, but better than previous behavior I think).

Overall this adds ~0.5% to SPC (11.850 KB -> 11.898 KB). The vast majority of this comes from bullet 2 above. Any concerns about this? cc @dotnet/jit-contrib @jkotas

Author: jakobbotsch
Assignees: -
Labels:

area-CodeGen-coreclr

Milestone: -

@jkotas
Copy link
Member

jkotas commented Nov 19, 2021

The loss of one of the mappings leads to a poor debugging experience in those cases, where breakpoints fail to bind completely depending on which statement is breakpointed.

Should this be rather fixed in the debugger?

@AndyAyersMS
Copy link
Member

Should this be rather fixed in the debugger?

I'm guessing that would be hard or impossible as the jit is not conveying enough information.

Overall this adds ~0.5% to SPC (11.850 KB -> 11.898 KB).

All this location info gets somewhat compressed (delta encoded / nibble stream). Seems like it could be compressed down a lot further but not sure what the right tradeoffs are or what all would need to change to compensate.

@jkotas
Copy link
Member

jkotas commented Nov 19, 2021

I'm guessing that would be hard or impossible as the jit is not conveying enough information.

Can the debugger snap to the first mapping it finds by going back in the code stream?

@jakobbotsch
Copy link
Member Author

Can the debugger snap to the first mapping it finds by going back in the code stream?

I think if we go this way we definitely need to fix what the JIT is reporting as well. Currently the JIT will prefer to report the highest IL offset, except when the lowest offset is the beginning of an instruction group, in which case it will report the lowest offset. I suppose we would want it to always emit the lowest IL offset if we did this.

If we have to choose between the mappings it is not clear to me which one is "semantically" the best to report here. The highest IL offset is the one that was attached to IR nodes that ended up generating actual machine code. The lowest IL should logically be before most of the code that might be 'folded in' or optimized to be unnecessary. Both the IL offsets were requested to be reported by the EE (so in an ideal world, maybe the filtering logic ought to be in the EE).

Maybe we should do both things (report multiple mappings, change the debugger to snap breakpoints back)? I checked and it looks like reporting the multiple mappings is not all that expensive, it accounts for ~5 KB of the size increase seen in this PR.

Otherwise I experimented a little with it by not bailing in the first case here:

// We special case offset 0, which is when a breakpoint is set
// at the beginning of a method that hasn't been jitted yet. In
// that case it's possible that offset 0 has been optimized out,
// but we still want to set the closest breakpoint to that.
if (!fExact && (masterILOffset != 0))
{
LOG((LF_CORDB, LL_INFO10000, "DC::BP:Failed to bind patch at IL offset 0x%p in %s::%s\n",
masterILOffset, dji->m_nativeCodeVersion.GetMethodDesc()->m_pszDebugClassName, dji->m_nativeCodeVersion.GetMethodDesc()->m_pszDebugMethodName));
continue;
}
else
{
result = TRUE;
}
INDEBUG(BOOL fOk = )
AddBindAndActivatePatchForMethodDesc(dji->m_nativeCodeVersion.GetMethodDesc(), dji,
offsetNative, PATCH_KIND_IL_SLAVE,
LEAF_MOST_FRAME, m_pAppDomain);
_ASSERTE(fOk);

It seems to work ok, at least when JIT picks picks the earliest IL offset to report. I suppose there may be situations where we could end up breakpointing an entirely different basic block (or even worse, different funclet) if the data reported by the JIT is too sparse, however.

@jakobbotsch
Copy link
Member Author

Also looks like there are a few issues to iron out in any case, the failing tests involve line numbers in exceptions, so seems caused by this change.

@jakobbotsch
Copy link
Member Author

jakobbotsch commented Nov 20, 2021

The test failure looks to be in debug mode. The new logic eliminates a mapping that the old logic did not:

-	IP mapping count : 24
+	IP mapping count : 23
 IL offs PROLOG : 0x00000000 ( STACK_EMPTY )
 IL offs NO_MAP : 0x0000002F ( STACK_EMPTY )
 IL offs 0x0000 : 0x0000003D ( STACK_EMPTY )
 IL offs 0x0001 : 0x0000003E ( STACK_EMPTY )
 IL offs 0x0002 : 0x0000003F ( STACK_EMPTY )
 IL offs 0x0010 : 0x00000072 ( CALL_INSTRUCTION )
 IL offs 0x0015 : 0x00000077
-IL offs 0x001A : 0x0000008B ( STACK_EMPTY )
 IL offs 0x001A : 0x0000008B ( STACK_EMPTY CALL_INSTRUCTION )
 IL offs 0x001F : 0x00000090 ( STACK_EMPTY )
 IL offs 0x0020 : 0x00000091 ( STACK_EMPTY )
 IL offs 0x0025 : 0x0000009E ( CALL_INSTRUCTION )
 IL offs 0x002A : 0x000000A3 ( STACK_EMPTY )
 IL offs 0x002B : 0x000000A4 ( STACK_EMPTY )
 IL offs 0x002C : 0x000000A5 ( STACK_EMPTY )
 IL offs 0x0032 : 0x000000A8 ( STACK_EMPTY )
 IL offs 0x0036 : 0x000000C2 ( CALL_INSTRUCTION )
 IL offs 0x003B : 0x000000C7
 IL offs 0x0040 : 0x000000DB ( STACK_EMPTY )
 IL offs EPILOG : 0x000000DC ( STACK_EMPTY )
 IL offs PROLOG : 0x000000E4 ( STACK_EMPTY )
 IL offs 0x002E : 0x000000FB
 IL offs 0x002F : 0x000000FF ( STACK_EMPTY )
 IL offs 0x0030 : 0x00000100 ( STACK_EMPTY )

Not sure yet why this should have any effect, might be that the correlation logic for stack traces does not check the flags as actual flags.

@jakobbotsch
Copy link
Member Author

DebuggerInfo::SetBoundaries seems to intentionally put all mappings with a CALL_INSTRUCTION flag into its own table (I can't seem to find any uses of this table, but I suppose it is used for the "function returned X" feature?). Unclear to me why these mappings would not also be useful in other operations too.

@BruceForstall
Copy link
Contributor

Maybe @noahfalk @dotnet/dotnet-diag might have input here.

@jakobbotsch
Copy link
Member Author

I've extracted the first two items into a separate PR #62018 since it seems the third item will be harder to change (I've spoken a bit to @noahfalk offline about it).

@jakobbotsch jakobbotsch marked this pull request as draft December 13, 2021 12:59
@ghost ghost closed this Jan 12, 2022
@ghost
Copy link

ghost commented Jan 12, 2022

Draft Pull Request was automatically closed for inactivity. Please let us know if you'd like to reopen it.

@ghost ghost locked as resolved and limited conversation to collaborators Feb 11, 2022
This pull request was closed.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants