Skip to content

Fix exception propagation over HW exception frame on macOS arm64 #63482

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

janvorli
Copy link
Member

@janvorli janvorli commented Jan 6, 2022

There is a bug in setting up the fake stack frame for
the PAL_DispatchExceptionWrapper. The FP and SP were not saved
to the stack frame and the FP of the context was not set to
match the fake prologue. That caused failure to unwind over the
PAL_DispatchExceptionWrapper, reaching an unrelated native
function.

This change fixes it.

Close #62058

There is a bug in setting up the fake stack frame for
the PAL_DispatchExceptionWrapper. The FP and SP were not saved
to the stack frame and the FP of the context was not set to
match the fake prologue. That caused failure to unwind over the
PAL_DispatchExceptionWrapper, reaching an unrelated native
function.

This change fixes it.
@janvorli janvorli added this to the 6.0.x milestone Jan 6, 2022
@janvorli janvorli requested a review from jkotas January 6, 2022 23:51
@janvorli janvorli self-assigned this Jan 6, 2022
@janvorli
Copy link
Member Author

janvorli commented Jan 7, 2022

@jkotas I've added a regression test for the issue (the mini repro from the #62058)

@janvorli
Copy link
Member Author

janvorli commented Jan 7, 2022

I can see that two of the coreclr tests fail in the ci for this PR on osx arm64. I am investigating if that's related to this change.

@janvorli
Copy link
Member Author

janvorli commented Jan 7, 2022

The fix is actually wrong. We were already storing the fp / lr on the stack a bit above the place I've added. The only thing was that instead of LR, we were putting there PC. Which was actually correct for everything except leaf functions (the issue #62058 has the exception in a leaf function, the VSD stub). Well, even for non-leaf functions, there would be a problem if the LR was used as a general purpose register between calls.
The real problem is that with the arm64 mechanism of making calls we cannot make unwinding of injected frame to get LR after the unwind to be different from the PC. The actual fix will need to do the same what we do for Linux - to make PAL_VirtualUnwind to recognize the PAL_DispatchExceptionWrapper and to extract the whole context at the failure location from the place where HijackFaultingThread has stored it.

Unifies the hardware exception frame unwinding with Linux,
it is necessary on arm64 to get correct and distinct LR and
PC in the frame of the hardware exception.
@janvorli janvorli merged commit 897694f into dotnet:main Jan 8, 2022
@janvorli janvorli deleted the fix-macos-arm64-exception-propagation-over-hw-exception-frame branch January 8, 2022 00:51
@janvorli
Copy link
Member Author

janvorli commented Jan 8, 2022

/backport to release/6.0

@github-actions
Copy link
Contributor

github-actions bot commented Jan 8, 2022

Started backporting to release/6.0: https://github.com/dotnet/runtime/actions/runs/1669998117

@danmoseley
Copy link
Member

@janvorli (you probably verified this already but) it does seem this broke the test, Should we open a bug?

WorkItems 
| where Started > now(-100d)  
| where FriendlyName == "System.Composition.Hosting.Tests"
| where ExitCode != 0
| where Status == "Timeout"
| join kind= inner (
   Jobs  | where Started > now(-30d) | project  QueueName , JobId, Build, Type, Source 
) on JobId
| summarize count() by ExitCode, QueueName, format_datetime(bin(Started, 1d), 'MM-dd')
| order by Started desc
ExitCode QueueName Started count_
-3 osx.1015.amd64.open 01-10 4
-3 osx.1014.amd64.open 01-10 3
-3 osx.1015.amd64.open 01-09 13
-3 osx.1014.amd64.open 01-09 19
-3 osx.1015.amd64.open 01-08 9
-3 osx.1014.amd64.open 01-08 15
-3 osx.1100.arm64.open 01-08 2
-3 osx.1013.amd64.open 01-08 1

@krwq
Copy link
Member

krwq commented Jan 10, 2022

@danmoseley I think there is already an issue: #63561

jkotas added a commit to jkotas/runtime that referenced this pull request Jan 10, 2022
@jkotas
Copy link
Member

jkotas commented Jan 10, 2022

This is affecting too many PRs. We need to revert it: #63576

jkotas added a commit that referenced this pull request Jan 10, 2022
@ghost ghost locked as resolved and limited conversation to collaborators Feb 9, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

UnwindManagedExceptionPass1 crashes on virtual dispatch stub when handling System.NullReferenceException on macOS ARM64
4 participants