-
Notifications
You must be signed in to change notification settings - Fork 5k
[cDAC] Stack walk support more Frame types #112997
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is good enough to check in. I'm excited to see us start to run tests on this.
I have tests running locally on Win-x64 using my changes: dotnet/diagnostics#5350. The last two blockers on multiple platform runs in CI are linking and cross-platform The pipeline should be running soon. |
{ | ||
value = default; | ||
if (typeof(T).GetField(fieldName) is not FieldInfo field) return false; | ||
if (field.GetValue(Context) is not object fieldValue) return false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is kind of a weird check. It seems like the actual function of it is to check whether the field value is null, in which case a clearer way to write it would be
object fieldValue = field.GetValue(Context);
if (fieldValue == null) return false;
I was actually surprised to learn that x is object y
would evaluate to false
for null
.
It's also redundant with the ulong and uint checks below, so I think what you can actually just write is:
object fieldValue = field.GetValue(Context);
and the two ulong/uint checks below will do the rest of the work, falling through to the return false
at the bottom.
value = default; | ||
if (typeof(T).GetField(fieldName) is not FieldInfo field) return false; | ||
if (field.GetValue(Context) is not object fieldValue) return false; | ||
if (fieldValue is ulong ul && target.PointerSize == sizeof(ulong)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(not strictly necessary to fix, just a thought)
do we want to communicate something if they are mismatched and fail in a loud way, i.e. 'expected uint but got ulong' or is it enough to just return false? This is 'correct' but I don't know if this is a good opportunity for us to "fail well" by properly detecting and messaging error scenarios at this layer instead of just return
ing a false
that conveys no information about the nature of the failure. I'm assuming an end user is touching this code through 20 layers of abstraction, and if this method returns false because the field held a ulong
instead of a uint
they might really appreciate it if there were an appropriate exception or error handler for that scenario.
I would assume the only way we can reach the return false
at the bottom is if the target and the context are somehow mismatched, like the wrong dll was loaded. That's a scenario where we would want to fail fast, I think, instead of just silently not working?
On the other hand, this is robust, so I have no problem with it being checked in.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed that we usually want to fail fast. Currently that happens in the caller, it could be moved in here but I'm not sure on other scenarios if that's always the behavior we want.
if (other is null) | ||
{ | ||
return false; | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you want, you can eliminate this whole check by changing the if
below to if (GetType() != other?.GetType())
- a null
other
will have a null
other?.GetType()
which can't ever match our own GetType()
.
Change not necessary though.
Is there a reason why this Equals
implementation is written in this specific way (exact type match only) even though ContextHolder
isn't sealed
? It's a little weird to see it, and I'd prefer to block this class of failure by sealing the class if we can do that because then the attempt to do the wrong thing will fail robustly at compile time instead of silently failing Equals
checks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll change this class to sealed
} | ||
catch (InvalidOperationException) | ||
{ | ||
// if the read fails, the unwinder behavior changes. Return failing HR instead of throwing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we specifically make sure we're only swallowing the type of IOEs we expect and not other IOEs from an actual bug in the implementation? Do we know what scenarios we expect a throw here in? It's fine if there's no way to sense what kind of IOE it is robustly (I don't expect you to parse .Message etc)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This throw occurs when the global is not found.
It happens:
Lines 514 to 521 in 6a23f1c
public TargetPointer ReadGlobalPointer(string name, out string? type) | |
{ | |
if (!_globals.TryGetValue(name, out (ulong Value, string? Type) global)) | |
throw new InvalidOperationException($"Failed to read global pointer '{name}'."); | |
type = global.Type; | |
return new TargetPointer(global.Value); | |
} |
I agree that I don't like using exceptions to handle this scenario. It may be worth changing the Target API to include a TryReadGlobalPointer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, given the narrow scope of the code inside the try I'm fine with this for now, maybe just make a to-do note to clean it up later to not rely on exceptions.
@@ -109,7 +109,15 @@ private static unsafe int ReadFromTarget(ulong address, void* pBuffer, int buffe | |||
return -1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Existing code, so not strictly your problem)
This GCHandle.FromIntPtr
on context
makes me suspicious, doubly so that we're checking whether the target is some type other than what we expect. Are we trying to guard against garbage context pointers here? Or something else? A comment might be good saying what this if statement is solving for people who look at the code later, if you happen to know. If it's equally mysterious to you then you can put it out of your mind and resolve this comment :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking through the existing code it looks like this is an abstract unwinder native API that lets you smuggle arbitrary data through void* context
, so since we're providing an IntPtr when calling it elsewhere in this file, it's probably fine? But in that case, I'd like to see us fail fast and fail loudly - we shouldn't return -1, we should probably bail out and throw an exception because somehow we passed ourselves something other than the context we expected. Maybe I'm not understanding this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No totally reasonable. I also wrote that code.
This is guarding against memory corruption / an invalid call. The intptr should always be a GCHandle with Target ContextCallback. I can add a comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. If we're concerned about memory corruption, I'm not certain that it's safe to turn arbitrary bytes into a GCHandle and dereference it. Hopefully we bounds-check the table lookup? I'll see if I can find out...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GCHandle.ThrowIfInvalid
throws for 0
, so a null
context
would make us throw here instead of returning -1
.
GCHandleInternalGet
does not appear to do any verification or bounds checking. It calls ObjectFromHandle
, which also does not appear to do any verification or bounds checking. So I think if we got something here that wasn't a GCHandle, Anything Could Happen. Someone who knows the VM better could correct me here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One of the issues is that this is a callback from a native code boundary. Therefore, we probably shouldn't throw because on non-Windows platforms that's a fail fast. Ideally, we'd return an error to the native code which would propagate that error back to the managed invoking function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It sounds like the right approach is to try and verify the handle in a way that allows you to quickly return an error code to the native caller, without throwing. Right now if the handle is 0, ThrowIfInvalid
will throw and unwind the stack, which it sounds like we don't want.
One workaround would be to use a thread local stack of some sort to record valid handles so this function can look up the context in the stack before dereferencing it. That's complex and not speedy though, so I don't love it.
Another option would be to try and ensure - by construction - that this callback can't be invoked by anyone else, and then we know that the argument(s) are trusted. I'm not sure we can do that here though.
{ | ||
ContextHolder<AMD64Context> contextHolder => new AMD64FrameHandler(target, contextHolder), | ||
ContextHolder<ARM64Context> contextHolder => new ARM64FrameHandler(target, contextHolder), | ||
_ => throw new InvalidOperationException("Unsupported context type"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be nice to say which type is unsupported, it will make failures in production more actionable. Not strictly necessary though.
Builds on #111759
Contributes to #110758
Adds cDAC stack walking support for the following Frame types:
TransitionFrame
(Base class)FramedMethodFrame
CLRToCOMMethodFrame
PInvokeCalliFrame
StubDispatchFrame
CallCountingHelperFrame
ExternalMethodFrame
DynamicHelperFrame
FuncEvalFrame
ResumableFrame
(Base class)RedirectedThreadFrame
FaultingExceptionFrame
HijackFrame
Expands cDAC Frame handling to be platform specific
In line with current cDAC stack walking support, only AMD64 and ARM64 support was added. When bringing up support for other architectures, platform specific frame handlers will be required. This was done to exactly match the existing DAC stack walking behavior.
Callouts
FaulingExceptionFrame
's stores aT_CONTEXT
directly.ResumableFrame
's store a pointer to aT_CONTEXT
(PT_CONTEXT
). To make this difference clear I named the direct struct asTargetContext
and the pointer asTargetContextPtr
. This distinction is important when reading the data descriptors. The former is a direct offset from the Frame's address, the latter is a pointer read at an offset from the Frame's address.TargetContext
andTargetContextPtr
) end up pointing to the beginning of aT_CONTEXT
.CalleeSavedRegisters
andHijackArgs
) store a bag of registers values. To prevent hard coding the number/names of these register value, I have opted to use reflection to write the corresponding struct values from their string names.IPlatformContext
to avoid reflection.CalleeSavedRegisters
/HijackArgs
<arch>datadescriptor.h
that is imported based on platform.Testing
Tested using amd64 Windows. Verified byte for byte correctness with DAC stack walking.
How produce call stacks with uncommon Frames
FuncEvalFrame
FuncEvalFrame's are generated during debugger immediate execution.
Debug any program in VS. Break and call a blocking function (
Console.ReadLine()
) attach a debugger non-invasively to stack walk or create a dump.For example, using cdb,
cdb -pv -p <processId>
HijackFrame
RedirectedThreadFrame (ResumableFrame)
FaultingExceptionFrame