Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pass inbound handoff message via heap so it cannot race out of scope by the time it reaches the ConsoleIoThread #10751

Merged
2 commits merged into from
Jul 22, 2021

Conversation

miniksa
Copy link
Member

@miniksa miniksa commented Jul 21, 2021

Pass inbound handoff message via heap so it cannot race out of scope by the time it reaches the ConsoleIoThread

PR Checklist

Detailed Description of the Pull Request / Additional comments

  • OpenConsole.exe is started in response to the OS conhost.exe request for a handoff and prepares an Out Of Proc Multithreaded COM server.
  • A COM thread from the pool inside OpenConsole.exe picks up the inbound message and allocates some stack space for the CONSOLE_API_MSG coming in
  • That COM thread calls down to set up the I/O thread that will pump the console driver handle and passes a pointer to the stack-allocated CONSOLE_API_MSG as the LPVOID parameter for starting the thread.

Now one of two things happen:

  1. The I/O thread is scheduled pretty much immediately (or soon enough that the COM thread hasn't messed with the stack space), picks up the pointer to the COM thread's stack with CONSOLE_API_MSG, and processes the initial message correctly.
  2. The COM thread continues and finalizes the handoff message to conhost.exe declaring success. It then pops stack and "frees" the memory space. If it doesn't manage to overwrite it, we're still good. If it does, then things go crazy.

This fix changes it so that the CONSOLE_API_MSG is sent into the heap before being passed to the other thread so it's in a known location that won't be freed or overwritten unexpectedly.

Validation Steps Performed

  • - Confirmed that many handoffs from the run box seem to work alright on my system after this change.
  • - Confirmed that many tab creations/splits seem to work alright on my system after this change.
  • - Would prefer if @ianjoneill could try to F5 this branch to build/deploy it, set it as default, and see if it makes it go away completely... but I'm pretty confident it is this based on the dumps provided either way.

@ghost ghost added Area-DefApp Issue-Bug It either shouldn't be doing this or needs an investigation. Priority-1 A description (P1) Product-Terminal The new Windows Terminal. labels Jul 21, 2021
// free the heap memory when we're done getting the important bits out of it below.
std::unique_ptr<CONSOLE_API_MSG> capturedMessage{ static_cast<PCONSOLE_API_MSG>(lpParameter) };

ReceiveMsg = *capturedMessage.get();
ReceiveMsg._pApiRoutines = &globals.api;
ReceiveMsg._pDeviceComm = globals.pDeviceComm;
IoSorter::ServiceIoOperation(&ReceiveMsg, &ReplyMsg);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Meh, can we just patch capturedMessage directly and hand it off as a pointer instead of copying it?

Suggested change
IoSorter::ServiceIoOperation(&ReceiveMsg, &ReplyMsg);
IoSorter::ServiceIoOperation(capturedMessage.get(), &ReplyMsg);

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having the leftover stack state in ReceiveMsg and ReplyMsg when the dump I was given was hung later on at HRESULT hr = ServiceLocator::LocateGlobals().pDeviceComm->ReadIo(ReplyMsg, &ReceiveMsg); was what lead me to understand this bug to begin with.

As such, I tried to maintain keeping the "most recent message before hang" in the local variables. If I use the heap one directly, it'll disappear to space if I get another similar hang dump and I'll only have the reply to go on. Additionally, the api and devicecomm bits need to be patched up in ReceiveMsg anyway, no matter what, as they're reused message over message.

Copy link
Contributor

@ianjoneill ianjoneill left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me - tried about 20 times and it didn't fail 🎉

@zadjii-msft zadjii-msft added the AutoMerge Marked for automatic merge by the bot when requirements are met label Jul 22, 2021
@ghost
Copy link

ghost commented Jul 22, 2021

Hello @zadjii-msft!

Because this pull request has the AutoMerge label, I will be glad to assist with helping to merge this pull request once all check-in policies pass.

p.s. you can customize the way I help with merging this pull request, such as holding this pull request until a specific person approves. Simply @mention me (@msftbot) and give me an instruction to get started! Learn more here.

@ghost ghost merged commit 41ade2c into main Jul 22, 2021
@ghost ghost deleted the dev/miniksa/flaky_handoff branch July 22, 2021 12:51
Rosefield pushed a commit to Rosefield/terminal that referenced this pull request Jul 22, 2021
…by the time it reaches the ConsoleIoThread (microsoft#10751)

Pass inbound handoff message via heap so it cannot race out of scope by the time it reaches the ConsoleIoThread

## PR Checklist
* [x] Closes microsoft#10251
* [x] I work here.
* [x] Manually verified somewhat

## Detailed Description of the Pull Request / Additional comments
- `OpenConsole.exe` is started in response to the OS `conhost.exe` request for a handoff and prepares an Out Of Proc Multithreaded COM server.
- A COM thread from the pool inside `OpenConsole.exe` picks up the inbound message and allocates some stack space for the `CONSOLE_API_MSG` coming in
- That COM thread calls down to set up the I/O thread that will pump the console driver handle and passes a pointer to the stack-allocated `CONSOLE_API_MSG` as the `LPVOID` parameter for starting the thread.

Now one of two things happen:
1. The I/O thread is scheduled pretty much immediately (or soon enough that the COM thread hasn't messed with the stack space), picks up the pointer to the COM thread's stack with `CONSOLE_API_MSG`, and processes the initial message correctly.
2. The COM thread continues and finalizes the handoff message to `conhost.exe` declaring success. It then pops stack and "frees" the memory space. If it doesn't manage to overwrite it, we're still good. If it does, then things go crazy.

This fix changes it so that the `CONSOLE_API_MSG` is sent into the heap before being passed to the other thread so it's in a known location that won't be freed or overwritten unexpectedly.

## Validation Steps Performed
- [x] - Confirmed that many handoffs from the run box seem to work alright on my system after this change.
- [x] - Confirmed that many tab creations/splits seem to work alright on my system after this change.
- [x] - Would prefer if @ianjoneill could try to F5 this branch to build/deploy it, set it as default, and see if it makes it go away completely... but I'm pretty confident it is this based on the dumps provided either way.
DHowett pushed a commit that referenced this pull request Aug 25, 2021
…by the time it reaches the ConsoleIoThread (#10751)

Pass inbound handoff message via heap so it cannot race out of scope by the time it reaches the ConsoleIoThread

## PR Checklist
* [x] Closes #10251
* [x] I work here.
* [x] Manually verified somewhat

## Detailed Description of the Pull Request / Additional comments
- `OpenConsole.exe` is started in response to the OS `conhost.exe` request for a handoff and prepares an Out Of Proc Multithreaded COM server.
- A COM thread from the pool inside `OpenConsole.exe` picks up the inbound message and allocates some stack space for the `CONSOLE_API_MSG` coming in
- That COM thread calls down to set up the I/O thread that will pump the console driver handle and passes a pointer to the stack-allocated `CONSOLE_API_MSG` as the `LPVOID` parameter for starting the thread.

Now one of two things happen:
1. The I/O thread is scheduled pretty much immediately (or soon enough that the COM thread hasn't messed with the stack space), picks up the pointer to the COM thread's stack with `CONSOLE_API_MSG`, and processes the initial message correctly.
2. The COM thread continues and finalizes the handoff message to `conhost.exe` declaring success. It then pops stack and "frees" the memory space. If it doesn't manage to overwrite it, we're still good. If it does, then things go crazy.

This fix changes it so that the `CONSOLE_API_MSG` is sent into the heap before being passed to the other thread so it's in a known location that won't be freed or overwritten unexpectedly.

## Validation Steps Performed
- [x] - Confirmed that many handoffs from the run box seem to work alright on my system after this change.
- [x] - Confirmed that many tab creations/splits seem to work alright on my system after this change.
- [x] - Would prefer if @ianjoneill could try to F5 this branch to build/deploy it, set it as default, and see if it makes it go away completely... but I'm pretty confident it is this based on the dumps provided either way.
@ghost
Copy link

ghost commented Aug 31, 2021

🎉Windows Terminal Preview v1.10.2383.0 has been released which incorporates this pull request.:tada:

Handy links:

@ghost
Copy link

ghost commented Aug 31, 2021

🎉Windows Terminal Preview v1.11.2421.0 has been released which incorporates this pull request.:tada:

Handy links:

This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area-DefApp AutoMerge Marked for automatic merge by the bot when requirements are met Issue-Bug It either shouldn't be doing this or needs an investigation. Priority-1 A description (P1) Product-Terminal The new Windows Terminal.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[DefApp] PowerShell opened via Win+X errors 50% of the time
4 participants