Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mysterious hang in Monarch::_getMostRecentPeasantID #11215

Closed
zadjii-msft opened this issue Sep 13, 2021 · 5 comments · Fixed by #11217
Closed

Mysterious hang in Monarch::_getMostRecentPeasantID #11215

zadjii-msft opened this issue Sep 13, 2021 · 5 comments · Fixed by #11217
Assignees
Labels
Area-Remoting Communication layer between windows. Often for windowing behavior, quake mode, etc. Issue-Bug It either shouldn't be doing this or needs an investigation. Needs-Tag-Fix Doesn't match tag requirements Needs-Triage It's a new issue that the core contributor team needs to triage at the next triage meeting Priority-1 A description (P1) Product-Terminal The new Windows Terminal. Resolution-Fix-Committed Fix is checked in, but it might be 3-4 weeks until a release. Severity-Blocking We won't ship a release like this! No-siree.

Comments

@zadjii-msft
Copy link
Member

zadjii-msft commented Sep 13, 2021

version 1.12.2521.0.
Unsure how I got into this state. Probably

  • open windows 1, 2, 3. 1 is not the MRU window.
  • Close 2 and 3.
  • Try globalSummon the window. Terminal is in an infinite loop now inside Monarch::_getMostRecentPeasantID

but my repo's in the middle of a weird merge ATM so I can't verify on main

The symbols for this build don't include accurate source info, or at least my windbg is fucky. But what seems to be happening is _getMostRecentPeasantID is in the while (_mruPeasants.cbegin() + positionInList < _mruPeasants.cend()) loop. We're looking for ID=3, which we don't find in the list of peasants, so _getPeasant ends up returning null. We determine that we should have cleaned out that peasant ID from the MRU entries at that point, so we try again. However, we didn't. The MRU list still contains 3 at index 0, so we go looking for it again.

auto maybeThePeasant = peasantSearch == _peasants.end() ? nullptr : peasantSearch->second; in Monarch::_getPeasant ends up setting maybeThePeasant to null, which we then happily return (without cleaning up any MRU entries for it).

absolutely baseless list of possible commits to investigate:

I did not have the tray icon enabled.

Filing on myself so I don't lose this, but I don't think I'm getting anything more out of this debugger

There's no valuable logs other than a million Monarch_Collect_WasDead(Id=3, Desktop={guid})

@ghost ghost added Needs-Triage It's a new issue that the core contributor team needs to triage at the next triage meeting Needs-Tag-Fix Doesn't match tag requirements labels Sep 13, 2021
@zadjii-msft zadjii-msft added Area-Remoting Communication layer between windows. Often for windowing behavior, quake mode, etc. Product-Terminal The new Windows Terminal. labels Sep 13, 2021
@zadjii-msft zadjii-msft added this to the Terminal v1.12 milestone Sep 13, 2021
@zadjii-msft zadjii-msft self-assigned this Sep 13, 2021
@zadjii-msft
Copy link
Member Author

Note to self: those repro steps did repro this on 1.12.2521, so this has got to be fairly new. This should be easy enough to write a test for.

@zadjii-msft zadjii-msft added Priority-1 A description (P1) Severity-Blocking We won't ship a release like this! No-siree. Issue-Bug It either shouldn't be doing this or needs an investigation. labels Sep 13, 2021
@Rosefield
Copy link
Contributor

Rosefield commented Sep 13, 2021

Curious if this is would be fixed in #11189. You of course collected the list of changes to the monarch I made, mostly around adding extra events on open/close of a peasant that can cause some churn. I didn't touch _getMostRecentPeasantId or _getPeasant so it might just be heap corruption from simultaneously removing peasants and iterating.

@Rosefield
Copy link
Contributor

Rosefield commented Sep 13, 2021

Looking at the code again, the issue might be that in SignalClose I remove from the peasants map, but don't remove from _mruPeasants? Happy to fix this either on its own, or in the above PR with the other changes.

https://github.com/microsoft/terminal/blob/main/src/cascadia/Remoting/Monarch.cpp#L148

@ghost ghost added the In-PR This issue has a related PR label Sep 13, 2021
@zadjii-msft
Copy link
Member Author

See, I knew I could file this, go to lunch, and come back to an answer 😝 Thanks for helping ID this. I'm okay combining this into #11189, so long as we make sure to add a test for this specific case. I just want to have as many tests on this stuff as possible.

Of course I type like a 2nd grader so while I was typing this, you filed the PR 😅. Doing it separately works too!

@ghost ghost closed this as completed in #11217 Sep 14, 2021
@ghost ghost added Resolution-Fix-Committed Fix is checked in, but it might be 3-4 weeks until a release. and removed In-PR This issue has a related PR labels Sep 14, 2021
ghost pushed a commit that referenced this issue Sep 14, 2021
Fix infinite loop when trying to summon a window after close.

In #10972 code was added to try to clean up state manually when a window
was closed instead of waiting for it to be detected as a dead peasant.
Unfortunately I didn't know any better and missed cleaning up
`_mruPeasants` as well. The result is there  would be an infinite loop
in `_getMostRecentPeasant` since `_getPeasant` will only clean up ids if
it finds a peasant, not if it doesn't find anything. This is the minimal
change to get this working, but it might be a good idea to make
`_getPeasant` be more thorough about cleanup.

## Validation Steps Performed
Tested that before the change we infinitely loop, and after the change
we summon correctly.

Closes #11215
@ghost
Copy link

ghost commented Oct 20, 2021

🎉This issue was addressed in #11217, which has now been successfully released as Windows Terminal Preview v1.12.2922.0.:tada:

Handy links:

This issue was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area-Remoting Communication layer between windows. Often for windowing behavior, quake mode, etc. Issue-Bug It either shouldn't be doing this or needs an investigation. Needs-Tag-Fix Doesn't match tag requirements Needs-Triage It's a new issue that the core contributor team needs to triage at the next triage meeting Priority-1 A description (P1) Product-Terminal The new Windows Terminal. Resolution-Fix-Committed Fix is checked in, but it might be 3-4 weeks until a release. Severity-Blocking We won't ship a release like this! No-siree.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants