-
Notifications
You must be signed in to change notification settings - Fork 1.4k
Fallback when server client fails due to mutex connection timeout error. #8024
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fallback when server client fails due to mutex connection timeout error. #8024
Conversation
14f4f9f to
133463e
Compare
| /// The client is not able to identify the server state. | ||
| /// </summary> | ||
| /// <remarks> | ||
| /// This may happen when mutex that is regulating the server state throws. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add link to related GH issue to this comment.
| { | ||
| CommunicationsUtilities.Trace("Failed to obtain the current build server state: {0}", ex); | ||
| _exitResult.MSBuildClientExitType = MSBuildClientExitType.UnknownServerState; | ||
| return false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Caller of TryLaunchServer do override _exitResult.MSBuildClientExitType of false. I recommend caller does it only if MSBuildClientExitType has not been initialized yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice catch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please also Trace IOException.HResult
| { | ||
| // For unknown root cause, Mutex.TryOpenExisting can sometimes throw 'Connection timed out' exception preventing to obtain the build server state through it (Running or not, Busy or not). | ||
| // See: https://github.com/dotnet/msbuild/issues/7993 | ||
| CommunicationsUtilities.Trace("Failed to obtain the current build server state: {0}", ex); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After communication with Jiri Vorlicek please also Trace IOException.HResult
It will help runtime team to narrow it down.
| { | ||
| CommunicationsUtilities.Trace("Failed to obtain the current build server state: {0}", ex); | ||
| _exitResult.MSBuildClientExitType = MSBuildClientExitType.UnknownServerState; | ||
| return false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please also Trace IOException.HResult
|
I'm confused about a few things for this PR. First, is it intended to go into 17.4 or 17.5? You filled out the template as if it's aimed at 17.4, but I heard we were turning server off for 17.4, which makes this seem like it isn't high enough priority to meet the QB approval bar? Second, is this its final form? It seems more like it's trying to gather information about what's going wrong when it crashes rather than actually fixing the underlying problem. Are you planning to have a follow-up PR to fix the real issue and remove these try/catches? If so, should we remove the "fixes" line? Third, how should "UnknownServerState" be treated by the rest of MSBuild? I'm wondering if you should just shut down the server in that case and restart it with the next build rather than get into an unknown state. |
|
|
Looks like Edit: Disregard given #8116 merge. |
Fixes #7993
Summary
In the MSBuild server, we identify the state of the server via mutexes. Sometimes, for reason yet unknown to us, mutex could throw the exception
System.IO.IOException: Connection timed out. When this occurs, we fallback to old behavior building without server. We fixed some of those in #8000, but now found more situations when this happens.Customer Impact
MSBuild non-Windows users could have intermittent error when building with
dotnet build.This does not affect Visual Studio.
Regression?
Yes, this is a regression.
Testing
Unit tests.
Risk
Low risk. The fix adds additional try-catch blocks to process this situation.
Code Reviewers
[TODO]
Description of the fix
IOExceptionexception when mutexes are used.