Replies: 8 comments 8 replies
-
Do you have any more content? Or is it actually what gets printed verbatim without edits?
It is safe to assume you were always using Kestrel?
Is this the full stack trace for the exception? Do you have any async void in your code base? One common problem with websockets is bad handling of the lifetime, yielding the request before the websocket is torn down (https://docs.microsoft.com/en-us/aspnet/core/fundamentals/websockets?view=aspnetcore-3.1#accept-websocket-requests) |
Beta Was this translation helpful? Give feedback.
-
I have the following output from several runs. It is slightly filtered from journald, removing the host and service name. The lines marked with
This was the first bad occurrence with .NET Core 3.1: (The final ABRT line occurs every time.)
Here is the only occurrence with more output. At least two clients were connected at the time (user1, user2). The exceptions were logged in parallel which overcharged the logger.
The more
Yes, sorry, I didn't think of there might still be other options. The web app is started via systemd and accessed through a reverse proxy through Apache which also adds SSL.
Probably not. The logging cannot complete before the process dies. That's because I don't see what and where the real issue is.
There are some of them but all have a try/catch around their entire body and none of them should be called in this scenario. Everything else returns a [Route("api/[controller]")]
public class RmpController : Controller
{
private readonly RmpRouter router;
public async Task Index()
{
if (HttpContext.WebSockets.IsWebSocketRequest)
{
var webSocket = await HttpContext.WebSockets.AcceptWebSocketAsync();
logger.LogDebug($"RMP client connection from {HttpContext.Connection.RemoteIpAddress} opened");
try
{
await router.HandleClientConnection(HttpContext.Connection.RemoteIpAddress, webSocket, HttpContext.User, applicationLifetime.ApplicationStopping);
}
catch (Exception ex)
{
Console.WriteLine("Log (RmpController.Index): " + ex.ToString());
logger.LogError(ex, "RMP error: " + ex.GetRecursiveMessage());
}
finally
{
logger.LogDebug($"RMP client connection from {HttpContext.Connection.RemoteIpAddress} closed");
}
}
else
{
HttpContext.Response.StatusCode = 400;
}
}
} A rough simplification of the router's code is this. The router class handles a client websocket connection, receives data from it and works with it. If the router is shut down, the connection is closed by the router. This is when the trouble begins. public async Task HandleClientConnection(IPAddress remoteIpAddress, WebSocket webSocket, ClaimsPrincipal claimsPrincipal, CancellationToken cancellationToken = default)
{
try
{
cancellationTokenRegistration = cancellationToken.Register(async () =>
{
Console.WriteLine("Log (Router.HandleClientConnection): cancellationToken triggered, closing client WebSocket for user " + client.AuthenticatedUserName);
try
{
await CloseAsync(client.WebSocket, client.SendMutex, WebSocketCloseStatus.EndpointUnavailable, "Shutting down");
}
catch (Exception ex)
{
Console.WriteLine("Log (Router.HandleClientConnection): " + ex);
}
Console.WriteLine("Log (Router.HandleClientConnection): cancellationToken triggered, client WebSocket for user " + client.AuthenticatedUserName + " closed");
});
// Message loop
while (true)
{
var segment = await ReceiveRawMessage(client, webSocket);
//...
}
}
catch (OperationCanceledException ex) when (ex.CancellationToken == shutdownCts.Token || ex.CancellationToken == cancellationToken)
{
Console.WriteLine("Log (Router.HandleClientConnection 1): " + ex.ToString());
OnClientError(client, "Shutting down", null);
await CloseAsync(webSocket, client.SendMutex, WebSocketCloseStatus.EndpointUnavailable, "Shutting down");
}
catch (OperationCanceledException ex) when (ex.CancellationToken == client.AbortReceiveCts.Token)
{
Console.WriteLine("Log (Router.HandleClientConnection 2): " + ex.ToString());
OnClientError(client, "Connection aborted", null);
await CloseAsync(webSocket, client.SendMutex, WebSocketCloseStatus.EndpointUnavailable, "Connection aborted");
}
catch (Exception ex)
{
Console.WriteLine("Log (Router.HandleClientConnection 3): " + ex.ToString());
OnClientError(client, ex.Message, ex);
await CloseAsync(webSocket, client.SendMutex, WebSocketCloseStatus.InternalServerError, ex.Message);
}
finally
{
cancellationTokenRegistration.Dispose();
try
{
await CloseAsync(webSocket, client.SendMutex, WebSocketCloseStatus.NormalClosure, "");
}
catch (Exception ex)
{
Console.WriteLine("Log (Router.HandleClientConnection finally): " + ex.ToString());
OnClientError(client, ex.Message, ex);
}
}
}
private async Task<ArraySegment<byte>> ReceiveRawMessage(ClientConnectionInfo client, WebSocket webSocket)
{
byte[] buffer = new byte[1024];
int usedBufferCount = 0;
while (true)
{
var segment = new ArraySegment<byte>(buffer, usedBufferCount, buffer.Length - usedBufferCount);
var result = await webSocket.ReceiveAsync(segment, client.AbortReceiveCts.Token);
usedBufferCount += result.Count;
if (result.CloseStatus != null)
{
return new ArraySegment<byte>();
}
if (result.EndOfMessage)
{
break;
}
if (usedBufferCount > buffer.Length - 128)
{
Array.Resize(ref buffer, buffer.Length * 2);
}
}
return new ArraySegment<byte>(buffer, 0, usedBufferCount);
} |
Beta Was this translation helpful? Give feedback.
-
I have decreased the log level from Information to Debug and see some additional entries now: (see the code above for where they're written)
So the API controller sees the connection as closed and the call to |
Beta Was this translation helpful? Give feedback.
-
I discovered that I didn't deploy the new file of the NuGet package DLL (I'm copying that manually now because I constantly make changes to it and see their result). Now that I use the current build of that file, the situation has changed slightly: The process never aborts in a way that systemd would notice it. So it looks like the exit code is always 0. The messages like "Unhandled exception." are also gone completely. Maybe I missed an exception handler before that is now added. But it's still not all good. I've added much more logging to see when all public static void Main(string[] args)
{
Console.WriteLine("MyApp is starting...");
CreateHostBuilder(args).Build().Run();
Console.WriteLine("MyApp is shut down");
} Sometimes I see all the shutdown messages I expect, until the final "MyApp is shut down". And sometimes, it still stops midway. Then none of the So now the situation is: It doesn't fail always, only sometimes. And when it does, it fails absolutely silently, with no output or exit code. I'm not sure this is an improvement. A try/catch handler around the |
Beta Was this translation helpful? Give feedback.
-
For the original error: You have an async void in there that might be problematic: cancellationTokenRegistration = cancellationToken.Register(async () =>
{
Console.WriteLine("Log (Router.HandleClientConnection): cancellationToken triggered, closing client WebSocket for user " + client.AuthenticatedUserName);
try
{
await CloseAsync(client.WebSocket, client.SendMutex, WebSocketCloseStatus.EndpointUnavailable, "Shutting down");
}
catch (Exception ex)
{
Console.WriteLine("Log (Router.HandleClientConnection): " + ex);
}
Console.WriteLine("Log (Router.HandleClientConnection): cancellationToken triggered, client WebSocket for user " + client.AuthenticatedUserName + " closed");
}); Don't run any async code in these callbacks as nobody will be waiting on them. I would instead use the token to end the current async execution and run this logic in the normal place where you already have a call to CloseAsync.
So it seems like there's no crash and we're discussing a different kind of error? What's the reproduction for this one? |
Beta Was this translation helpful? Give feedback.
-
I've changed the async/await with
Do you mean I should pass the cancellation token into Meanwhile I've added other types of logging,
Still the same: Stop the application with systemd. I have now added the systemd hosting support but that doesn't change anything. |
Beta Was this translation helpful? Give feedback.
-
I have changed the WebSocket receive cancellation so that it only uses a cancellation token (linked with the one that was already there) instead of closing the socket from the registered handler. This has improved the exceptions (much less exceptions flying around). I'm not sure why I haven't done that earlier. But the process still doesn't always shut down properly. Most of the time, I see the log messages of when the WebSocket controller action has returned from the connection. Often most of the For testing, I've now disabled WebSocket support in the application by removing the following code from app.UseWebSockets(new WebSocketOptions
{
KeepAliveInterval = TimeSpan.FromSeconds(120)
}); Now there were no more clients connected through WebSocket. With this modification, I could still reproduce the premature exit. So I guess the problem is not from WebSocket, it's from something else. WebSocket just took time so that less of the other work was done. In fact the time seems to be restricted. I added this test code to the try
{
Console.WriteLine("MyApp is starting...");
CreateHostBuilder(args).Build().Run();
Console.WriteLine("MyApp is shut down");
for (int i = 1; i <= 20; i++)
{
Thread.Sleep(10);
Console.WriteLine("waited " + (i * 10) + " ms");
}
return 0;
}
catch (Exception ex)
{
Console.WriteLine("Unhandled exception in Main: " + ex);
return 1;
} The result at the first time was this:
It wanted to count further but wasn't let to. All other times none of the wait messages appeared, sometimes not even the shutdown and some of the service stop messages before that. I've also checked all async delegates in the entire project. Some of them were just unawaited but had their own try/catch in them. I've replaced them with a I'm most struggling with how the process can end with exit code 0 even if it clearly must print some more messages, at least the one in the |
Beta Was this translation helpful? Give feedback.
-
I think I've found the issue. The host's service provider gets disposed at an unknown time during the host's shutdown (I couldn't trace down that part in the code). That leads to all services being disposed, including the logger (so I don't see anything anymore) and the host lifetime (so the SIGTERM handler is released the the process exits with code 0). The problem is, whenever that happens (it's kind of unpredictable), the application isn't necessarily done with its shutdown, like all of the After replacing this in Main: CreateHostBuilder(args).Build().Run(); with something like this: (inspired by this) IHost host = CreateHostBuilder(args).Build();
try
{
host.Start();
Console.WriteLine("MyApp is started");
host.WaitForShutdown();
Console.WriteLine("MyApp is shut down");
// I can do here whatever I want, for however long I want (maybe seconds), it will run completely
}
finally
{
// Not sure if I need all this or just `Dispose` is enough
if (host is IAsyncDisposable asyncDisposable)
{
asyncDisposable.DisposeAsync().ConfigureAwait(false).GetAwaiter().GetResult();
}
else
{
host.Dispose();
}
} I see all the messages till the very end. Nothing is missing. |
Beta Was this translation helpful? Give feedback.
Uh oh!
There was an error while loading. Please reload this page.
-
Describe the bug
Our application is a client-server application where clients connect to the server through WebSocket. Those connections are held by the client during its entire runtime. When the server shuts down, those connections must be closed by the server. This is triggered by the
IHostApplicationLifetime.ApplicationStopping
cancellation token. While the server handles an open connection from a client in its controller action method, it has a registration on that token that then callsWebSocket.CloseAsync
.This has always worked fine in .NET Core 2.2 but fails miserably in .NET Core 3.1. After upgrading the server application, when it should shut down, it crashes instead. Different output is generated on the console each time and the output is not reproducible. But the crash happens every time, the process never ends normally. All code paths have multiple try/catch handlers around them and all async calls are awaited but nothing helps. This web socket throws an uncatchable exception. Sometimes, there is no console output at all (before systemd says the process has aborted), sometimes I see slightly chaotic output like this:
It looks like the "fail:" starts another last log message and the repeated "Unhandled exception." before have no line termination. And sometimes it writes out a bit more before it dies:
I'm not sure this is all it wanted to write, it probably died midway.
In rare cases, the exception(s?) made it further into my catch blocks and was logged partially. It seems that the process crashed while it was still writing its log lines to the console (where they are picked up by systemd), so the log is unusable in this case because it's not reliable. I have now added more direct
Console.WriteLine
calls but most of the time, nothing is printed at all. The process just disappears. That's not why I'm using .NET – I'm used to get reliable exception traces instead of the typical silent crashes of an unmanaged C application.I believe that the unhandled exception is thrown in code that's out of my scope. It must be somewhere in the runtime/framework classes and there's a try/catch handler missing somewhere. And it was introduced between .NET Core 2.2 and 3.1. And it's kind of a critical issue because we can't cleanly shut down our server application anymore now, leading to data loss in unpredictable forms.
To Reproduce
Sorry, nothing to share here. The application is too big (and non-public) and I don't know how I should create a copy of it that still shows the bug. There's also an additional NuGet package of mine involved in handling those connections. It would require a large effort in the range of days to set this up.
I could try to analyse this further if I knew where to start. I've tried all I could think of, the proces just dies when it really shouldn't. That's beyond my capabilities now.
Exceptions (if any)
Certainly, but nothing really much to see of it. See above.
Further technical details
Beta Was this translation helpful? Give feedback.
All reactions