Envelopes that fail to send are now flushed when the transport recovers #3438

Merged
merged 25 commits into from
Jul 11, 2024

Conversation

@jamescrosswell (Collaborator) commented Jun 20, 2024

Resolves #3384

Analysis

Previously, when envelopes failed to send (e.g. due to network availability), they were left in a __processing directory. The only way to get them back into the cache directory was to reinitialise the Sentry SDK, at which point the SDK checks whether any envelopes are left in __processing from a previous execution that might have terminated unexpectedly without being able to flush them properly.

Approach / Solution

We'll detect when the network goes offline the first time we try to send an envelope. We now have two implementations of INetworkStatusListener:

  1. MauiNetworkStatusListener
  2. PollingNetworkStatusListener

If any errors occur when sending envelopes, regardless of whether we're in the context of a MAUI application or not, one of those network listeners is now used to detect when the network comes back online and to send any pending envelopes.

The following little console app is useful to test all of this (while monitoring the files in the cache directory in the Finder or Explorer):

using System.Diagnostics;
using Sentry.Infrastructure;

SentrySdk.Init(options =>
{
    // options.Dsn = "... Your DSN ...";
    options.DiagnosticLogger = new DebugDiagnosticLogger(SentryLevel.Debug);
    options.CacheDirectoryPath = "cache";
    options.Debug = true;
    options.IsGlobalModeEnabled = true;
});

var i = 0;
ConsoleKeyInfo input;
do
{
    SentrySdk.ConfigureScope(scope => scope.SetExtra("Loop Count", ++i));
    SentrySdk.CaptureMessage("Hello, Sentry!");

    Console.WriteLine("Press y to continue or any other key to quit...");
    input = Console.ReadKey();
    Console.WriteLine();
} while (input.Key == ConsoleKey.Y);

Console.WriteLine("Goodbye!");

public class DebugDiagnosticLogger : DiagnosticLogger
{
    public DebugDiagnosticLogger(SentryLevel minimalLevel) : base(minimalLevel) {}
    protected override void LogMessage(string message) => Debug.WriteLine(message);
}

@jamescrosswell changed the title from "Processing envelopes now get flushed after reconnecting to the network" to "Envelopes that fail to send are now flushed after reconnecting to the network" Jun 20, 2024
@jamescrosswell changed the title from "Envelopes that fail to send are now flushed after reconnecting to the network" to "Envelopes that fail to send are now flushed when the transport recovers" Jun 20, 2024
@jamescrosswell
Collaborator Author

On reflection the most recent changes we introduced might be problematic. For non-MAUI applications, we now have this code running using a PollingNetworkStatusListener:

if (_options.NetworkStatusListener is { Online: false } listener)
{
    _options.LogDebug("The network is offline. Pausing processing.");
    await listener.WaitForNetworkOnlineAsync(cancellation).ConfigureAwait(false);
    _options.LogDebug("The network is back online. Resuming processing.");
}

That will wait, potentially indefinitely, until our PollingNetworkStatusListener detects that the network is online again (i.e. until it can successfully ping the Sentry host). Some particularly paranoid network administrators might disable ICMP on their network routers... so there's no guarantee our ping check will work on all networks. In that scenario, as soon as the network is unavailable, our transport will stop sending envelopes and won't start sending these again until the app restarts.

Maybe instead of using ICMP/Ping, we should simply try to establish an HTTP or HTTPS connection with Sentry then... I'll muck around and see what the easiest way is to do that from .NET.

@jamescrosswell
Collaborator Author

> Maybe instead of using ICMP/Ping, we should simply try to establish an HTTP or HTTPS connection with Sentry then... I'll muck around and see what the easiest way is to do that from .NET.

OK, turned out to be quite trivial... plain old TCP connect. Can't get much more reliable than that!

{
try
{
using var tcpClient = new TcpClient();
Contributor

I take it the new TcpClient every 500ms is not an issue?

Collaborator Author

It won't be every 500ms due to the progressive backoff. There would be a first attempt, a subsequent attempt after 500ms, then another after 1000ms, then 2000ms, etc.

We could try to use the same TcpClient but it would need to be reset to be usable again after each unsuccessful connection attempt. I could do some benchmarks to check those two different approaches. It's worth noting that there will only be one PollingNetworkStatusListener instance in the application though, so I don't think this is really a hot path.
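
For readers following along, the TCP probe with progressive backoff described above might look roughly like the sketch below. It is not the code from this PR: `CanConnectAsync`, `NextDelay`, `WaitUntilReachableAsync` and the 32 second ceiling are hypothetical names and values chosen for illustration, and it assumes .NET 5 or later for the `TcpClient.ConnectAsync` overload that accepts a `CancellationToken`.

```csharp
using System;
using System.Net.Sockets;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper (not the SDK's code): true if a TCP connection to host:port opens.
async Task<bool> CanConnectAsync(string host, int port, CancellationToken cancellationToken)
{
    try
    {
        // A fresh client per attempt, as discussed in the thread.
        using var tcpClient = new TcpClient();
        await tcpClient.ConnectAsync(host, port, cancellationToken).ConfigureAwait(false);
        return true;
    }
    catch (SocketException)
    {
        return false; // refused, unreachable, host not found, ...
    }
}

// Doubles the delay up to a ceiling: 500ms, 1000ms, 2000ms, ...
TimeSpan NextDelay(TimeSpan current, TimeSpan max)
{
    var doubled = TimeSpan.FromTicks(current.Ticks * 2);
    return doubled < max ? doubled : max;
}

// Polls until the endpoint accepts a connection, sleeping longer after each failure.
async Task WaitUntilReachableAsync(string host, int port, CancellationToken cancellationToken)
{
    var delay = TimeSpan.FromMilliseconds(500); // initial delay mentioned in the thread
    var maxDelay = TimeSpan.FromSeconds(32);    // assumed ceiling, not from the thread
    while (!await CanConnectAsync(host, port, cancellationToken).ConfigureAwait(false))
    {
        await Task.Delay(delay, cancellationToken).ConfigureAwait(false);
        delay = NextDelay(delay, maxDelay);
    }
}
```

Because the delay grows after every failed attempt, the number of short-lived `TcpClient` instances stays small even during a long outage, which is consistent with the point that this is not a hot path.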

@jamescrosswell jamescrosswell merged commit d5d4b66 into main Jul 11, 2024
22 checks passed
@jamescrosswell jamescrosswell deleted the stuck-envelopes-3384 branch July 11, 2024 02:32
set => _online = value;
}

public async Task WaitForNetworkOnlineAsync(CancellationToken cancellationToken = default)
Member

This argument has a default value but I don't see it being tested when a cancellation token is passed.

If one is passed and triggered, Delay would throw an exception that isn't handled anywhere here. Is that considered?

I'd recommend removing the argument if we're not using it anywhere (at least I can't see it being used).

If we are using it, add a test that passes a token to verify the behavior.

Collaborator Author

Good call. Added test and modified code to ensure a graceful exit if the task is cancelled:
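
The change itself is not reproduced in this thread. Purely as an illustration of the concern raised, one way a polling wait can exit quietly when its token is cancelled is sketched below. `WaitForOnlineAsync`, its `isOnline` delegate and the `bool` return value are hypothetical and not taken from the SDK.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical sketch: polls isOnline until it reports true.
// Returns true once online, or false if the wait was cancelled first.
async Task<bool> WaitForOnlineAsync(
    Func<bool> isOnline,
    TimeSpan pollInterval,
    CancellationToken cancellationToken = default)
{
    try
    {
        while (!isOnline())
        {
            // Task.Delay throws TaskCanceledException when the token fires.
            await Task.Delay(pollInterval, cancellationToken).ConfigureAwait(false);
        }
        return true;
    }
    catch (OperationCanceledException)
    {
        // Cancellation was requested: exit gracefully instead of surfacing the exception.
        return false;
    }
}
```

`TaskCanceledException` derives from `OperationCanceledException`, so the single catch covers the exception thrown by `Task.Delay`. Returning a flag rather than `void` is a design choice of this sketch so that a caller can tell "online" apart from "cancelled".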

@@ -1297,6 +1297,10 @@ public SentryOptions()
UriComponents.SchemeAndServer,
UriFormat.Unescaped)
);

#if PLATFORM_NEUTRAL
Member

What does this mean in practice? Did we document in what environments this will work and which ones it won't?

Collaborator Author

It's the flip side to this:

#if !PLATFORM_NEUTRAL
// We can use MAUI's network connectivity information to inform the CachingTransport when we're offline.
options.NetworkStatusListener = new MauiNetworkStatusListener(Connectivity.Current, options);
#endif

With PLATFORM_NEUTRAL being defined here.

In practice, I think that means that if you lose network connectivity and then regain it, and you're targeting one of those platforms specifically but not using MAUI, then any envelopes you tried to send while the network was down won't be sent until you restart the application.

Another option would be to always initialise the SentryOptions.NetworkStatusListener to a new PollingNetworkStatusListener but let the SentryMauiOptionsSetup override this... or some variant of that. Technically SentryOptions.NetworkStatusListener is public so the user could set this to some custom listener that they've crafted. We don't currently do anything to make sure we don't override the user's listener if they've already set one.

Collaborator Author

Have created #3510 to track this conversation in any case (this PR is closed/merged so easy for the conversation to get lost here).

Contributor

> We don't currently do anything to make sure we don't override the user's listener if they've already set one.

I think that's the crux. But if we set it up in the option's constructor the user would overwrite it again and we'd be fine, right?
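
To make the option being discussed concrete, here is a minimal sketch of "default in the constructor, override only if untouched". Every type name in it is a hypothetical stand-in and not one of the SDK's real classes, and it is one possible variant rather than what the SDK does.

```csharp
// All type names below are hypothetical stand-ins, not the SDK's real classes.
public interface IListenerSketch { bool Online { get; } }

public sealed class PollingListenerSketch : IListenerSketch { public bool Online => true; }
public sealed class MauiListenerSketch : IListenerSketch { public bool Online => true; }
public sealed class CustomListenerSketch : IListenerSketch { public bool Online => true; }

public sealed class OptionsSketch
{
    // Default assigned at construction; a user can overwrite it afterwards.
    public IListenerSketch NetworkStatusListener { get; set; } = new PollingListenerSketch();
}

public static class MauiSetupSketch
{
    // Swap in the MAUI listener only while the built-in default is still in place,
    // so a listener supplied by the user is never replaced.
    public static void Configure(OptionsSketch options)
    {
        if (options.NetworkStatusListener is PollingListenerSketch)
        {
            options.NetworkStatusListener = new MauiListenerSketch();
        }
    }
}
```

With this shape, calling `MauiSetupSketch.Configure` on freshly constructed options yields the MAUI listener, while options that already carry a `CustomListenerSketch` are left alone regardless of whether the user assigned it before or after construction.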

Development

Successfully merging this pull request may close these issues.

Envelopes stuck in __processing until restart
4 participants