Description
Bug Report
Symptom
Describe the bug
After deployment, telemetry logging in our .NET 6 projects using OpenTelemetry only works for a limited time. Logging stops without any error or warning, and a memory dump shows that the OpenTelemetry instances have disappeared until the connection pool is refreshed.
Expected behavior
Continuous and uninterrupted telemetry logging with OpenTelemetry instances remaining persistent without requiring a pool refresh.
Runtime environment:
- OpenTelemetry version: 1.6.0
- OpenTelemetry.Exporter.OpenTelemetryProtocol version: 1.6.0
- OS: Windows
- Application hosting environment (also the collector): AWS
- .NET version: .NET 6.0, language: C#
Additional context
Our working theory is that the Garbage Collector is prematurely collecting the MeterProvider instance, interrupting telemetry logging. Several deployed .NET 6 applications with slightly different OpenTelemetry initialization code are affected.
Notably, the two applications experiencing this issue are the only ones in our suite hosted on IIS, a possible correlation that may warrant further exploration.
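To test the premature-collection theory directly, a diagnostic we are considering is to hold a WeakReference to the provider and periodically log whether it is still alive. This is a sketch only: the class name, the one-minute interval, and the Console.WriteLine call are placeholders for our real setup. Since a WeakReference does not root its target, IsAlive turning false would confirm the GC actually collected the provider.

using System;
using System.Threading;
using OpenTelemetry.Metrics;

public static class ProviderLivenessProbe
{
    private static WeakReference _providerRef;
    private static Timer _timer; // static field keeps the timer itself alive

    public static void Watch(MeterProvider provider)
    {
        // The weak reference does not keep the provider alive, so
        // IsAlive == false means the GC really did collect it.
        _providerRef = new WeakReference(provider);
        _timer = new Timer(
            _ => Console.WriteLine($"MeterProvider alive: {_providerRef.IsAlive}"),
            state: null,
            dueTime: TimeSpan.Zero,
            period: TimeSpan.FromMinutes(1));
    }
}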
Reproduce
Steps to reproduce the behavior:
- Deploy a .NET 6 application that uses OpenTelemetry for telemetry logging (a minimal emitting loop that makes the dropout visible is sketched after these steps).
- After a short time, observe that telemetry logging stops without any error messages.
- Take a memory dump and note the absence of OpenTelemetry instances.
- Refresh the connection pool; the OpenTelemetry instances reappear and telemetry logging resumes.
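For the first step, a minimal sketch of such an application's emitting loop. The meter and counter names are made up for illustration, and it assumes the MeterProvider from our configuration class (shown further below) has already been built in the same process; AddMeter("*") picks up any meter name.

using System.Diagnostics.Metrics;
using System.Threading;

// Minimal emitter: one data point every 10 seconds, so a dropout
// at the collector is easy to spot.
var meter = new Meter("MyProject.Probe");
var counter = meter.CreateCounter<long>("probe.heartbeat");

while (true)
{
    counter.Add(1);
    Thread.Sleep(10_000);
}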
Additional Steps Taken:
- Implemented a static field, Provider, to retain the MeterProvider instance
- Used GC.KeepAlive to prevent early collection by the GC

I store the MeterProvider in the static field:

private static MeterProvider Provider;

// Configuration logic...
Provider = Sdk.CreateMeterProviderBuilder()
    // Additional configuration logic...
    .Build();

Then I added the GC.KeepAlive call:

// Further configuration logic...
GC.KeepAlive(Provider);

This replaces the previous logic:

services.AddSingleton(meterProvider);
The workaround seemed to work at first, but over a longer period telemetry export became intermittent again, so the issue is not solved.
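One note on the workaround: GC.KeepAlive(Provider) only guarantees the reference is reachable up to the point of the call, and a static field already roots the instance for the lifetime of the process, so the extra call is likely a no-op here. An alternative sketch that keeps the container-rooted singleton but flushes pending metrics when the host stops (the hosted service below is our own illustration, not part of the OpenTelemetry API; ForceFlush is the SDK's extension method):

using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.Hosting;
using OpenTelemetry;
using OpenTelemetry.Metrics;

// Flushes pending metrics on host shutdown, e.g. during an IIS
// app-pool recycle, so the final export interval is not lost.
public sealed class MeterProviderFlushService : IHostedService
{
    private readonly MeterProvider _provider;

    public MeterProviderFlushService(MeterProvider provider) => _provider = provider;

    public Task StartAsync(CancellationToken cancellationToken) => Task.CompletedTask;

    public Task StopAsync(CancellationToken cancellationToken)
    {
        _provider.ForceFlush(timeoutMilliseconds: 5000); // best-effort flush
        return Task.CompletedTask;
    }
}

// Registration, alongside the existing singleton:
// services.AddSingleton(meterProvider);
// services.AddHostedService(_ => new MeterProviderFlushService(meterProvider));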
Configuration class code
Here is the code of the configuration class before applying the changes described above:
using System;
using System.Collections.Generic;
using System.Diagnostics.Metrics;
using System.Reflection;
using System.Threading;
using Microsoft.Extensions.Configuration;
using Microsoft.Extensions.DependencyInjection;
using OpenTelemetry;
using OpenTelemetry.Exporter;
using OpenTelemetry.Metrics;
using OpenTelemetry.Resources;

namespace MyProject.Metrics;

public static class MetricsConfigurator
{
    // Carries a per-instrument view configuration from UsingConfiguration
    // to the AddView callback below.
    private static readonly AsyncLocal<MetricStreamConfiguration> CurrentCallConfiguration = new();

    public static MeterProvider Configure(IConfiguration configuration, Assembly mainAssembly)
    {
        var appName = mainAssembly.GetName();
        var resource = ResourceBuilder.CreateDefault()
            .AddService(serviceName: appName.Name, serviceVersion: appName.Version!.ToString())
            .AddAttributes(new KeyValuePair<string, object>[]
            {
                new("server.name", Environment.MachineName),
                new("process.id", Environment.ProcessId)
            });

        var exportInterval = TimeSpan.FromMilliseconds(configuration.GetValue("OpenTelemetry:ExportIntervalMilliseconds", 60000));

        return Sdk.CreateMeterProviderBuilder()
            .AddMeter("*") // listen to every meter in the process
            .SetResourceBuilder(resource)
            .AddView(instrument =>
            {
                // Consume (and clear) the configuration set by UsingConfiguration,
                // if any; returning null keeps the SDK defaults.
                var config = CurrentCallConfiguration.Value;
                CurrentCallConfiguration.Value = null;
                return config;
            })
            .AddOtlpExporter((eo, mo) =>
            {
                eo.Endpoint = new Uri("http://localhost:2222");
                eo.Protocol = OtlpExportProtocol.Grpc;
                mo.PeriodicExportingMetricReaderOptions = new PeriodicExportingMetricReaderOptions
                {
                    ExportIntervalMilliseconds = (int)exportInterval.TotalMilliseconds
                };
                mo.TemporalityPreference = MetricReaderTemporalityPreference.Delta;
            })
            .Build();
    }

    public static void AddOpenTelemetry(this IServiceCollection services, IConfiguration configuration)
    {
        var meterProvider = Configure(configuration, Assembly.GetCallingAssembly());
        services.AddSingleton(meterProvider); // root the provider in the container
    }

    public static Meter UsingConfiguration(this Meter meter, MetricStreamConfiguration configuration)
    {
        CurrentCallConfiguration.Value = configuration;
        return meter;
    }
}
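For context, the UsingConfiguration / AddView pair is used like this (the meter name, instrument name, and bucket boundaries are illustrative): the AsyncLocal slot is set, the instrument is created, and the AddView callback consumes the value while the SDK publishes the instrument.

using System.Diagnostics.Metrics;
using MyProject.Metrics;
using OpenTelemetry.Metrics;

var meter = new Meter("MyProject.Orders");

// The view configuration set here is picked up (and cleared) by the
// AddView callback when CreateHistogram publishes the instrument.
var histogram = meter
    .UsingConfiguration(new ExplicitBucketHistogramConfiguration
    {
        Boundaries = new double[] { 5, 10, 25, 50, 100 }
    })
    .CreateHistogram<double>("orders.processing_time");

histogram.Record(12.3);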
The class is referenced in Program.cs, which is written using the Minimal Hosting / Minimal APIs approach:
// ......
builder.Services.AddOpenTelemetry(builder.Configuration);
// ......
Log Results:
With logging enabled, the following error appears, but I'm fairly sure we don't have a network issue:
Clear Indication Towards an IIS Context Issue
After going through the data and memory dumps, here is a deductive path that leads to the conclusion in point 3.
- Persistent memory instances: after manually invoking the Garbage Collector during the memory dump, both instances - the metrics-producing application and the collector - remain visible in memory. The issue therefore seems tied to the instances themselves rather than to an irregularity in memory allocation or deallocation.
- Network error elimination: the metrics producer and the collector run on the same machine, which effectively rules out network errors. The surface-level error we see is probably a symptom of an underlying, hidden issue. We also tried replacing localhost:2222 with 127.0.0.1:2222 to force IPv4, without positive results.
- The IIS context culprit: in a .NET 6 console application the metrics are dispatched seamlessly, with no discrepancies. Running the same application in the IIS context (in-process hosting) reproduces the issue. The correlation between the problem and the IIS context is undeniable.
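To firm up the IIS correlation, one diagnostic we could add to Program.cs (a sketch assuming a .NET 6 web project with implicit usings; Console.WriteLine stands in for a real logger) is to log host lifecycle transitions, so telemetry dropouts can be matched against app-pool recycle events:

using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Hosting;
using MyProject.Metrics;

var builder = WebApplication.CreateBuilder(args);
builder.Services.AddOpenTelemetry(builder.Configuration);

var app = builder.Build();

// Log host lifecycle transitions so telemetry dropouts can be
// correlated with IIS app-pool recycles.
var lifetime = app.Services.GetRequiredService<IHostApplicationLifetime>();
lifetime.ApplicationStopping.Register(() => Console.WriteLine("Host stopping (possible recycle)"));
lifetime.ApplicationStopped.Register(() => Console.WriteLine("Host stopped"));

app.Run();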
Conclusion
Considering these points, together with the fact that the problem is observed only in applications hosted on AWS running .NET 6.0 - while those on Azure running .NET Framework 4.8 run smoothly - the issue is evidently tied to the IIS context in which the application runs, and potentially to its interaction with this specific environment and runtime version.
Any insight into how applications behave differently inside an IIS context, or anomalies you have seen before - especially in an AWS environment on .NET 6.0 - would be very valuable. The situation is critical for us, so a swift response would be greatly appreciated.
Observations on peculiarities or required settings, at either the environment or the application level, for this specific hosting scenario would help us move towards a solution.
Thank you in advance for your cooperation; looking forward to a fruitful discussion.