Skip to content

Conversation

@smocherla-brex
Copy link

@smocherla-brex smocherla-brex commented Nov 28, 2025

Addresses #684

This PR attempts to add initial support for Opentelemetry using https://github.com/open-telemetry/opentelemetry-go:

  1. Aim for support for tracing with OTEL using grpc exporter, disabled by default
  2. Only traces for now to keep the scope of this PR limited. No custom spans but ones built-in that come with the SDK.
  3. WIll add spans for http calls with http cache, grpc calls with grpc implementation, and for all grpc methods with server interceptors
  4. Will also add spans for http calls for the proxies by wrapping minio's http transport with otlp's transport.

We don't have spans for filesystem calls - although we can add them. But I don't think it's needed for that disk/IO metrics can usually help in that situation.

Testing: Some unit tests, I'll share some details once I am able to verify end-to-end after building it here and testing with otel-collector. Below is an example of a trace I generated by running it locally.

for bystream fetch
image

for remote asset api
image

@smocherla-brex smocherla-brex marked this pull request as ready for review December 1, 2025 15:43
Comment on lines +107 to +109
if remote.Transport == nil {
remote.Transport = http.DefaultTransport
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like NewTransport does this internally, so we can skip it.

}

if c.Otel.Tracing.SampleRate == 0 {
c.Otel.Tracing.SampleRate = 1.0 // Default to 100% sampling
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this default risky?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think typically a user from my experience would expect to get all the traces that're being emitted, now if they have too much data, they can either filter the trace spans out in the otel-collector or override this config.

c.Otel.Tracing.SampleRate,
)
if err != nil {
log.Printf("WARNING: Failed to initialize OpenTelemetry: %v", err)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick: please use Warning instead of WARNING


// Initialize OpenTelemetry if enabled
var otelProvider *otel.TracerProvider
if c.Otel != nil && c.Otel.Tracing != nil && c.Otel.Tracing.Enabled {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be worth setting this to an otelTracingEnabled variable and check that instead, since we do it multiple times.

Comment on lines +333 to +339
// Wrap status handler with OTEL tracing if enabled
if c.Otel != nil && c.Otel.Tracing != nil && c.Otel.Tracing.Enabled {
statusHandler = otelhttp.NewHandler(
http.HandlerFunc(statusHandler),
"status",
).ServeHTTP
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure how useful it would be to trace the status page?

Comment on lines +29 to +30
// TLS is usually not typically required if running in a k8s environment
otlptracegrpc.WithInsecure(),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be worth adding a configuration setting to avoid this.

Name: "otel.tracing.exporter_endpoint",
Usage: "The OTLP exporter endpoint for OpenTelemetry traces. Can also be set via OTEL_EXPORTER_OTLP_ENDPOINT env var.",
DefaultText: "empty, will use OTEL_EXPORTER_OTLP_ENDPOINT env var",
EnvVars: []string{"BAZEL_REMOTE_OTEL_TRACING_EXPORTER_ENDPOINT"},
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this code is checking these environment variables with the BAZEL_REMOTE_ prefix. I'm not sure if it's better to use standard variables, or to use custom ones.

// Add OTEL interceptors first for complete trace coverage
if c.Otel != nil && c.Otel.Tracing != nil && c.Otel.Tracing.Enabled {
streamInterceptors = append(streamInterceptors,
otelgrpc.StreamServerInterceptor())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These deprecated functions should be avoided.

// Wrap transport with OTEL instrumentation if enabled
var transport http.RoundTripper = tr
if otelEnabled {
transport = otelhttp.NewTransport(tr)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not tr = otelhttp.NewTransport(tr) ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants