feat: add extensible observability via metrics, tracing, and logging by appleboy · Pull Request #11 · appleboy/go-httpretry

appleboy · 2026-02-15T03:04:18Z

Add zero-overhead observability interfaces for metrics, tracing, and structured logging
Implement functional options to configure custom metrics collectors, tracers, and loggers
Integrate metrics, logging, and tracing into the entire HTTP retry request lifecycle
Provide default no-op implementations to eliminate nil checks when observability is disabled
Add Slog (structured logging) adapter and supporting tests
Add example implementations for Prometheus, OpenTelemetry, and Slog in the examples directory
Add comprehensive documentation for observability features and usage patterns
Add thorough test coverage for metrics, tracing, and logging integration
Improve and clarify README to highlight observability capabilities and add new example links

- Add zero-overhead observability interfaces for metrics, tracing, and structured logging - Implement functional options to configure custom metrics collectors, tracers, and loggers - Integrate metrics, logging, and tracing into the entire HTTP retry request lifecycle - Provide default no-op implementations to eliminate nil checks when observability is disabled - Add Slog (structured logging) adapter and supporting tests - Add example implementations for Prometheus, OpenTelemetry, and Slog in the examples directory - Add comprehensive documentation for observability features and usage patterns - Add thorough test coverage for metrics, tracing, and logging integration - Improve and clarify README to highlight observability capabilities and add new example links Signed-off-by: appleboy <appleboy.tw@gmail.com>

Copilot

Pull request overview

Adds an extensible observability layer to the retry HTTP client via dependency-free interfaces (metrics, tracing, structured logging), wiring these signals into the request/attempt/retry lifecycle with default no-op implementations and accompanying docs/examples/tests.

Changes:

Introduce MetricsCollector, Tracer/Span, and Logger interfaces with default no-op implementations.
Integrate metrics/tracing/logging into Client.DoWithContext and expose configuration via WithMetrics, WithTracer, WithLogger.
Add documentation, examples, and new tests covering observability behavior.

Reviewed changes

Copilot reviewed 16 out of 16 changed files in this pull request and generated 5 comments.

Show a summary per file

File	Description
`retry.go`	Adds observability fields to `Client`, initializes no-op defaults, and emits metrics/spans/logs across the retry lifecycle.
`option.go`	Adds functional options to configure metrics/tracer/logger implementations.
`metric.go`	Defines metrics interface + no-op collector and a retry-reason classifier.
`tracer.go`	Defines tracing interfaces + no-op tracer/span.
`logger.go`	Defines logging interface + no-op logger.
`logger_slog.go`	Adds `log/slog` adapter to satisfy the new `Logger` interface.
`metric_test.go`	Adds tests for metrics integration and retry-reason classification.
`tracer_test.go`	Adds tests validating span creation/end/status and retry events.
`logger_test.go`	Adds tests validating log emission during retries.
`observability_test.go`	Adds end-to-end tests for combined observability, failure path, and default no-op behavior.
`docs/OBSERVABILITY.md`	Adds comprehensive observability guide and integration patterns.
`_example/observability/slog/main.go`	Runnable example showing slog logging integration.
`_example/observability/prometheus/main.go`	Runnable example showing Prometheus-style metrics integration pattern.
`_example/observability/opentelemetry/main.go`	Runnable example showing OpenTelemetry-style tracing integration pattern.
`_example/observability/combined/main.go`	Runnable example combining metrics + tracing + logging.
`README.md`	Updates feature list and links to observability docs/examples.

Comments suppressed due to low confidence (1)

retry.go:299

When ctx.Done() triggers while waiting before a retry, the function returns a RetryError immediately but does not emit any observability signals (no RecordRequestComplete, no request span status update, no error log). This leaves metrics/traces/logs incomplete for canceled/timeouts during backoff. Consider recording a final completion metric (success=false), setting the request span status to error with ctx.Err(), and logging the cancellation before returning.

			// Wait before retry
			select {
			case <-ctx.Done():
				// Context cancelled - return RetryError with information about the attempts made
				statusCode := 0
				if resp != nil {
					statusCode = resp.StatusCode
				}
				return nil, &RetryError{
					Attempts:   attempt,
					LastErr:    ctx.Err(),
					LastStatus: statusCode,
					Elapsed:    time.Since(startTime),
				}
			case <-time.After(actualDelay):

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-02-15T03:11:52Z

retry.go

+			// Success - record final metrics (no nil check needed)
+			c.metrics.RecordRequestComplete(
+				req.Method,
+				statusCode,
+				time.Since(startTime),
+				attempt+1,
+				true,
+			)
+			c.logger.Debug("request completed",
+				"method", req.Method,
+				"attempts", attempt+1,
+				"duration", time.Since(startTime),
+			)
+			requestSpan.SetStatus("ok", "")
+


In the !c.retryableChecker(lastErr, resp) early-return branch, the code records the request as success=true and sets the request span status to ok unconditionally. This branch also covers non-retryable failures (e.g., transport error deemed non-retryable, or non-retryable HTTP status), so observability will incorrectly report failures as successes. Consider deriving success from the actual outcome (e.g., lastErr == nil and/or status code), setting requestSpan status accordingly, and using an appropriate log level/message for non-retryable errors.

Suggested change

// Success - record final metrics (no nil check needed)

c.metrics.RecordRequestComplete(

req.Method,

statusCode,

time.Since(startTime),

attempt+1,

true,

)

c.logger.Debug("request completed",

"method", req.Method,

"attempts", attempt+1,

"duration", time.Since(startTime),

)

requestSpan.SetStatus("ok", "")

// Determine whether this final outcome is a success

success := lastErr == nil && statusCode >= 200 && statusCode < 300

// Record final metrics (no nil check needed)

c.metrics.RecordRequestComplete(

req.Method,

statusCode,

time.Since(startTime),

attempt+1,

success,

)

if success {

c.logger.Debug("request completed",

"method", req.Method,

"attempts", attempt+1,

"duration", time.Since(startTime),

)

requestSpan.SetStatus("ok", "")

} else {

// Non-retryable failure

if lastErr != nil {

c.logger.Debug("request completed with non-retryable error",

"method", req.Method,

"attempts", attempt+1,

"duration", time.Since(startTime),

"error", lastErr,

)

requestSpan.SetStatus("error", lastErr.Error())

} else {

c.logger.Debug("request completed with non-retryable HTTP status",

"method", req.Method,

"attempts", attempt+1,

"duration", time.Since(startTime),

"statusCode", statusCode,

)

requestSpan.SetStatus("error", fmt.Sprintf("non-retryable HTTP status %d", statusCode))

}

}

retry.go

option.go

Copilot · 2026-02-15T03:11:53Z

metric_test.go

+	// Use invalid URL to trigger network error
+	req, _ := http.NewRequest(http.MethodGet, "http://invalid-domain-that-does-not-exist.com", nil)
+	_, err = client.Do(req)
+	if err == nil {
+		t.Fatal("Expected network error")
+	}


This test triggers a “network error” by requesting http://invalid-domain-that-does-not-exist.com, which depends on DNS/network availability and can be flaky in CI environments (offline runners, DNS interception, proxies). Prefer a deterministic local failure (e.g., connect to a closed local listener/unused localhost port) so the test reliably produces a connection error without external dependencies.

Copilot · 2026-02-15T03:11:53Z

retry.go

+	// Start outer span for entire retry operation (no nil check needed)
+	ctx, requestSpan := c.tracer.StartSpan(ctx, "http.retry.request",
+		Attribute{Key: "http.method", Value: req.Method},
+		Attribute{Key: "http.url", Value: req.URL.String()},
+		Attribute{Key: "retry.max_attempts", Value: c.maxRetries + 1},
+	)
+	defer requestSpan.End()
+
+	// Log request start (no nil check needed)
+	c.logger.Debug("starting request",
+		"method", req.Method,
+		"url", req.URL.String(),
+		"max_retries", c.maxRetries,
+	)


Even with the default no-op implementations, DoWithContext now always builds tracing attributes and logging fields (e.g., req.URL.String() is computed twice) and performs interface calls on every request. This introduces measurable overhead when observability is “disabled by default”, contradicting the zero/near-zero overhead goal. Consider adding a fast-path that skips all observability work when metrics/tracer/logger are the default no-ops (or introducing an Enabled()/level check), so expensive field construction is avoided unless a real implementation is configured.

- Add extensive tests to verify retry delay logging, callback accuracy, Retry-After header handling, and tracing delay correctness - Refactor retry delay calculation logic into separate helper methods - Improve separation of delay computation, delay modifiers, and retry callback invocation - Update tests to assert precise expected delay values for retries - Make retry delays predictable by disabling jitter in unit tests ref: #11 (comment) Signed-off-by: appleboy <appleboy.tw@gmail.com>

Copilot AI review requested due to automatic review settings February 15, 2026 03:04

Copilot started reviewing on behalf of appleboy February 15, 2026 03:04 View session

appleboy merged commit 0c24722 into main Feb 15, 2026
20 of 24 checks passed

Copilot AI reviewed Feb 15, 2026

View reviewed changes

appleboy mentioned this pull request Feb 15, 2026

test: refactor and test retry delay logic for improved accuracy #14

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat: add extensible observability via metrics, tracing, and logging#11

feat: add extensible observability via metrics, tracing, and logging#11
appleboy merged 1 commit intomainfrom
observability

appleboy commented Feb 15, 2026

Uh oh!

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Feb 15, 2026

Uh oh!

Uh oh!

Uh oh!

Copilot AI Feb 15, 2026

Uh oh!

Copilot AI Feb 15, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

appleboy commented Feb 15, 2026

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Feb 15, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Copilot AI Feb 15, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 15, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants