Skip to content

wol-proxy: reduce stale-ready 502s and add optional source-ip for WoL#592

Open
gabrielcosi wants to merge 6 commits into
mostlygeek:mainfrom
gabrielcosi:main
Open

wol-proxy: reduce stale-ready 502s and add optional source-ip for WoL#592
gabrielcosi wants to merge 6 commits into
mostlygeek:mainfrom
gabrielcosi:main

Conversation

@gabrielcosi

Copy link
Copy Markdown

Summary

This improves wol-proxy behavior for suspended upstream hosts and multi-network environments.

Issues fixed

I run wol-proxy in Kubernetes to wake a suspended llama-swap host on demand. Two issues came up:

  • Stale ready state after suspend: when the upstream host suspended, wol-proxy stayed in "ready" state until the SSE connection eventually timed out. Requests arriving in that window hit the reverse proxy
    directly and got 502s instead of triggering a wake.
  • WoL sent on wrong network: magic packets always used default routing, which doesn't work in multi-interface/Multus setups where WoL must egress a specific NIC to reach the target host.

@coderabbitai

coderabbitai Bot commented Mar 19, 2026

Copy link
Copy Markdown

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Walkthrough

Adds two new CLI flags (--upstream-api-key, --source-ip), enhances reverse-proxy upstream auth and SSE readiness lifecycle, buffers/retries proxied requests, binds WoL packets to a source IP, adds tests, updates README example, and adds/removes CI workflow and Containerfile.

Changes

Cohort / File(s) Summary
Documentation
cmd/wol-proxy/README.md
Updated example to document -upstream-api-key and -source-ip; adjusted multi-line example formatting.
Proxy implementation
cmd/wol-proxy/wol-proxy.go
Added --upstream-api-key and --source-ip flags; validate IPv4 source-ip; extended proxyServer state and SSE cancel handling; injects Authorization: Bearer/x-api-key into upstream requests; custom http.Transport; ErrorHandler marks upstream not-ready, cancels SSE, increments failures, sends WoL bound to optional source-ip, and serves loading page for UI paths; SSE loop uses cancellable contexts and stored cancel func; request handling buffers up to 10MB bodies, retries proxied requests via proxyResponseWriter; added isClientDisconnect, SSE cancel helpers, and updated sendMagicPacket to accept sourceIP.
Tests
cmd/wol-proxy/wol-proxy_test.go
New tests covering sendMagicPacket validation, isClientDisconnect heuristics, error-handler readiness/failure transitions, loading page and /status responses, not-ready routing/timeouts, SSE recovery flow, and upstream API-key header injection for SSE and proxied requests.
CI workflow (added)
.github/workflows/llama-proxy-package.yml
New workflow to build and publish multi-arch Docker images for docker/llama-proxy.Containerfile, tagging :main and :sha-<shortSHA>.
Container build
docker/llama-proxy.Containerfile
Added multi-stage Containerfile to build cmd/wol-proxy and produce a minimal Alpine runtime image exposing port 8080.
CI workflows (removed)
.github/workflows/closeinactive.yml, .github/workflows/containers.yml
Removed daily issue-stale/close workflow and previous container build/push workflow.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Suggested reviewers

  • mostlygeek
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and concisely summarizes the two main changes: reducing stale-ready 502s and adding optional source-IP for WoL, matching the core improvements in the changeset.
Description check ✅ Passed The description is directly related to the changeset, clearly explaining the two issues addressed (stale ready state and multi-network WoL) and the context for the improvements.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@SebastianSilvaGh

Copy link
Copy Markdown

+1

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🧹 Nitpick comments (1)
cmd/wol-proxy/wol-proxy_test.go (1)

103-116: Exercise the real ErrorHandler instead of copying it into the test.

These closures duplicate production logic, so the tests can stay green while newProxy(...).upstreamProxy.ErrorHandler drifts on cancelSSE(), WoL dispatch, or response-writing behavior. Pull the handler body into a shared method/helper and invoke that same code path here.

Also applies to: 141-154, 180-186

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cmd/wol-proxy/wol-proxy_test.go` around lines 103 - 116, The test currently
duplicates the production error handling closure; move the shared logic into the
real ErrorHandler used by the proxy and call that in tests instead of
re-implementing it. Extract the closure body into the proxy's ErrorHandler (or a
small helper invoked by ErrorHandler) so behaviors like cancelSSE(), WoL
dispatch, response-writing and calls to p.setStatus/notready, p.incFail and
writing loadingPageHTML are centralized, then update the test to invoke
newProxy(...).upstreamProxy.ErrorHandler(w, r, err) (or the shared helper) at
the three locations (lines ~103-116, ~141-154, ~180-186) so the test exercises
the actual production handler.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@cmd/wol-proxy/README.md`:
- Around line 22-24: The multiline example uses a trailing backslash after
"-timeout 30 \" which escapes the newline and causes the following commented
line "# bind WoL packets..." to become part of the continued command; update the
README example so comments are outside the continued command or provide a
comment-free multi-line invocation — e.g., remove the trailing backslash after
"-timeout 30" (or move the "# bind WoL packets..." line above/below the
continued command) and ensure flags such as "-timeout 30" and "-source-ip
192.168.1.5" are shown on separate lines of the same continued command
correctly.

In `@cmd/wol-proxy/wol-proxy.go`:
- Around line 364-381: Replace the use of context.Background() with the
request's context and avoid blind replay of mutating requests: create the
timeout from r.Context() (e.g., ctx, cancel := context.WithTimeout(r.Context(),
time.Duration(*flagTimeout)*time.Second)) and use ctx.Done() in the select so we
stop if the client disconnects; before resubmitting the request via
p.upstreamProxy.ServeHTTP(w, r), only retry when r.Method is safe/idempotent
(allow GET, HEAD, OPTIONS, TRACE) or for otherwise idempotent methods
(PUT/DELETE/PATCH) only when the request supplies an explicit idempotency
guarantee (e.g., presence of an "Idempotency-Key" header); preserve resetting
r.Body from bodyBytes as before and ensure ticker.Stop()/cancel() are called on
all paths.
- Around line 343-353: Replace the silent truncation by detecting overflow: read
up to maxBodySize+1 bytes (e.g., use io.LimitReader with maxBodySize+1 or read
into a buffer and check len) instead of maxBodySize; if the returned bodyBytes
length is greater than maxBodySize, respond with http.Error(w, "request body too
large", http.StatusRequestEntityTooLarge) and return; otherwise set r.Body =
io.NopCloser(bytes.NewReader(bodyBytes)) and update r.ContentLength =
int64(len(bodyBytes)) so downstream code/transport sees the correct length.
Ensure you reference the existing maxBodySize, bodyBytes, r.Body and the http
error status constants (StatusRequestEntityTooLarge).
- Around line 269-286: The code calls cancel() before checking why the SSE loop
stopped, so ctx.Err() is always context.Canceled and EOF/scanner errors get
misclassified; remove or move the cancel() call so classification happens first:
capture scanner.Err() into scanErr, close resp.Body, then inspect scanErr and
ctx.Err() to decide the branch (call proxy.setStatus(notready) and
proxy.incFail(1) on non-cancel errors), and only call cancel() after you’ve
classified and acted on the error (or don’t call cancel() at this location at
all). Reference: cancel(), ctx.Err(), scanner.Err(), resp.Body.Close(),
proxy.setStatus(notready), proxy.incFail.

---

Nitpick comments:
In `@cmd/wol-proxy/wol-proxy_test.go`:
- Around line 103-116: The test currently duplicates the production error
handling closure; move the shared logic into the real ErrorHandler used by the
proxy and call that in tests instead of re-implementing it. Extract the closure
body into the proxy's ErrorHandler (or a small helper invoked by ErrorHandler)
so behaviors like cancelSSE(), WoL dispatch, response-writing and calls to
p.setStatus/notready, p.incFail and writing loadingPageHTML are centralized,
then update the test to invoke newProxy(...).upstreamProxy.ErrorHandler(w, r,
err) (or the shared helper) at the three locations (lines ~103-116, ~141-154,
~180-186) so the test exercises the actual production handler.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 65e0db6c-26b4-4f5a-940a-8d6c06edfa91

📥 Commits

Reviewing files that changed from the base of the PR and between 15bd55d and f17905c.

📒 Files selected for processing (3)
  • cmd/wol-proxy/README.md
  • cmd/wol-proxy/wol-proxy.go
  • cmd/wol-proxy/wol-proxy_test.go

Comment thread cmd/wol-proxy/README.md
Comment on lines +22 to +24
-timeout 30 \
# bind WoL packets to a specific network interface
-source-ip 192.168.1.5

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

This multiline example still won't execute as pasted.

-timeout 30 \ escapes the newline into the following # ... line, so the shell comments out the rest of the logical command and later flags run as separate commands. Move the comments outside the continued command or show a comment-free multi-line example.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cmd/wol-proxy/README.md` around lines 22 - 24, The multiline example uses a
trailing backslash after "-timeout 30 \" which escapes the newline and causes
the following commented line "# bind WoL packets..." to become part of the
continued command; update the README example so comments are outside the
continued command or provide a comment-free multi-line invocation — e.g., remove
the trailing backslash after "-timeout 30" (or move the "# bind WoL packets..."
line above/below the continued command) and ensure flags such as "-timeout 30"
and "-source-ip 192.168.1.5" are shown on separate lines of the same continued
command correctly.

Comment on lines +269 to +286
cancel()

scanErr := scanner.Err()
if scanErr != nil && ctx.Err() != context.Canceled {
slog.Error("error reading from SSE stream", "error", scanErr)
}

// Connection closed or error occurred
_ = resp.Body.Close()
slog.Info("SSE connection closed, upstream not ready")
proxy.setStatus(notready)
proxy.incFail(1)

// Wait before reconnecting
time.Sleep(waitDuration)
if ctx.Err() == context.Canceled {
slog.Info("SSE connection canceled, reconnecting")
time.Sleep(2 * time.Second)
} else {
slog.Info("SSE connection closed, upstream not ready")
proxy.setStatus(notready)
proxy.incFail(1)
time.Sleep(waitDuration)
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Don't cancel the SSE context before classifying why the stream stopped.

Line 269 forces ctx.Err() to context.Canceled, so EOF and scanner failures both take the fast-reconnect branch and this loop never marks the upstream notready here. That leaves a stale-ready window open until the next reconnect attempt or request failure.

Suggested fix
-			cancel()
-
-			scanErr := scanner.Err()
-			if scanErr != nil && ctx.Err() != context.Canceled {
+			canceled := errors.Is(ctx.Err(), context.Canceled)
+			scanErr := scanner.Err()
+			if scanErr != nil && !canceled {
 				slog.Error("error reading from SSE stream", "error", scanErr)
 			}
 
 			_ = resp.Body.Close()
+			cancel()
 
-			if ctx.Err() == context.Canceled {
+			if canceled {
 				slog.Info("SSE connection canceled, reconnecting")
 				time.Sleep(2 * time.Second)
 			} else {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cmd/wol-proxy/wol-proxy.go` around lines 269 - 286, The code calls cancel()
before checking why the SSE loop stopped, so ctx.Err() is always
context.Canceled and EOF/scanner errors get misclassified; remove or move the
cancel() call so classification happens first: capture scanner.Err() into
scanErr, close resp.Body, then inspect scanErr and ctx.Err() to decide the
branch (call proxy.setStatus(notready) and proxy.incFail(1) on non-cancel
errors), and only call cancel() after you’ve classified and acted on the error
(or don’t call cancel() at this location at all). Reference: cancel(),
ctx.Err(), scanner.Err(), resp.Body.Close(), proxy.setStatus(notready),
proxy.incFail.

Comment on lines +343 to +353
const maxBodySize = 10 << 20 // 10MB
var bodyBytes []byte
if r.Body != nil {
var err error
bodyBytes, err = io.ReadAll(io.LimitReader(r.Body, maxBodySize))
if err != nil {
http.Error(w, "failed to read request body", http.StatusBadGateway)
return
}
r.Body = io.NopCloser(bytes.NewReader(bodyBytes))
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Reject oversized bodies instead of silently truncating them.

io.ReadAll(io.LimitReader(r.Body, maxBodySize)) clips anything above 10MB without reporting overflow. Larger requests then get replayed with a shortened payload and stale ContentLength, which can corrupt the request or trigger transport errors.

Suggested fix
 	const maxBodySize = 10 << 20 // 10MB
 	var bodyBytes []byte
 	if r.Body != nil {
 		var err error
-		bodyBytes, err = io.ReadAll(io.LimitReader(r.Body, maxBodySize))
+		bodyBytes, err = io.ReadAll(io.LimitReader(r.Body, maxBodySize+1))
 		if err != nil {
-			http.Error(w, "failed to read request body", http.StatusBadGateway)
+			http.Error(w, "failed to read request body", http.StatusBadRequest)
 			return
 		}
+		if len(bodyBytes) > maxBodySize {
+			http.Error(w, "request body too large to retry", http.StatusRequestEntityTooLarge)
+			return
+		}
+		r.ContentLength = int64(len(bodyBytes))
 		r.Body = io.NopCloser(bytes.NewReader(bodyBytes))
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cmd/wol-proxy/wol-proxy.go` around lines 343 - 353, Replace the silent
truncation by detecting overflow: read up to maxBodySize+1 bytes (e.g., use
io.LimitReader with maxBodySize+1 or read into a buffer and check len) instead
of maxBodySize; if the returned bodyBytes length is greater than maxBodySize,
respond with http.Error(w, "request body too large",
http.StatusRequestEntityTooLarge) and return; otherwise set r.Body =
io.NopCloser(bytes.NewReader(bodyBytes)) and update r.ContentLength =
int64(len(bodyBytes)) so downstream code/transport sees the correct length.
Ensure you reference the existing maxBodySize, bodyBytes, r.Body and the http
error status constants (StatusRequestEntityTooLarge).

Comment on lines +364 to +381
ticker := time.NewTicker(250 * time.Millisecond)
timeout, cancel := context.WithTimeout(context.Background(), time.Duration(*flagTimeout)*time.Second)
defer cancel()
for {
select {
case <-timeout.Done():
ticker.Stop()
slog.Info("timeout waiting for upstream to be ready")
http.Error(w, "timeout", http.StatusRequestTimeout)
return
case <-ticker.C:
if p.getStatus() == ready {
ticker.Stop()
if bodyBytes != nil {
r.Body = io.NopCloser(bytes.NewReader(bodyBytes))
}
p.upstreamProxy.ServeHTTP(w, r)
return

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Don't auto-replay arbitrary requests after a transport error.

This loop uses context.Background() and retries every method once ready flips back. A transport failure here does not prove the upstream never processed the first attempt, so replaying POST/PUT/PATCH/DELETE can duplicate side effects after a transient network break. If the client disconnects, we'll still wait and resend; and the second ServeHTTP call is not wrapped, so another API-path failure can fall through without an explicit upstream error response.

Tie the wait to r.Context(), and only replay safe/idempotent or explicitly idempotent requests.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cmd/wol-proxy/wol-proxy.go` around lines 364 - 381, Replace the use of
context.Background() with the request's context and avoid blind replay of
mutating requests: create the timeout from r.Context() (e.g., ctx, cancel :=
context.WithTimeout(r.Context(), time.Duration(*flagTimeout)*time.Second)) and
use ctx.Done() in the select so we stop if the client disconnects; before
resubmitting the request via p.upstreamProxy.ServeHTTP(w, r), only retry when
r.Method is safe/idempotent (allow GET, HEAD, OPTIONS, TRACE) or for otherwise
idempotent methods (PUT/DELETE/PATCH) only when the request supplies an explicit
idempotency guarantee (e.g., presence of an "Idempotency-Key" header); preserve
resetting r.Body from bodyBytes as before and ensure ticker.Stop()/cancel() are
called on all paths.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (3)
cmd/wol-proxy/wol-proxy.go (3)

410-426: ⚠️ Potential issue | 🔴 Critical

Don't auto-replay arbitrary requests after a transport error.

A transport failure with no response written does not prove the upstream skipped the first attempt. Retrying POST/PATCH/DELETE here can duplicate side effects, and Line 411 uses context.Background() so the wait keeps running after the client disconnects. The retry should be tied to r.Context(), limited to safe/idempotent requests (or explicit idempotency guarantees), and the second ServeHTTP call should use the same wrapped writer so another API-path failure can't fall through without an explicit upstream error.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cmd/wol-proxy/wol-proxy.go` around lines 410 - 426, The current loop uses
context.Background() and blindly retries requests (replaying bodyBytes and
calling p.upstreamProxy.ServeHTTP) which can duplicate unsafe side effects;
change the timeout/context to use r.Context() so the wait cancels when the
client disconnects, only retry when the request method is safe/idempotent (e.g.,
GET, HEAD, OPTIONS, PUT if you accept idempotent PUT) or when an explicit
idempotency token is present, and ensure the second attempt reuses the same
ResponseWriter wrapper (so failures from the upstream are detected and not
silently retried). Concretely: replace context.Background() with r.Context()
when creating timeout, gate the retry on r.Method (or idempotency header) before
restoring r.Body from bodyBytes, and pass the same wrapped writer instance into
p.upstreamProxy.ServeHTTP so an upstream error is observed and handled instead
of falling through.

389-399: ⚠️ Potential issue | 🟠 Major

Reject oversized bodies instead of truncating them.

io.ReadAll(io.LimitReader(r.Body, maxBodySize)) clips anything above 10 MiB without reporting overflow. The replay path then forwards a shortened body while keeping the original ContentLength, which can corrupt the upstream request.

🐛 Suggested change
 	const maxBodySize = 10 << 20 // 10MB
 	var bodyBytes []byte
 	if r.Body != nil {
 		var err error
-		bodyBytes, err = io.ReadAll(io.LimitReader(r.Body, maxBodySize))
+		bodyBytes, err = io.ReadAll(io.LimitReader(r.Body, maxBodySize+1))
 		if err != nil {
-			http.Error(w, "failed to read request body", http.StatusBadGateway)
+			http.Error(w, "failed to read request body", http.StatusBadRequest)
 			return
 		}
+		if len(bodyBytes) > maxBodySize {
+			http.Error(w, "request body too large to retry", http.StatusRequestEntityTooLarge)
+			return
+		}
+		r.ContentLength = int64(len(bodyBytes))
 		r.Body = io.NopCloser(bytes.NewReader(bodyBytes))
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cmd/wol-proxy/wol-proxy.go` around lines 389 - 399, The current read uses
io.ReadAll(io.LimitReader(r.Body, maxBodySize)) which silently truncates bodies
over maxBodySize; change the logic to read up to maxBodySize+1 bytes (use
io.LimitReader with maxBodySize+1), detect if more than maxBodySize was read and
immediately reject with HTTP 413 (Request Entity Too Large), and only if within
limit replace r.Body with io.NopCloser(bytes.NewReader(bodyBytes)) and set any
relevant ContentLength to int64(len(bodyBytes)); refer to maxBodySize,
bodyBytes, io.LimitReader and r.Body to locate and update the code.

265-270: ⚠️ Potential issue | 🟠 Major

Classify SSE failures before calling cancel().

Line 265 and Line 315 call cancel() before inspecting ctx.Err(). That makes client.Do errors, EOF, and scanner failures look like context.Canceled, so the notready/failure path here is effectively skipped and the proxy can stay ready after the upstream is gone.

Also applies to: 315-331

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cmd/wol-proxy/wol-proxy.go` around lines 265 - 270, The SSE reconnection
logic currently calls cancel() before classifying the failure, causing all
errors (client.Do errors, EOF, scanner failures) to appear as context.Canceled
and skip the not-ready/failure handling; change the flow so you first inspect
the actual error returned by client.Do / scanner (and ctx.Err()) and only call
cancel() when the context was actually canceled or deadline exceeded, otherwise
treat the error as an upstream failure: log the real error (from client.Do or
scanner.Err()), mark the proxy as not-ready/failure, trigger any
reconnect/backoff logic, and then call cancel() if and only if ctx.Err()
indicates cancellation; apply the same change to the two SSE blocks that
currently call cancel() prematurely (the blocks around cancel(), ctx.Err(),
client.Do and scanner usage).
🧹 Nitpick comments (1)
cmd/wol-proxy/wol-proxy.go (1)

176-186: Clone http.DefaultTransport before overriding these knobs.

Starting from a zero-value http.Transport drops stdlib defaults like ProxyFromEnvironment, ForceAttemptHTTP2, and ExpectContinueTimeout. That can change outbound behavior for HTTPS or proxy-configured deployments even though this change only needs tighter timeouts/pooling. Please verify this against the Go version used in CI, since these defaults are stdlib-version specific.

♻️ Suggested change
-	p.Transport = &http.Transport{
-		DialContext: (&net.Dialer{
-			Timeout:   10 * time.Second,
-			KeepAlive: 30 * time.Second,
-		}).DialContext,
-		TLSHandshakeTimeout:   10 * time.Second,
-		ResponseHeaderTimeout: 30 * time.Second,
-		MaxIdleConns:          50,
-		MaxIdleConnsPerHost:   10,
-		IdleConnTimeout:       90 * time.Second,
-	}
+	transport := http.DefaultTransport.(*http.Transport).Clone()
+	transport.DialContext = (&net.Dialer{
+		Timeout:   10 * time.Second,
+		KeepAlive: 30 * time.Second,
+	}).DialContext
+	transport.TLSHandshakeTimeout = 10 * time.Second
+	transport.ResponseHeaderTimeout = 30 * time.Second
+	transport.MaxIdleConns = 50
+	transport.MaxIdleConnsPerHost = 10
+	transport.IdleConnTimeout = 90 * time.Second
+	p.Transport = transport
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cmd/wol-proxy/wol-proxy.go` around lines 176 - 186, The custom transport
assigned to p.Transport currently builds a zero-value http.Transport which drops
stdlib defaults like ProxyFromEnvironment and ForceAttemptHTTP2; change the
assignment to start from a clone of the default transport (use
http.DefaultTransport.(*http.Transport).Clone()/copy of that struct) and then
only override the timeout/pooling fields (e.g., TLSHandshakeTimeout,
ResponseHeaderTimeout, MaxIdleConns, MaxIdleConnsPerHost, IdleConnTimeout,
DialContext), leaving default behaviors intact when setting p.Transport.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@cmd/wol-proxy/wol-proxy.go`:
- Around line 410-426: The current loop uses context.Background() and blindly
retries requests (replaying bodyBytes and calling p.upstreamProxy.ServeHTTP)
which can duplicate unsafe side effects; change the timeout/context to use
r.Context() so the wait cancels when the client disconnects, only retry when the
request method is safe/idempotent (e.g., GET, HEAD, OPTIONS, PUT if you accept
idempotent PUT) or when an explicit idempotency token is present, and ensure the
second attempt reuses the same ResponseWriter wrapper (so failures from the
upstream are detected and not silently retried). Concretely: replace
context.Background() with r.Context() when creating timeout, gate the retry on
r.Method (or idempotency header) before restoring r.Body from bodyBytes, and
pass the same wrapped writer instance into p.upstreamProxy.ServeHTTP so an
upstream error is observed and handled instead of falling through.
- Around line 389-399: The current read uses io.ReadAll(io.LimitReader(r.Body,
maxBodySize)) which silently truncates bodies over maxBodySize; change the logic
to read up to maxBodySize+1 bytes (use io.LimitReader with maxBodySize+1),
detect if more than maxBodySize was read and immediately reject with HTTP 413
(Request Entity Too Large), and only if within limit replace r.Body with
io.NopCloser(bytes.NewReader(bodyBytes)) and set any relevant ContentLength to
int64(len(bodyBytes)); refer to maxBodySize, bodyBytes, io.LimitReader and
r.Body to locate and update the code.
- Around line 265-270: The SSE reconnection logic currently calls cancel()
before classifying the failure, causing all errors (client.Do errors, EOF,
scanner failures) to appear as context.Canceled and skip the not-ready/failure
handling; change the flow so you first inspect the actual error returned by
client.Do / scanner (and ctx.Err()) and only call cancel() when the context was
actually canceled or deadline exceeded, otherwise treat the error as an upstream
failure: log the real error (from client.Do or scanner.Err()), mark the proxy as
not-ready/failure, trigger any reconnect/backoff logic, and then call cancel()
if and only if ctx.Err() indicates cancellation; apply the same change to the
two SSE blocks that currently call cancel() prematurely (the blocks around
cancel(), ctx.Err(), client.Do and scanner usage).

---

Nitpick comments:
In `@cmd/wol-proxy/wol-proxy.go`:
- Around line 176-186: The custom transport assigned to p.Transport currently
builds a zero-value http.Transport which drops stdlib defaults like
ProxyFromEnvironment and ForceAttemptHTTP2; change the assignment to start from
a clone of the default transport (use
http.DefaultTransport.(*http.Transport).Clone()/copy of that struct) and then
only override the timeout/pooling fields (e.g., TLSHandshakeTimeout,
ResponseHeaderTimeout, MaxIdleConns, MaxIdleConnsPerHost, IdleConnTimeout,
DialContext), leaving default behaviors intact when setting p.Transport.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: e22dc14d-2ab3-4456-b9b6-d4f60c614856

📥 Commits

Reviewing files that changed from the base of the PR and between f17905c and 40dd6ef.

📒 Files selected for processing (3)
  • cmd/wol-proxy/README.md
  • cmd/wol-proxy/wol-proxy.go
  • cmd/wol-proxy/wol-proxy_test.go
✅ Files skipped from review due to trivial changes (1)
  • cmd/wol-proxy/README.md
🚧 Files skipped from review as they are similar to previous changes (1)
  • cmd/wol-proxy/wol-proxy_test.go

Slow upstreams can take longer than 30 seconds to send the first headers while ingesting large prompts, so avoid aborting those requests prematurely.
Build and push the Wake-on-LAN proxy image to GHCR on main with commit-based tags so deployments can reference immutable package versions.
Previously -source-ip only bound the WoL magic packet socket; HTTP
dials to the upstream (both the SSE client and the reverse proxy) were
left unbound, so the kernel picked the source IP from the main routing
table. In setups that rely on source-based routing (e.g. Multus + sbr
on Kubernetes) this caused proxied requests to egress via the wrong
interface while WoL still worked.

Apply -source-ip to the shared net.Dialer.LocalAddr so every outbound
connection originates from the configured address. Unset (default)
behavior is unchanged.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants