-
Couldn't load subscription status.
- Fork 978
feat: retry on 429 #2244
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: dev
Are you sure you want to change the base?
feat: retry on 429 #2244
Conversation
WalkthroughAdds configurable HTTP 429 retry support: new Options fields and CLI flags with validation; introduces a retryJob type, retry channel, and retryLoop with scheduling and atomic draining; propagates retry channel through processing (including nested probes); updates Runner method signatures and adds tests. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
actor User as RunEnumeration
participant W as Worker/process
participant RL as RetryLoop
participant S as HTTP Server
participant O as Output
rect rgba(230,245,255,0.6)
User->>W: process(target, retryCh)
W->>S: analyze request
S-->>W: response (200 / 429)
alt 429 and attempts < RetryRounds
W->>RL: enqueue retryJob (when = now + RetryDelay)
end
W-->>O: emit result
end
rect rgba(240,255,240,0.6)
loop until drained
RL-->>RL: wait until(job.when)
RL->>S: analyze retry request
S-->>RL: response
RL-->>O: emit retry result
alt 429 and attempts < RetryRounds
RL->>RL: re-enqueue retryJob
end
end
end
note over User,RL: Drained signaled when initial + retry jobs complete
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Poem
Tip 🔌 Remote MCP (Model Context Protocol) integration is now available!Pro plan users can now connect to remote MCP servers from the Integrations page. Connect with popular remote MCPs such as Notion and Linear to add more context to your reviews and chats. ✨ Finishing Touches
🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. CodeRabbit Commands (Invoked using PR/Issue comments)Type Other keywords and placeholders
CodeRabbit Configuration File (
|
a4ffbdd to
9936878
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (1)
runner/runner.go (1)
1354-1355: Consider documenting the retry attempt tracking logic.The condition
if job.attempt == 1adds to the remaining counter only for first attempts. While this is correct for tracking initial jobs vs retries, it might benefit from a clarifying comment.Add a comment to clarify the tracking logic:
+ // Track only initial attempts (attempt=1) to know when all original jobs are done if job.attempt == 1 { remaining.Add(1) }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (4)
runner/options.go(3 hunks)runner/runner.go(9 hunks)runner/runner_test.go(3 hunks)runner/types.go(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (4)
runner/options.go (2)
runner/runner.go (1)
New(114-389)common/httpx/httpx.go (1)
New(46-213)
runner/types.go (3)
common/httpx/httpx.go (1)
HTTPX(33-43)common/httpx/types.go (1)
Target(4-8)runner/options.go (1)
ScanOptions(53-110)
runner/runner_test.go (4)
runner/runner.go (1)
New(114-389)runner/options.go (1)
Options(173-353)runner/types.go (1)
Result(35-103)common/httpx/http2.go (1)
HTTP(15-15)
runner/runner.go (4)
common/httpx/httpx.go (1)
HTTPX(33-43)common/httpx/types.go (1)
Target(4-8)runner/options.go (1)
ScanOptions(53-110)runner/types.go (1)
Result(35-103)
🔇 Additional comments (11)
runner/types.go (1)
123-132: LGTM! Well-structured retry state management type.The
retryJobstruct appropriately captures all necessary state for implementing HTTP 429 retry logic. The fields are properly organized and the use oftime.Timefor scheduling is appropriate.runner/options.go (2)
277-278: Good addition of retry configuration fields.The new
RetryRoundsandRetryDelayfields are appropriately typed and well-positioned within the Options struct alongside other rate-limiting related configurations.
535-536: Clear and well-documented CLI flags for retry configuration.The flags are properly grouped under "Optimizations" with clear descriptions. The default delay of 500ms is reasonable for HTTP 429 retry scenarios.
runner/runner_test.go (1)
322-409: Well-designed test for concurrent retry behavior.The test thoroughly validates the retry mechanism with proper concurrent execution handling and accurate result counting. The test setup with two servers having different 429 response patterns effectively covers the retry logic edge cases.
runner/runner.go (7)
1261-1263: Good channel and goroutine management for retry mechanism.The retry channel creation and retry loop initialization are properly placed before the main processing begins.
1306-1308: Proper synchronization with drain channel.The conditional wait on the drain channel when
RetryRounds > 0ensures all retry attempts complete before closing the output channel. This prevents premature termination.
1333-1393: Well-implemented retry loop with proper cancellation and draining.The retry loop implementation is robust with:
- Proper context cancellation support
- Atomic counter for tracking in-flight jobs
- Timer-based scheduling with cleanup
- Correct drain signaling when all work completes
1481-1493: Correct retry job scheduling for 429 responses.The retry job creation properly captures all necessary state and schedules with the configured delay. The check for
RetryRounds > 0ensures retries only happen when configured.
1499-1514: Good propagation of retry channel through TLS/CSP probe paths.The retry channel is correctly passed through all recursive
r.processcalls in both TLS and CSP probing paths, ensuring consistent retry behavior across all request types.
1548-1560: Consistent retry handling for custom port requests.The retry logic for custom port requests mirrors the default port handling, maintaining consistency across different request paths.
1566-1570: Proper retry channel propagation in custom port TLS probing.The retry channel is correctly passed through the TLS probe paths for custom ports.
9936878 to
230afce
Compare
230afce to
7fe6a48
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Nitpick comments (4)
runner/runner_test.go (4)
325-344: Make the server conditions more explicit (< vs !=) for readabilityUsing “not equal to N” works, but “less than N” communicates intent (first N-1 calls) more clearly and avoids edge-case surprises if the counter increments beyond the sentinel.
Apply this diff to clarify:
- if atomic.AddInt32(&hits1, 1) != 4 { + if atomic.AddInt32(&hits1, 1) < 4 { w.WriteHeader(http.StatusTooManyRequests) return } @@ - if atomic.AddInt32(&hits2, 1) != 3 { + if atomic.AddInt32(&hits2, 1) < 3 { w.WriteHeader(http.StatusTooManyRequests) return }
345-351: Tiny nit: document RetryDelay units to avoid future confusionOptions.RetryDelay is treated as milliseconds by the new retry loop. Consider a short comment to prevent accidental second-based assumptions during future edits.
Example:
r, err := New(&Options{ Threads: 1, RetryRounds: 2, - RetryDelay: 5, + RetryDelay: 5, // milliseconds Timeout: 3, })
373-389: Compare by prefix rather than strict equality to make URL matching robustDepending on how
Result.URLis populated, it may include a trailing slash or path. Using strict equality withsrvX.URLrisks brittle matches.HasPrefixis sufficient here and resilient to minor formatting differences.Apply this diff:
- switch res.StatusCode { + switch res.StatusCode { case http.StatusTooManyRequests: - if res.URL == srv1.URL { + if strings.HasPrefix(res.URL, srv1.URL) { s1n429++ } else { s2n429++ } case http.StatusOK: - if res.URL == srv1.URL { + if strings.HasPrefix(res.URL, srv1.URL) { s1n200++ } else { s2n200++ } }
401-409: Strengthen assertions with server hit countsIn addition to output-based assertions, verify the exact number of HTTP calls made to each server. This catches cases where the output channel behavior changes but retries still occur (or vice versa).
Apply this diff:
// srv2: should have 2x 429 responses and 1x 200 (succeeds on 3rd attempt) require.Equal(t, 2, s2n429) require.Equal(t, 1, s2n200) + + // Also assert exact server invocations: + // srv1: 1 initial + 2 retries = 3 total + require.EqualValues(t, 3, atomic.LoadInt32(&hits1)) + // srv2: 1 initial + 2 retries = 3 total, ends with 200 on 3rd + require.EqualValues(t, 3, atomic.LoadInt32(&hits2))
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (2)
runner/options.go(3 hunks)runner/runner_test.go(3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- runner/options.go
🧰 Additional context used
🧬 Code graph analysis (1)
runner/runner_test.go (3)
runner/runner.go (1)
New(114-389)runner/options.go (1)
Options(173-353)runner/types.go (1)
Result(35-103)
🔇 Additional comments (2)
runner/runner_test.go (2)
4-11: Imports look correct and minimalThe newly added imports are necessary for the test logic (servers, status codes, atomics, sync utils). No concerns.
Also applies to: 19-19
236-239: Formatting-only struct field alignmentThis change is cosmetic and does not affect behavior. Safe to keep.
| _, drainedCh := r.retryLoop(context.Background(), retryCh, output, r.analyze) | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Verification agent
🧩 Analysis chain
Don’t ignore the retry loop’s stop handle; avoid a goroutine leak
retryLoop appears to return a stop/cleanup handle (you currently discard it with _). If the loop isn’t explicitly stopped, the goroutine can remain blocked on retryCh after the test finishes, causing leaks and potential flakiness in suite runs.
Apply this diff to wire cancellation correctly and ensure cleanup even on early test failures:
- _, drainedCh := r.retryLoop(context.Background(), retryCh, output, r.analyze)
+ ctx, cancel := context.WithCancel(context.Background())
+ stopRetry, drainedCh := r.retryLoop(ctx, retryCh, output, r.analyze)
+ t.Cleanup(func() {
+ // Ensure the loop exits and resources are released
+ stopRetry()
+ cancel()
+ })If you want to double-check the signature and the semantics of the first return value:
🏁 Script executed:
#!/bin/bash
# Inspect retryLoop signature/returns to confirm cleanup expectations
rg -nP -C2 'func\s+\(r\s+\*Runner\)\s+retryLoop\(' runnerLength of output: 342
🏁 Script executed:
#!/bin/bash
# Retrieve the full signature (including return values) of retryLoop to confirm the first return value’s type and semantics
rg -nP -C10 'func\s+\(r\s+\*Runner\)\s+retryLoop\(' runner/runner.goLength of output: 691
🏁 Script executed:
#!/bin/bash
# Extract the full body of retryLoop to confirm how stop() and drainedCh operate
sed -n '1333,1400p' runner/runner.goLength of output: 1518
🏁 Script executed:
#!/bin/bash
# Locate usages of retryLoop in the test file
rg -n -C5 'retryLoop' runner/runner_test.go
# Show full context around lines 350–380 to inspect how stop and drainedCh are handled
sed -n '340,380p' runner/runner_test.goLength of output: 1276
Wire up the retryLoop stop handle to prevent goroutine leaks
The test currently invokes retryLoop with a background context and discards its first return value (stop func()), so the internal goroutine never sees a cancellation signal. This can leave the loop blocked on retryCh after the test ends, leading to leaked goroutines and flakiness.
• File: runner/runner_test.go
• Lines: 356–357
Suggested change:
- _, drainedCh := r.retryLoop(context.Background(), retryCh, output, r.analyze)
+ // Create a cancellable context and capture the stop function
+ ctx, cancel := context.WithCancel(context.Background())
+ stopRetry, drainedCh := r.retryLoop(ctx, retryCh, output, r.analyze)
+ // Ensure the loop exits and the context is cleaned up after the test
+ t.Cleanup(func() {
+ stopRetry()
+ cancel()
+ })📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| _, drainedCh := r.retryLoop(context.Background(), retryCh, output, r.analyze) | |
| // Create a cancellable context and capture the stop function | |
| ctx, cancel := context.WithCancel(context.Background()) | |
| stopRetry, drainedCh := r.retryLoop(ctx, retryCh, output, r.analyze) | |
| // Ensure the loop exits and the context is cleaned up after the test | |
| t.Cleanup(func() { | |
| stopRetry() | |
| cancel() | |
| }) |
🤖 Prompt for AI Agents
In runner/runner_test.go around lines 356–357, the test calls
r.retryLoop(context.Background(), ...) but ignores the returned stop func,
leaving the internal goroutine unable to observe cancellation; capture the first
return value (stop func) when calling retryLoop and ensure you call it
(preferably via defer stop() immediately after the call) so the retryLoop is
signaled to exit and no goroutine is leaked.
| wg.Wait() | ||
| <-drainedCh | ||
| close(output) | ||
| drainWG.Wait() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Guard against hangs: time-bound the wait for drainedCh
If the retry loop logic regresses, <-drainedCh can block indefinitely and hang the test suite. Use a bounded wait to fail fast and surface the defect.
Apply this diff:
- wg.Wait()
- <-drainedCh
- close(output)
+ wg.Wait()
+ select {
+ case <-drainedCh:
+ // drained successfully
+ case <-time.After(5 * time.Second):
+ t.Fatalf("retry loop did not drain within timeout")
+ }
+ close(output)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| wg.Wait() | |
| <-drainedCh | |
| close(output) | |
| drainWG.Wait() | |
| wg.Wait() | |
| select { | |
| case <-drainedCh: | |
| // drained successfully | |
| case <-time.After(5 * time.Second): | |
| t.Fatalf("retry loop did not drain within timeout") | |
| } | |
| close(output) | |
| drainWG.Wait() |
|
Nice one! I was thinking about something similar with automatic detection of rate limit from RL headers combined with https://github.com/projectdiscovery/ratelimit/tree/feat-per-key-limit to make the tool compliant with them without slowness. |
|
Thanks for the feedback! In my testing I was indeed sending many different paths to the same target, which resulted in 429 responses. However, in real-world environments with CDNs and WAFs, 429s can also occur sporadically across multiple targets, even with only a few requests per host. The if you feel additional tests or adjustments to the implementation would make this more useful or easier to maintain, I’d be happy to revise the PR accordingly! |
Summary
This PR introduces a round-robin retry mechanism for HTTP 429 (Too Many Requests) responses.
Users can control the number of retries and the delay between them using two new flags:
--retry-rounds
--retry-delay
Changes
Options
Added RetryRounds (int) → number of retries for 429 responses
Added RetryDelay (int) → delay between retries (in ms)
Added validation to ensure retries and delay values are valid
Runner
Integrated a new retryLoop with drainedCh synchronization
Introduced retryJob struct for retry queue management
On 429 responses, jobs are pushed into the retryCh for delayed execution
Used atomic.Int64 to track remaining retries and close drainedCh when done
Tests
Added TestRunner_Process_And_RetryLoop
srv1: Always returns 429 → retries exhausted, expected 3×429, 0×200
srv2: Returns 2×429 then 200 → expected 2×429, 1×200
Verified drain and retry logic
Example
httpx -l urls.txt --retry-rounds 2 --retry-delay 500
A request returning 429 will be retried up to 2 times
Each retry happens after 500ms delay
If retries are exhausted, the request is considered failed
Related Issue
#1078
Summary by CodeRabbit
New Features
Tests