fix: Increase K8s client rate limits for parallel SSAR checks #433
Conversation
- Increase QPS from 5 to 100 and Burst from 10 to 200
- This fixes client-side throttling during parallel SSAR checks
- Performance improvement: 8.2s -> 50ms for 51 namespaces (~164x faster)

The default Kubernetes client rate limits (QPS=5, Burst=10) cause significant throttling when performing parallel SSAR checks for project listing. This was identified via timing logs showing 'client-side throttling' delays of ~2s per batch.
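For illustration, this is roughly where the change lands. The helper name below is made up for the sketch and the actual initialization code in the repo may be structured differently; the fix itself only touches the QPS and Burst fields on client-go's rest.Config:

```go
package kube

import (
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

// newRateLimitedClientset is a hypothetical helper, not the PR's actual function.
func newRateLimitedClientset(cfg *rest.Config) (*kubernetes.Clientset, error) {
	// client-go defaults are QPS=5, Burst=10, which throttle parallel SSAR checks.
	cfg.QPS = 100
	cfg.Burst = 200
	return kubernetes.NewForConfig(cfg)
}
```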
Claude Code Review

Summary

This PR increases Kubernetes client rate limits (QPS: 5 → 100, Burst: 10 → 200) to fix performance bottlenecks discovered after PR #428 added parallel SSAR checks. The change addresses client-side throttling that was causing 2-second delays during project listing operations.

Performance Impact: Testing shows ~164x improvement (8.2s → 50ms for 51 namespaces).

Issues by Severity

🟡 Major Issues

1. User-scoped clients don't inherit rate limit settings

Location: The rate limits are only set on the base config; user-scoped clients are built from a copy of it:

```go
cfg := *BaseKubeConfig // Shallow copy
cfg.BearerToken = token
// QPS and Burst ARE copied here, so this is actually fine
```

Status: ✅ Actually OK - The shallow copy of the base config carries QPS and Burst over to user-scoped clients.
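A minimal sketch of that inheritance point, assuming a BaseKubeConfig-style shared config as in the snippet above (the function name is illustrative, not the repository's actual code):

```go
package kube

import (
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

// BaseKubeConfig stands in for the shared base config where QPS/Burst are set once.
var BaseKubeConfig *rest.Config

// newUserClientset mirrors the reviewed pattern: copy the base config by value,
// swap in the user's bearer token, and build a clientset. QPS and Burst are plain
// value fields on rest.Config, so the copy carries them along.
func newUserClientset(token string) (*kubernetes.Clientset, error) {
	cfg := *BaseKubeConfig // shallow copy; QPS=100 and Burst=200 come with it
	cfg.BearerToken = token
	return kubernetes.NewForConfig(&cfg)
}
```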
2. No rate limit configuration for operator clients

Location: The operator likely uses similar client initialization patterns. If it performs parallel operations (e.g., watching multiple CRs, spawning Jobs), it may also benefit from increased rate limits.

Recommendation: Review the operator's client initialization in a follow-up.

3. Rate limits are hardcoded rather than configurable

Location:

```go
config.QPS = 100
config.Burst = 200
```

Issue: There is no way to tune these values without code changes. For different deployment sizes (10 vs 1000 namespaces), optimal values may differ.

Recommendation: Make rate limits configurable via environment variables:

```go
qps := getEnvFloat32("K8S_CLIENT_QPS", 100.0)
burst := getEnvInt("K8S_CLIENT_BURST", 200)
config.QPS = qps
config.Burst = int(burst)
```

This allows operators to tune performance without redeployment.
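Note that getEnvFloat32 and getEnvInt do not exist in the codebase today; a possible sketch of such helpers, assuming plain os.Getenv parsing with defaults:

```go
package kube

import (
	"os"
	"strconv"
)

// getEnvFloat32 returns the env var parsed as float32, or def if unset/invalid.
func getEnvFloat32(key string, def float32) float32 {
	if v := os.Getenv(key); v != "" {
		if f, err := strconv.ParseFloat(v, 32); err == nil {
			return float32(f)
		}
	}
	return def
}

// getEnvInt returns the env var parsed as int, or def if unset/invalid.
func getEnvInt(key string, def int) int {
	if v := os.Getenv(key); v != "" {
		if n, err := strconv.Atoi(v); err == nil {
			return n
		}
	}
	return def
}
```

With helpers like these, setting K8S_CLIENT_QPS=200 and K8S_CLIENT_BURST=400 would raise the limits without a rebuild.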
🔵 Minor Issues

4. Comment could be more specific about the issue

Location: Current comment:

```go
// Increase rate limits for parallel operations (default is QPS=5, Burst=10)
// This is needed for parallel SSAR checks when listing projects
```

Suggestion: Add context about the problem being solved:

```go
// Increase rate limits for parallel operations (default is QPS=5, Burst=10).
// This is needed for parallel SSAR checks when listing projects (PR #428),
// which spawns 10 concurrent goroutines. Default limits caused 2s throttling
// delays. Testing shows 164x performance improvement with these values.
```

5. No unit tests for rate limit configuration

Issue: No automated validation that rate limits are actually set correctly.

Recommendation: Add a test:

```go
func TestInitK8sClientsRateLimits(t *testing.T) {
	// Mock K8s config and verify QPS/Burst are set
	// Verify user-scoped clients inherit settings
}
```
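A rough sketch of what such a test could assert, working on rest.Config values directly (the initialization step is stubbed inline here rather than calling the repository's real function):

```go
package kube

import (
	"testing"

	"k8s.io/client-go/rest"
)

func TestRateLimitsAppliedAndInherited(t *testing.T) {
	base := &rest.Config{Host: "https://example.invalid"}
	// Hypothetical stand-in for the backend's initialization step.
	base.QPS = 100
	base.Burst = 200

	// User-scoped clients are built from a shallow copy of the base config.
	userCfg := *base
	userCfg.BearerToken = "user-token"

	if userCfg.QPS != 100 || userCfg.Burst != 200 {
		t.Fatalf("user config did not inherit rate limits: QPS=%v Burst=%v", userCfg.QPS, userCfg.Burst)
	}
}
```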
Positive Highlights

✅ Excellent problem identification - Clear documentation of the issue with timing logs.

Architecture Alignment

✅ Follows backend patterns - Changes made in the correct location.

Performance Considerations

The current implementation is sound for the described use case. Potential future concerns (such as tuning for much larger deployments) are covered by the prioritized recommendations below.
Testing Validation

✅ Tested on real cluster - Dev cluster with 51 namespaces.
Recommendations (Prioritized)

High Priority (Address before merge): None - code is safe to merge as-is.

Medium Priority (Consider for follow-up PR) and Low Priority (Future improvements): the non-blocking items noted above - configurable rate limits, unit test coverage, a more detailed code comment, and a review of the operator's client initialization.
Security Review

✅ No security concerns - Rate limits are a performance optimization.

Conclusion

Recommendation: ✅ APPROVE

This is a well-targeted fix for a real performance issue. The change is minimal, well-documented, and the performance improvement is substantial. While there are suggestions for making rate limits configurable, the current hardcoded values are reasonable and don't block merging. The code follows established patterns in the codebase and doesn't introduce any security or correctness issues. User-scoped clients inherit the rate limit settings via config copy, so all client types benefit from the fix.

Review performed by Claude Code using repository standards from CLAUDE.md, backend-development.md, security-standards.md, and k8s-client-usage.md patterns.

What Amber Loaded for Code Review

Amber automatically loaded repository standards from the memory system. Impact: This review used your repository's specific code quality standards, security patterns, and best practices from the memory system (PRs #359, #360) - not just generic code review guidelines.
Summary
This PR fixes a critical performance issue discovered after merging #428 (pagination and parallel SSAR checks).
Problem
The default Kubernetes client rate limits (QPS=5, Burst=10) were causing significant throttling during parallel SSAR checks: timing logs showed 'client-side throttling' delays of ~2s per batch, adding up to ~8.2s for 51 namespaces.
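For context, the project-listing path from #428 issues one SelfSubjectAccessReview per namespace from a pool of goroutines. The sketch below illustrates that pattern; the function names, the verb/resource being checked, and the worker count are assumptions, not the repository's exact code. It shows why dozens of near-simultaneous API calls blow past a QPS=5/Burst=10 budget:

```go
package kube

import (
	"context"
	"sync"

	authorizationv1 "k8s.io/api/authorization/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// canGetNamespace is an illustrative check; the real code may test a different
// verb/resource. Each call is one API request against the rate limiter.
func canGetNamespace(ctx context.Context, cs kubernetes.Interface, ns string) (bool, error) {
	ssar := &authorizationv1.SelfSubjectAccessReview{
		Spec: authorizationv1.SelfSubjectAccessReviewSpec{
			ResourceAttributes: &authorizationv1.ResourceAttributes{
				Namespace: ns,
				Verb:      "get",
				Resource:  "namespaces",
			},
		},
	}
	resp, err := cs.AuthorizationV1().SelfSubjectAccessReviews().Create(ctx, ssar, metav1.CreateOptions{})
	if err != nil {
		return false, err
	}
	return resp.Status.Allowed, nil
}

// allowedNamespaces fans the checks out across a fixed number of workers.
func allowedNamespaces(ctx context.Context, cs kubernetes.Interface, namespaces []string) []string {
	const workers = 10
	jobs := make(chan string)
	var mu sync.Mutex
	var allowed []string
	var wg sync.WaitGroup

	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for ns := range jobs {
				if ok, err := canGetNamespace(ctx, cs, ns); err == nil && ok {
					mu.Lock()
					allowed = append(allowed, ns)
					mu.Unlock()
				}
			}
		}()
	}
	for _, ns := range namespaces {
		jobs <- ns
	}
	close(jobs)
	wg.Wait()
	return allowed
}
```

With 51 namespaces and the default limits, the client-side rate limiter queues most of these requests, which matches the ~2s-per-batch throttling seen in the logs.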
Solution
Increase rate limits:
- QPS: 5 → 100
- Burst: 10 → 200
Performance Impact
Tested with 51 namespaces: project listing dropped from ~8.2s to ~50ms (~164x faster).
Testing
Verified on dev cluster with 51 managed namespaces. Timing logs confirmed SSAR checks completed in 30ms with the increased rate limits.