Skip to content

Conversation

@Gkrumbach07
Copy link
Collaborator

Summary

This PR addresses slow backend response times for the project list and session list pages by implementing:

Backend Changes

  • Parallel SSAR checks: Project listing now uses a worker pool (10 workers) to perform SubjectAccessReview checks in parallel, significantly reducing load times
  • Server-side search: Both projects and sessions endpoints now support ?search= query parameter for filtering
  • Offset-based pagination: Both endpoints support ?limit= and ?offset= parameters with paginated responses

Frontend Changes

  • Search input: Added debounced search input (300ms) on both projects and sessions pages
  • Pagination controls: Previous/Next buttons with page info, shown only when needed (totalPages > 1)
  • API route fixes: Fixed Next.js API routes to properly forward query parameters to backend

API Response Format

Endpoints now return:

{
  "items": [...],
  "totalCount": 100,
  "limit": 20,
  "offset": 0,
  "hasMore": true,
  "nextOffset": 20
}

Performance Impact

  • Project listing with many namespaces is now significantly faster due to parallel SSAR checks
  • Reduced data transfer with pagination (only fetch what's needed)
  • Search filters applied server-side before pagination

Testing

- Backend: Implement parallel SSAR checks for projects listing
- Backend: Add pagination and search support to ListProjects and ListSessions
- Frontend: Add pagination controls and search input to projects page
- Frontend: Add pagination controls and search input to sessions section
- Add PaginationParams and PaginatedResponse types
The Next.js API routes for projects and sessions were not forwarding
query parameters to the backend, breaking pagination and search.

- Forward query string in /api/projects route
- Forward query string in /api/projects/[name]/agentic-sessions route
- Backend: Implement parallel SSAR checks using worker pool for faster project listing
- Backend: Add server-side search filtering for projects and sessions
- Backend: Add offset-based pagination for both endpoints
- Frontend: Add search input with debouncing for projects and sessions pages
- Frontend: Add pagination controls (Previous/Next) with page info
- Frontend: Fix Next.js API routes to forward query parameters to backend
- Add shared pagination types in backend (PaginationParams, PaginatedResponse)
- Add corresponding frontend types and React Query hooks

Performance improvement: SSAR checks now run in parallel with configurable
worker pool (default 10 workers), significantly reducing project list load times.
@github-actions

This comment has been minimized.

Copy link
Collaborator

@sallyom sallyom left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM but first fix GoLint

- Fix context cancellation in SSAR worker pool: workers now report cancellation
  so caller can detect timeouts and log warnings for partial results
- Replace bubble sort O(n²) with sort.Slice O(n log n) for both projects and
  sessions sorting - significant performance improvement for large lists
- Fix gofmt formatting in types/common.go
@github-actions
Copy link
Contributor

github-actions bot commented Dec 4, 2025

Claude Code Review

Summary

This PR adds pagination and search functionality to projects and sessions lists, addressing slow response times with large datasets. The implementation includes parallel SSAR checks, server-side search filtering, and offset-based pagination.

Overall Assessment: Strong implementation with good performance improvements. A few critical issues need attention before merge.


Issues by Severity

🚫 Blocker Issues

None - No blocking issues found.

🔴 Critical Issues

1. Type-Unsafe Unstructured Access (sessions.go:330)

Location: components/backend/handlers/sessions.go:330

session.Metadata = item.Object["metadata"].(map[string]interface{})

Issue: Direct type assertion without checking violates CLAUDE.md Critical Rule #4. This will panic if the type is wrong.

Required Fix (per .claude/patterns/error-handling.md):

if metadata, ok := item.Object["metadata"].(map[string]interface{}); ok {
    session.Metadata = metadata
} else {
    log.Printf("Warning: invalid metadata type for session %v", item.GetName())
    session.Metadata = make(map[string]interface{})
}

Reference: CLAUDE.md lines 362-365, backend-development.md lines 69-85


2. Goroutine Leak Risk in Worker Pool (projects.go:286-289)

Location: components/backend/handlers/projects.go:286-289

for range workChan {
    resultChan <- accessCheckResult{cancelled: true}
}

Issue: When context is cancelled, the worker drains the entire work channel, but this can block if the channel is large and no other workers are consuming. This could cause goroutine leaks.

Recommended Fix:

select {
case <-ctx.Done():
    resultChan <- accessCheckResult{
        namespace: ns,
        cancelled: true,
    }
    return  // Exit immediately, rely on wg.Wait() and close(resultChan)
default:
}

The sender will handle unconsumed work items by the channel closing.


🟡 Major Issues

3. Missing User Token for SSAR Checks

Location: components/backend/handlers/projects.go:202

Issue: The parallel SSAR checks use reqK8s (user token) which is correct per CLAUDE.md rules. However, there's no validation that reqK8s != nil before passing to performParallelSSARChecks. While checked at line 165, defensive programming suggests re-validating before expensive operations.

Recommended:

if reqK8s == nil {
    c.JSON(http.StatusUnauthorized, gin.H{"error": "Invalid or missing token"})
    return
}

Already present at line 165, but the function signature of performParallelSSARChecks should document this requirement.


4. Generic Error Messages Expose Internal Details (sessions.go:321)

Location: components/backend/handlers/sessions.go:321

c.JSON(http.StatusInternalServerError, gin.H{"error": "Failed to list agentic sessions"})

Issue: While this follows the pattern, the log message at line 320 is correct. Good adherence to error-handling.md.

No action needed - Just noting for completeness.


5. Frontend: Deprecated interface Usage

Location: Multiple frontend files (not shown in diff, but check if new types were added)

Issue: Per DESIGN_GUIDELINES.md, all type definitions must use type instead of interface.

Check Required: Verify components/frontend/src/types/api/common.ts, projects.ts, and sessions.ts use type (which they do ✅).

Status: ✅ Verified - all new types use type


🔵 Minor Issues

6. Magic Number for Worker Pool Size

Location: components/backend/handlers/projects.go:155

const parallelSSARWorkerCount = 10

Suggestion: Consider making this configurable via environment variable for tuning in production:

var parallelSSARWorkerCount = getEnvInt("SSAR_WORKER_COUNT", 10)

Priority: Low - current hardcoded value is reasonable.


7. Context Timeout Inconsistency

Location: projects.go:186 vs sessions.go:315

Both use 30-second timeouts, but projects.go has a comment explaining why. Sessions.go should have the same.

Suggestion:

ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second) // Increased for search/sort operations

8. Search String Not Sanitized for Log Injection

Location: projects.go:231, sessions.go:376

searchLower := strings.ToLower(search)

Issue: If search contains newlines, could cause log injection. Per security-standards.md lines 112-117.

Fix:

search = strings.ReplaceAll(search, "\n", "")
search = strings.ReplaceAll(search, "\r", "")
searchLower := strings.ToLower(search)

9. Frontend: Missing Loading State on Pagination Buttons

Location: components/frontend/src/app/projects/page.tsx:272-293

Issue: Pagination buttons use disabled={isFetching} which is correct, but there's no visual loading indicator.

Enhancement:

<Button
  variant="outline"
  size="sm"
  onClick={handleNextPage}
  disabled={!hasMore || isFetching}
>
  {isFetching ? <Loader2 className="h-4 w-4 mr-1 animate-spin" /> : <ChevronRight className="h-4 w-4 mr-1" />}
  Next
</Button>

Priority: Nice-to-have (current implementation is acceptable).


10. Frontend: keepPreviousData Import

Location: components/frontend/src/services/queries/use-projects.ts:5

import { useMutation, useQuery, useQueryClient, keepPreviousData } from '@tanstack/react-query';

Good Practice: Using placeholderData: keepPreviousData at line 34 prevents UI flicker during pagination. Excellent adherence to react-query-usage.md patterns. ✅


Positive Highlights

🎯 Excellent Practices

  1. Parallel SSAR Checks: Worker pool pattern (projects.go:254-315) is well-implemented with proper goroutine management and context cancellation handling.

  2. Security Compliance:

    • ✅ User token authentication for SSAR checks (projects.go:165)
    • ✅ No tokens in logs
    • ✅ Proper RBAC enforcement
  3. Performance Improvements:

    • ✅ Replaced O(n²) bubble sort with O(n log n) sort.Slice (commit d6b6c4d)
    • ✅ Pre-filtering by search before SSAR checks (projects.go:199)
    • ✅ Efficient pagination with proper bounds checking
  4. Frontend Best Practices:

    • ✅ All types use type instead of interface
    • ✅ React Query with keepPreviousData for smooth UX
    • ✅ Debounced search (300ms) to reduce API calls
    • ✅ All UI uses Shadcn components
  5. Code Organization:

    • ✅ Clear separation of concerns (filter, sort, paginate functions)
    • ✅ Reusable pagination types in backend/types/common.go
    • ✅ Consistent naming conventions
  6. Error Handling:

    • ✅ Proper context timeout handling
    • ✅ Graceful degradation on SSAR timeout (logs warning, returns partial results)
    • ✅ Generic user-facing error messages

Recommendations

Priority 1 (Required Before Merge)

  1. Fix type-unsafe metadata access in sessions.go:330 (Critical Issue Outcome: Reduce Refinement Time with agent System #1)
  2. Simplify goroutine cleanup in projects.go:286-289 (Critical Issue Epic: RAT Architecture & Design #2)
  3. Add log injection sanitization for search strings (Minor Issue Test: Updated Workflow Validation #8)

Priority 2 (Recommended)

  1. Add inline comment explaining 30s timeout in sessions.go:315
  2. Consider making worker pool size configurable via env var

Priority 3 (Nice-to-Have)

  1. Add visual loading indicators to pagination buttons
  2. Add unit tests for new pagination/search functions

Testing Verification

Per the PR description, testing was performed on:

Recommendation: Add automated tests:

// components/backend/handlers/projects_test.go
func TestFilterNamespacesBySearch(t *testing.T) { ... }
func TestPaginateProjects(t *testing.T) { ... }
func TestSortProjectsByCreationTime(t *testing.T) { ... }

Pre-Merge Checklist

Based on CLAUDE.md Pre-Commit Checklist for Backend (lines 773-810):

  • Authentication: All user-facing endpoints use GetK8sClientsForRequest(c)
  • Authorization: RBAC checks performed (SSAR checks)
  • Error Handling: Errors logged with context, appropriate HTTP status codes
  • Token Security: No tokens in logs
  • Type Safety: ⚠️ Direct type assertion at sessions.go:330 needs fixing
  • Resource Cleanup: Not applicable (no new resources created)
  • Status Updates: Not applicable
  • Tests: ⚠️ No new tests added (recommended)
  • Logging: Structured logs with context
  • Code Quality: gofmt, go vet, golangci-lint (assumed from CI)

Summary

Verdict: Approve with minor required fixes (Critical Issues #1 and #2, Minor Issue #8).

This PR demonstrates strong understanding of the codebase patterns and security requirements. The parallel SSAR implementation is particularly well-done. Address the critical type-safety issue and the PR will be ready to merge.

Estimated Effort to Fix: ~30 minutes for the 3 required changes.


🔍 View AI decision process (logs available for 90 days)

📋 View memory system files loaded (click to expand)

What Amber Loaded for Code Review

Amber automatically loaded these repository standards from the memory system:

  1. CLAUDE.md - Master project instructions, development standards
  2. backend-development.md - Go backend, K8s integration patterns
  3. frontend-development.md - NextJS, Shadcn UI, React Query patterns
  4. security-standards.md - Auth, RBAC, token handling
  5. k8s-client-usage.md - User token vs service account patterns
  6. error-handling.md - Consistent error patterns
  7. react-query-usage.md - Data fetching patterns

Impact: This review used your repository's specific code quality standards, security patterns, and best practices from the memory system (PRs #359, #360) - not just generic code review guidelines.

@Gkrumbach07 Gkrumbach07 merged commit 8567f8a into ambient-code:main Dec 4, 2025
20 checks passed
Gkrumbach07 added a commit that referenced this pull request Dec 4, 2025
## Summary

This PR fixes a critical performance issue discovered after merging #428
(pagination and parallel SSAR checks).

### Problem

The default Kubernetes client rate limits (QPS=5, Burst=10) were causing
significant throttling during parallel SSAR checks:

```
"Waited before sending request" delay="1.997446758s" reason="client-side throttling"
```

### Solution

Increase rate limits:
- QPS: 5 → 100
- Burst: 10 → 200

### Performance Impact

Tested with 51 namespaces:
- **Before**: 8.2 seconds
- **After**: 50 milliseconds
- **Improvement**: ~164x faster

### Testing

Verified on dev cluster with 51 managed namespaces. Timing logs
confirmed SSAR checks completed in 30ms with the increased rate limits.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants