3-Layer Fix across the entire affected surface #1058
Conversation
1. Robust background refresh to keep the Redis cache permanently warm. 2. Logic optimization to replace the sequential O(N) provider check with a single O(1) full-catalog fetch. 3. Non-blocking I/O safety net to guarantee that even if all else fails and we must hit the DB, the event loop is never blocked.
Caution: Review failed. The pull request is closed.

📝 Walkthrough

The changes introduce a model catalog background refresh system with periodic DB fetches and in-memory caching to optimize provider detection across routes. Additionally, Prometheus remote write and Tempo metrics generator configurations are added for observability, while route optimizations replace per-provider cache lookups with shared in-memory catalog checks.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Startup
    participant BackgroundTask as Background Task Service
    participant Cache as In-Memory Catalog
    participant DB as Database
    participant Route as Route Handler
    Startup->>BackgroundTask: start_model_catalog_refresh_task()
    BackgroundTask->>BackgroundTask: Initialize asyncio.Event & Task
    loop Every 14 minutes (or until stop signal)
        BackgroundTask->>DB: Fetch full model catalog
        DB-->>BackgroundTask: Return models
        BackgroundTask->>BackgroundTask: Transform & cache (TTL overlap)
        BackgroundTask->>Cache: Update in-memory catalog
        Cache-->>BackgroundTask: Cached
        BackgroundTask->>BackgroundTask: Wait with cancellation check
    end
    Route->>Cache: transformed_id in all_model_ids?
    Cache-->>Route: Quick membership test result
    Route->>Route: Determine provider
    Startup->>BackgroundTask: stop_model_catalog_refresh_task() [on shutdown]
    BackgroundTask->>BackgroundTask: Signal stop event & cancel task
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
Review comment (src/routes/chat.py, line 2338): Move `asyncio` import to top of file

```python
# Try each provider with transformation
# OPTIMIZATION: Fetch full catalog once instead of making N calls for disjoint providers.
# This prevents 499 errors caused by sequential DB fetches when cache is cold.
# CRITICAL FIX: Run in thread to avoid blocking event loop during DB fetch
```

Import `asyncio` at the module level instead of inline.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
Review comment (src/routes/chat.py, line 3726): Move `asyncio` import to top of file (duplicate import)

```python
# Try each provider with transformation
# OPTIMIZATION: Fetch full catalog once instead of making N calls for disjoint providers.
# This prevents 499 errors caused by sequential DB fetches when cache is cold.
# CRITICAL FIX: Run in thread to avoid blocking event loop during DB fetch
```

This is a duplicate inline import - consolidate at module level.
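Both chat.py comments ask for the same mechanical fix: hoist the inline `import asyncio` into the module header. A minimal before/after sketch, with illustrative handler and helper names (not the project's real functions):

```python
# Before (flagged): each handler re-imported asyncio inline.
# After: one module-level import shared by every handler.
import asyncio

def get_cached_models(scope: str) -> list[str]:
    # Hypothetical stand-in for the project's blocking cache lookup.
    return ["gpt-4o", "claude-3-5-sonnet"]

async def handler() -> list[str]:
    # No inline `import asyncio` needed here anymore.
    return await asyncio.to_thread(get_cached_models, "all")

print(asyncio.run(handler()))  # prints ['gpt-4o', 'claude-3-5-sonnet']
```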
Review comment (src/routes/messages.py, line 492): Move `asyncio` import to top of file

```python
# Try each provider with transformation
# OPTIMIZATION: Fetch full catalog once instead of making N calls for disjoint providers.
# CRITICAL FIX: Run in thread to avoid blocking event loop during DB fetch
```

Import `asyncio` at the module level.
Review comment (src/routes/api_models.py, lines 146-147): Move `asyncio` import to top of file

```python
# Get models from the detected gateway to derive providers
import asyncio
gateway_models = await asyncio.to_thread(get_cached_models, detected_gateway)
```

Suggested change (import `asyncio` at the module level and remove the inline import):

```suggestion
gateway_models = await asyncio.to_thread(get_cached_models, detected_gateway)
```
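The pattern the reviewers endorse at all four sites, wrapping a blocking lookup in `asyncio.to_thread()`, can be demonstrated with a small self-contained script. `get_cached_models` here is a simulated stand-in (a `time.sleep` in place of the real cache/DB call); the heartbeat task shows the event loop stays responsive during the blocking fetch:

```python
import asyncio
import time

def get_cached_models(gateway: str) -> list[str]:
    # Hypothetical blocking cache/DB lookup, simulated with time.sleep.
    time.sleep(0.1)
    return [f"{gateway}/model-a", f"{gateway}/model-b"]

async def main() -> tuple[list[str], int]:
    ticks = 0

    async def heartbeat() -> None:
        nonlocal ticks
        while True:
            await asyncio.sleep(0.01)
            ticks += 1

    hb = asyncio.create_task(heartbeat())
    # The blocking call runs in a worker thread, so the heartbeat keeps beating;
    # a bare get_cached_models(...) call here would freeze the loop for 100ms.
    models = await asyncio.to_thread(get_cached_models, "openrouter")
    hb.cancel()
    return models, ticks

models, ticks = asyncio.run(main())
print(models, ticks > 0)
```

A nonzero tick count confirms other coroutines made progress while the "DB fetch" ran, which is exactly the 499-timeout scenario this PR is defending against.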
Applied a 3-layer fix across the entire affected surface area: 1. Robust background refresh to keep the Redis cache permanently warm. 2. Logic optimization to replace the sequential O(N) provider check with a single O(1) full-catalog fetch. 3. Non-blocking I/O safety net to guarantee that even if all else fails and we must hit the DB, the event loop is never blocked.
Summary by CodeRabbit
New Features
Performance
Infrastructure
Greptile Overview
Greptile Summary
This PR implements a comprehensive 3-layer performance optimization to prevent 499 timeout errors when the Redis cache expires during model catalog queries.
Changes
Layer 1 - Robust Background Refresh: Added a background task (`src/services/background_tasks.py`) that refreshes the full model catalog every 14 minutes (against a 15-minute cache TTL), ensuring the Redis cache stays permanently warm and preventing cache expiration during user requests.

Layer 2 - Logic Optimization: Replaced sequential O(N) provider lookups with a single O(1) full-catalog fetch in `src/routes/chat.py` and `src/routes/messages.py`. Instead of checking each provider individually (`get_cached_models(provider)` for each provider), the code now fetches the entire catalog once and uses an in-memory set for instant lookups.

Layer 3 - Non-Blocking I/O: Wrapped all `get_cached_models()` calls with `asyncio.to_thread()` to prevent blocking the event loop during database fetches, ensuring the API remains responsive even during cold-cache scenarios.

Observability: Added Prometheus remote write and Tempo metrics generator configuration for enhanced monitoring.
Issues Found
- `asyncio` imports should be moved to module level instead of inline (4 occurrences)

Confidence Score: 4/5

The background task correctly offloads DB work via `run_in_executor`, and all blocking I/O is properly wrapped with `asyncio.to_thread()`. The only concerns are minor style issues with inline imports and potential monitoring needed for the background task's resource usage in production. Monitor `src/services/background_tasks.py` in production to ensure the 14-minute refresh cycle doesn't cause unexpected load during peak traffic.

Important Files Changed
Sequence Diagram
```mermaid
sequenceDiagram
    participant User
    participant API as FastAPI Endpoint
    participant Cache as Redis Cache
    participant BG as Background Task
    participant DB as Database
    Note over BG,DB: Layer 1: Background Refresh (14min interval)
    BG->>DB: get_all_models_for_catalog() [run_in_executor]
    DB-->>BG: Return all models
    BG->>BG: transform_db_models_batch()
    BG->>Cache: cache_full_catalog(models, ttl=900s)
    Cache-->>BG: Cache updated
    Note over User,DB: Layer 2 & 3: Optimized Request Handling
    User->>API: POST /v1/chat/completions (model="unknown-model")
    Note over API: Layer 3: Non-Blocking I/O
    API->>Cache: asyncio.to_thread(get_cached_models, "all")
    alt Cache Hit (warm from background task)
        Cache-->>API: Return full catalog (11k+ models)
        Note over API: Layer 2: O(1) lookup
        API->>API: Build in-memory set {model_ids}
        API->>API: Check transformed in all_model_ids
        API-->>User: Route to correct provider
    else Cache Miss (cold start)
        Cache-->>API: None
        API->>DB: get_all_models_for_catalog() [asyncio.to_thread]
        DB-->>API: Return all models
        API->>API: transform_db_models_batch()
        API->>Cache: Update cache
        API->>API: Build in-memory set {model_ids}
        API->>API: Check transformed in all_model_ids
        API-->>User: Route to correct provider
    end
    Note over API: Before: O(N) sequential checks<br/>After: O(1) single fetch + set lookup
```
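The 14-minute refresh against the 900s TTL leaves a deliberate safety window, which is why the cache can stay "permanently warm". A tiny sketch of the arithmetic, with constants taken from the diagram:

```python
# TTL overlap check: the refresh interval must be shorter than the cache TTL
# so every entry is rewritten before it can expire.
CACHE_TTL_S = 15 * 60         # 900s, per cache_full_catalog(models, ttl=900s)
REFRESH_INTERVAL_S = 14 * 60  # 840s background refresh cycle

safety_window = CACHE_TTL_S - REFRESH_INTERVAL_S
assert safety_window > 0  # cache is always rewritten before expiry
print(f"safety window: {safety_window}s")  # → safety window: 60s
```

If a single refresh ever takes longer than 60 seconds (slow DB, peak traffic), the window closes and entries can expire mid-request, which is what the Layer 3 cold-start fallback in the diagram covers.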