
3 Layer Fix across the entire affected surface#1058

Merged
Jay-tech456 merged 1 commit into main from investigate/investigate-499
Feb 5, 2026

Conversation


@Jay-tech456 Jay-tech456 commented Feb 5, 2026

Applied a 3-layer fix across the entire affected surface area: 1. Robust background refresh to keep the Redis cache permanently warm. 2. Logic optimization replacing the sequential O(N) provider check with a single full-catalog fetch. 3. Non-blocking I/O safety net so that even if all else fails and we must hit the DB, the event loop stays responsive.

Summary by CodeRabbit

  • New Features

    • Added automatic periodic refresh of the model catalog to keep model information current
  • Performance

    • Optimized model detection to use a shared in-memory catalog instead of multiple database lookups
    • Converted model retrieval operations to asynchronous execution for improved responsiveness and non-blocking behavior
  • Infrastructure

    • Enhanced metrics collection with remote write capabilities
    • Configured metrics generation for distributed tracing system

Greptile Overview

Greptile Summary

This PR implements a comprehensive 3-layer performance optimization to prevent 499 timeout errors when the Redis cache expires during model catalog queries.

Changes

Layer 1 - Robust Background Refresh: Added a background task (src/services/background_tasks.py) that refreshes the full model catalog every 14 minutes (with 15min cache TTL), ensuring Redis cache stays permanently warm and preventing cache expiration during user requests.
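As a rough sketch of that loop (function names like fetch_full_catalog and cache_full_catalog are illustrative stand-ins, not the actual helpers in src/services/background_tasks.py):

```python
import asyncio

REFRESH_INTERVAL_S = 14 * 60  # refresh every 14 minutes
CACHE_TTL_S = 15 * 60         # 15-minute TTL, so entries never expire between refreshes

async def model_catalog_refresh_loop(fetch_full_catalog, cache_full_catalog, stop_event):
    """Keep the model catalog cache warm by refreshing before the TTL lapses."""
    while not stop_event.is_set():
        # The DB fetch is synchronous, so run it off the event loop.
        models = await asyncio.to_thread(fetch_full_catalog)
        cache_full_catalog(models, ttl=CACHE_TTL_S)
        try:
            # Sleep until the next refresh, but wake immediately on shutdown.
            await asyncio.wait_for(stop_event.wait(), timeout=REFRESH_INTERVAL_S)
        except asyncio.TimeoutError:
            pass
```

Because the refresh interval is one minute shorter than the TTL, each refresh lands before the previous cache entry can expire.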

Layer 2 - Logic Optimization: Replaced N sequential per-provider lookups with a single full-catalog fetch in src/routes/chat.py and src/routes/messages.py. Instead of checking each provider individually (get_cached_models(provider) for each provider), the code now fetches the entire catalog once and uses an in-memory set for O(1) membership checks.
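The before/after shape of that change, sketched with stand-in helpers (the real code in src/routes/chat.py differs in detail):

```python
def detect_provider_old(model_id, providers, get_cached_models):
    # Before: one cache/DB round trip per provider -- O(N) lookups.
    for provider in providers:
        if model_id in get_cached_models(provider):
            return provider
    return None

def detect_provider_new(model_id, get_cached_models):
    # After: one full-catalog fetch, then O(1) in-memory membership lookup.
    catalog = get_cached_models("all")                  # single round trip
    by_id = {m["id"]: m["provider"] for m in catalog}   # in-memory index
    return by_id.get(model_id)
```

The fetch itself is still one round trip; the win is that it happens once per request instead of once per provider, and every subsequent check is a constant-time lookup.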

Layer 3 - Non-Blocking I/O: Wrapped all get_cached_models() calls with asyncio.to_thread() to prevent blocking the event loop during database fetches, ensuring the API remains responsive even during cold cache scenarios.
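The wrapping itself is a one-line change per call site; here get_cached_models is a stand-in that simulates a blocking fetch:

```python
import asyncio
import time

def get_cached_models(key):
    # Stand-in for a blocking Redis/DB fetch.
    time.sleep(0.05)
    return [{"id": "model-a"}]

async def handler():
    # Before: calling get_cached_models("all") directly would stall the event
    # loop for the whole fetch. With asyncio.to_thread the fetch runs in a
    # worker thread, and the loop stays free to serve other requests.
    return await asyncio.to_thread(get_cached_models, "all")
```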

Observability: Added Prometheus remote write and Tempo metrics generator configuration for enhanced monitoring.
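Illustrative config fragments only; the endpoint URL and exact keys are assumptions and should be checked against the actual prometheus.yml and tempo.yml in this PR:

```yaml
# prometheus.yml -- forward scraped metrics to a remote store (e.g. Mimir)
remote_write:
  - url: http://mimir:9009/api/v1/push

# tempo.yml -- enable the span-metrics and service-graphs processors
overrides:
  defaults:
    metrics_generator:
      processors: [span-metrics, service-graphs]
```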

Issues Found

  • Minor style issue: asyncio imports should be moved to module level instead of inline (4 occurrences)

Confidence Score: 4/5

  • Safe to merge with comprehensive performance improvements
  • The implementation is well-architected with proper separation of concerns across the 3 layers. Background task uses correct async patterns with run_in_executor, and all blocking I/O is properly wrapped with asyncio.to_thread(). The only concerns are minor style issues with inline imports and potential monitoring needed for the background task's resource usage in production.
  • Monitor src/services/background_tasks.py in production to ensure the 14-minute refresh cycle doesn't cause unexpected load during peak traffic

Important Files Changed

  • src/services/background_tasks.py: Added a background refresh loop that keeps the model catalog cache warm on a 14-minute cycle, preventing cache expiration during user requests
  • src/services/startup.py: Integrated the model catalog refresh task into the application startup/shutdown lifecycle
  • src/routes/chat.py: Optimized provider detection from O(N) to O(1) using a full catalog fetch with non-blocking I/O
  • src/routes/messages.py: Applied the same O(N) to O(1) optimization to provider detection in the Anthropic messages endpoint
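The startup/shutdown wiring described for src/services/startup.py could be sketched like this (names mirror the PR summary but are not verified against the file):

```python
import asyncio

REFRESH_INTERVAL_S = 14 * 60

_refresh_task = None
_stop_event = None

async def start_model_catalog_refresh_task(refresh_once):
    """Spawn the periodic refresh loop at application startup."""
    global _refresh_task, _stop_event
    _stop_event = asyncio.Event()

    async def _loop():
        while not _stop_event.is_set():
            await refresh_once()
            try:
                await asyncio.wait_for(_stop_event.wait(), timeout=REFRESH_INTERVAL_S)
            except asyncio.TimeoutError:
                pass

    _refresh_task = asyncio.create_task(_loop())

async def stop_model_catalog_refresh_task():
    """Signal the stop event and cancel the task on graceful shutdown."""
    if _stop_event is not None:
        _stop_event.set()
    if _refresh_task is not None:
        _refresh_task.cancel()
        try:
            await _refresh_task
        except asyncio.CancelledError:
            pass
```

Signalling the event before cancelling lets a mid-refresh iteration finish cleanly rather than being interrupted.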

Sequence Diagram

sequenceDiagram
    participant User
    participant API as FastAPI Endpoint
    participant Cache as Redis Cache
    participant BG as Background Task
    participant DB as Database

    Note over BG,DB: Layer 1: Background Refresh (14min interval)
    BG->>DB: get_all_models_for_catalog() [run_in_executor]
    DB-->>BG: Return all models
    BG->>BG: transform_db_models_batch()
    BG->>Cache: cache_full_catalog(models, ttl=900s)
    Cache-->>BG: Cache updated

    Note over User,DB: Layer 2 & 3: Optimized Request Handling
    User->>API: POST /v1/chat/completions (model="unknown-model")
    
    Note over API: Layer 3: Non-Blocking I/O
    API->>Cache: asyncio.to_thread(get_cached_models, "all")
    
    alt Cache Hit (warm from background task)
        Cache-->>API: Return full catalog (11k+ models)
        Note over API: Layer 2: O(1) lookup
        API->>API: Build in-memory set {model_ids}
        API->>API: Check transformed in all_model_ids
        API-->>User: Route to correct provider
    else Cache Miss (cold start)
        Cache-->>API: None
        API->>DB: get_all_models_for_catalog() [asyncio.to_thread]
        DB-->>API: Return all models
        API->>API: transform_db_models_batch()
        API->>Cache: Update cache
        API->>API: Build in-memory set {model_ids}
        API->>API: Check transformed in all_model_ids
        API-->>User: Route to correct provider
    end

    Note over API: Before: O(N) sequential checks<br/>After: O(1) single fetch + set lookup

…Robust background refresh to keep the Redis cache permanently warm. 2. Logic optimization to replace the sequential O(N) provider check with a single full-catalog fetch. 3. Non-blocking I/O safety net to guarantee responsiveness even if all else fails and we must hit the DB.

supabase bot commented Feb 5, 2026

This pull request has been ignored for the connected project ynleroehyrmaafkgjgmr because there are no changes detected in supabase/supabase directory. You can change this behaviour in Project Integrations Settings ↗︎.



@Jay-tech456 Jay-tech456 merged commit 6a69a52 into main Feb 5, 2026
6 of 7 checks passed

coderabbitai bot commented Feb 5, 2026

Caution: Review failed. The pull request is closed.

📝 Walkthrough

The changes introduce a model catalog background refresh system with periodic DB fetches and in-memory caching to optimize provider detection across routes. Additionally, Prometheus remote write and Tempo metrics generator configurations are added for observability, while route optimizations replace per-provider cache lookups with shared in-memory catalog checks.

Changes

Changes by cohort:

  • Observability Infrastructure (prometheus.yml, tempo.yml): Added a Prometheus remote write endpoint for Mimir metric collection and a Tempo metrics generator configuration with span-metrics and service-graphs processors.
  • Background Catalog Refresh System (src/services/background_tasks.py, src/services/startup.py): Introduced an asynchronous background task that fetches and caches the full model catalog every 14 minutes with a TTL overlap strategy, integrated into startup initialization and graceful shutdown.
  • Route Provider Detection Optimization (src/routes/api_models.py, src/routes/chat.py, src/routes/messages.py): Converted synchronous DB lookups to asynchronous thread-based calls and replaced per-provider cache checks with single in-memory catalog membership tests, improving performance and reducing event loop blocking.

Sequence Diagram(s)

sequenceDiagram
    participant Startup
    participant BackgroundTask as Background Task Service
    participant Cache as In-Memory Catalog
    participant DB as Database
    participant Route as Route Handler

    Startup->>BackgroundTask: start_model_catalog_refresh_task()
    BackgroundTask->>BackgroundTask: Initialize asyncio.Event & Task
    
    loop Every 14 minutes (or until stop signal)
        BackgroundTask->>DB: Fetch full model catalog
        DB-->>BackgroundTask: Return models
        BackgroundTask->>BackgroundTask: Transform & cache (TTL overlap)
        BackgroundTask->>Cache: Update in-memory catalog
        Cache-->>BackgroundTask: Cached
        BackgroundTask->>BackgroundTask: Wait with cancellation check
    end
    
    Route->>Cache: transformed_id in all_model_ids?
    Cache-->>Route: Quick membership test result
    Route->>Route: Determine provider
    
    Startup->>BackgroundTask: stop_model_catalog_refresh_task() [on shutdown]
    BackgroundTask->>BackgroundTask: Signal stop event & cancel task

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 Whiskers twitching with delight,
Background loops refresh through the night,
Models cached, no DB delay,
In-memory lookups save the day!
Metrics flow to Mimir's keep,
While background tasks their vigil keep. 🌙


@greptile-apps greptile-apps bot left a comment

4 files reviewed, 4 comments

# Try each provider with transformation
# OPTIMIZATION: Fetch full catalog once instead of making N calls for disjoint providers.
# This prevents 499 errors caused by sequential DB fetches when cache is cold.
# CRITICAL FIX: Run in thread to avoid blocking event loop during DB fetch
greptile-apps bot (src/routes/chat.py, line 2338):

Move `asyncio` import to top of file: import `asyncio` at the module level instead of inline.

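Concretely, the fix being requested is just hoisting the import; the handler below is illustrative, not the actual code at chat.py line 2338:

```python
# Module level (top of src/routes/chat.py):
import asyncio

# In the handler, the inline `import asyncio` line is simply deleted;
# the call itself is unchanged:
async def example_handler(get_cached_models):
    return await asyncio.to_thread(get_cached_models, "all")
```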

# Try each provider with transformation
# OPTIMIZATION: Fetch full catalog once instead of making N calls for disjoint providers.
# This prevents 499 errors caused by sequential DB fetches when cache is cold.
# CRITICAL FIX: Run in thread to avoid blocking event loop during DB fetch
greptile-apps bot (src/routes/chat.py, line 3726):

Move `asyncio` import to top of file (duplicate import): this is a duplicate inline import; consolidate at module level.



# Try each provider with transformation
# OPTIMIZATION: Fetch full catalog once instead of making N calls for disjoint providers.
# CRITICAL FIX: Run in thread to avoid blocking event loop during DB fetch
greptile-apps bot (src/routes/messages.py, line 492):

Move `asyncio` import to top of file: import `asyncio` at the module level.


Comment on lines +146 to +147
# Get models from the detected gateway to derive providers
import asyncio
greptile-apps bot (src/routes/api_models.py, lines 146-147):

Move `asyncio` import to top of file: import `asyncio` at the module level and remove the inline import.

Suggested change:

    gateway_models = await asyncio.to_thread(get_cached_models, detected_gateway)

