
Fix/fix rate limiter #1067

Closed
Jay-tech456 wants to merge 2 commits into main from fix/fix-rate-limiter

Conversation


@Jay-tech456 Jay-tech456 commented Feb 7, 2026

This pull request hardens the gateway’s security and performance by implementing a multi-layered behavioral protection system alongside critical asynchronous optimizations. Highlights include the introduction of Global Velocity Protection to mitigate DoS and IP-rotation attacks via request fingerprinting, and the resolution of a blocking runtime error in the Redis metrics service. By offloading synchronous database and pricing operations to thread pools across the chat completion path, these changes eliminate event loop starvation and significantly reduce the occurrence of 499 timeout errors during periods of high concurrency.

Summary by CodeRabbit

Release Notes

  • New Features

    • Added advanced DoS protection with IP-based rate limiting (including datacenter detection), behavioral fingerprinting to identify bots, and adaptive velocity mode that activates during high error rates.
  • Improvements

    • Enhanced system responsiveness through asynchronous execution of blocking operations.
    • Improved connection error detection for more reliable service recovery.

Greptile Overview

Greptile Summary

  • Adds a new SecurityMiddleware that rate-limits by IP tiering + request fingerprinting and introduces a global “velocity mode” that tightens limits during elevated error rates.
  • Wires the security middleware into the FastAPI app early in the middleware chain (before timeout/observability), with a Redis-backed path and an in-memory fallback.
  • Offloads multiple synchronous operations to asyncio.to_thread across admin/catalog/chat routes, and introduces calculate_cost_async to avoid blocking cost/pricing calls.
  • Updates Redis metrics recording to use asyncio.to_thread for some Redis operations and fixes a missing asyncio import.
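The "global velocity mode" mentioned above can be sketched as a sliding-window error-rate monitor. The 10%-error-rate-over-60-seconds trigger and the 50% tightening are taken from the PR's own walkthrough; the class and method names are illustrative, not the project's actual implementation:

```python
import time
from collections import deque
from typing import Optional

class VelocityMonitor:
    """Sliding-window error-rate monitor (sketch of the described velocity mode)."""

    def __init__(self, window_s: float = 60.0, threshold: float = 0.10):
        self.window_s = window_s
        self.threshold = threshold
        self.events = deque()  # (timestamp, is_error) pairs

    def record(self, is_error: bool, now: Optional[float] = None) -> None:
        now = time.monotonic() if now is None else now
        self.events.append((now, is_error))
        # Drop events that have aged out of the window.
        cutoff = now - self.window_s
        while self.events and self.events[0][0] < cutoff:
            self.events.popleft()

    def effective_limit(self, base_limit: int, now: Optional[float] = None) -> int:
        """Halve every limit while the recent error rate exceeds the threshold."""
        now = time.monotonic() if now is None else now
        cutoff = now - self.window_s
        recent = [err for ts, err in self.events if ts >= cutoff]
        if recent and sum(recent) / len(recent) > self.threshold:
            return base_limit // 2  # velocity mode active: tighten by 50%
        return base_limit
```

In a distributed deployment the counters would live in Redis rather than in-process, which is exactly the multi-worker concern raised later in the review.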

Confidence Score: 2/5

  • This PR is not safe to merge as-is due to a startup-breaking import and correctness issues in the new rate-limiting middleware and Redis metrics threading model.
  • Score reflects one definite runtime startup failure (get_async_redis_client ImportError) plus two functional issues that will affect production behavior (spoofable XFF rate-limits and local-cache cleanup resetting buckets) and a likely concurrency/thread-safety issue executing a Redis pipeline across threads.
  • src/main.py, src/middleware/security_middleware.py, src/services/redis_metrics.py

Important Files Changed

Filename Overview
src/config/supabase_config.py Adds an extra connection-indicator string ('bad file descriptor') to classify recoverable connection errors.
src/main.py Adds SecurityMiddleware early in the stack, but imports a non-existent get_async_redis_client() causing startup ImportError.
src/middleware/security_middleware.py Introduces new behavioral rate-limiting middleware (IP tiering, fingerprinting, velocity mode) but trusts spoofable X-Forwarded-For and has a local-cache cleanup bug that can reset other buckets.
src/routes/admin.py Moves several database/user operations onto threads via asyncio.to_thread to reduce event-loop blocking in admin endpoints.
src/routes/api_models.py Offloads fetch_specific_model (and related gateway model fetch) to a thread with asyncio.to_thread to avoid blocking in /api/models/detail.
src/routes/catalog.py Offloads cached model/provider reads to threads in several catalog endpoints to reduce event-loop blocking under load.
src/routes/chat.py Switches cost calculation to async (calculate_cost_async) and offloads more synchronous operations to threads in the chat completion paths.
src/services/pricing.py Adds async-friendly pricing/cost calculation paths using asyncio.to_thread for DB/cache lookups and introduces calculate_cost_async.
src/services/redis_metrics.py Fixes missing asyncio import and moves some Redis operations to asyncio.to_thread, but executes a pre-built pipeline in a worker thread which may violate redis client thread-safety.

Sequence Diagram

sequenceDiagram
    participant Client
    participant FastAPI as FastAPI App
    participant Sec as SecurityMiddleware
    participant Timeout as RequestTimeoutMiddleware
    participant Route as Chat Route (/v1/chat/completions)
    participant Pricing as pricing.calculate_cost_async
    participant Redis as Redis (rate-limit/metrics)

    Client->>FastAPI: HTTP request
    FastAPI->>Sec: dispatch(request)
    Sec->>Sec: _get_client_ip() / _generate_fingerprint()
    Sec->>Redis: INCR sec_rl:ip:* + EXPIRE
    Sec->>Redis: INCR sec_rl:fp:* + EXPIRE
    alt over limit
        Sec-->>Client: 429 Too Many Requests
    else allowed
        Sec->>Timeout: call_next(request)
        Timeout->>Route: execute handler
        Route->>Pricing: await calculate_cost_async(model,tokens)
        Pricing-->>Route: cost
        Route-->>Timeout: response
        Timeout-->>Sec: response
        Sec->>Sec: _record_request_outcome(status)
        Sec->>Sec: _check_and_activate_velocity_mode()
        Sec-->>Client: response
    end
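The `_generate_fingerprint()` step in the diagram can be sketched as a hash over stable request headers; the walkthrough names User-Agent, Accept-Language, and Accept-Encoding as the inputs, while the hashing details here are an assumption:

```python
import hashlib

# Header set taken from the PR walkthrough; ordering is part of the fingerprint.
FINGERPRINT_HEADERS = ("user-agent", "accept-language", "accept-encoding")

def generate_fingerprint(headers: dict) -> str:
    """Derive a short, stable behavioral fingerprint from request headers."""
    raw = "|".join(headers.get(h, "") for h in FINGERPRINT_HEADERS)
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()[:16]
```

Requests from the same client software hash to the same bucket even when the source IP rotates, which is what lets the middleware catch IP-rotation attacks.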

…ath.

Key improvements include:

  • Async-ified all critical operations: cost calculation, anonymous request handling, routing, and activity logging now run asynchronously.
  • Offloaded blocking I/O: all synchronous database and Redis calls are now offloaded to thread pools using asyncio.to_thread.
  • Improved performance: implemented caching for rate-limit checks and optimized Redis metrics recording.
  • Added the missing import asyncio to fix the NameError that would occur at runtime.
  • Implemented Global Velocity Mode in security_middleware.py, adding the full implementation of the Global Velocity Protection feature.
@chatgpt-codex-connector

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.
Credits must be used to enable repository wide code reviews.

@supabase

supabase bot commented Feb 7, 2026

This pull request has been ignored for the connected project ynleroehyrmaafkgjgmr because there are no changes detected in supabase/supabase directory. You can change this behaviour in Project Integrations Settings ↗︎.



@coderabbitai

coderabbitai bot commented Feb 7, 2026

📝 Walkthrough

Walkthrough

This pull request introduces comprehensive DoS protection via a new SecurityMiddleware class with tiered IP rate limiting, behavioral fingerprinting, and velocity mode activation. It integrates the middleware into the app startup with Redis-backed distributed support and in-memory fallback. Additionally, it converts synchronous database and cache operations to asynchronous execution using asyncio.to_thread across multiple routes and services to prevent event loop blocking. A connection error indicator is added to improve resilience.

Changes

Cohort / File(s) Summary
Security Middleware Infrastructure
src/middleware/security_middleware.py
New comprehensive DoS protection middleware implementing tiered IP rate limiting (60 RPM standard, 10 RPM datacenter), behavioral fingerprinting via User-Agent/Accept-Language/Encoding, and global velocity mode that reduces all limits by 50% when error rates exceed 10% over 60-second window. Supports Redis-backed distributed state with in-memory fallback, skips internal health/metrics endpoints, and emits structured logs and Prometheus metrics.
Middleware Integration & Connection Resilience
src/main.py, src/config/supabase_config.py
Integrates SecurityMiddleware early in middleware stack with conditional Redis initialization; falls back to in-memory security if Redis unavailable. Adds "bad file descriptor" to connection error detection for improved database connection resilience.
Async Route Conversions
src/routes/admin.py, src/routes/api_models.py, src/routes/catalog.py, src/routes/chat.py
Converts blocking database and cache calls (create_enhanced_user, get_user, fetch_specific_model, get_cached_providers/models) to async execution via asyncio.to_thread. Adds async cost calculation in chat completion paths to maintain non-blocking event loop behavior.
Async Service Conversions
src/services/pricing.py, src/services/redis_metrics.py
Wraps synchronous database/cache lookups and Redis operations (zscore, pipeline execution) with asyncio.to_thread to prevent event loop blocking during pricing lookups and health score updates.
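Both conversion cohorts follow the same pattern: the blocking call is handed to the default thread pool via asyncio.to_thread and awaited, keeping the event loop free. A minimal sketch — get_user here is a stand-in for a blocking Supabase lookup, not the project's real function:

```python
import asyncio

def get_user(api_key: str) -> dict:
    # Stand-in for a blocking database lookup (illustrative).
    return {"api_key": api_key, "credits": 10}

async def handler(api_key: str) -> dict:
    # Offloading keeps the event loop free to serve other requests meanwhile.
    return await asyncio.to_thread(get_user, api_key)
```

Each `to_thread` call occupies a thread-pool slot for the duration of the blocking operation, so very hot paths may eventually warrant a native async client instead.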

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant SecurityMiddleware
    participant Redis
    participant App
    
    Client->>SecurityMiddleware: HTTP Request
    
    alt Redis Available
        SecurityMiddleware->>Redis: Check IP rate limit<br/>(default: 60 RPM)
        Redis-->>SecurityMiddleware: Current count
    else No Redis
        SecurityMiddleware->>SecurityMiddleware: Check in-memory<br/>IP limit
    end
    
    alt IP Limit Exceeded
        SecurityMiddleware-->>Client: 429 Too Many Requests
    else IP Limit OK
        SecurityMiddleware->>SecurityMiddleware: Generate behavior<br/>fingerprint
        
        alt Redis Available
            SecurityMiddleware->>Redis: Check fingerprint limit<br/>(100 RPM)
            Redis-->>SecurityMiddleware: Current count
        else No Redis
            SecurityMiddleware->>SecurityMiddleware: Check in-memory<br/>fingerprint limit
        end
        
        alt Fingerprint Limit Exceeded
            SecurityMiddleware-->>Client: 429 Too Many Requests
        else Fingerprint OK
            SecurityMiddleware->>App: Forward Request
            App-->>SecurityMiddleware: Response
            SecurityMiddleware->>SecurityMiddleware: Record outcome<br/>& check velocity mode
            SecurityMiddleware-->>Client: HTTP Response
        end
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~65 minutes

Possibly related PRs

Poem

🐰 A middleware hops in with a hop and a bound,
Rate limits and fingerprints all around!
With Redis for speed and fallbacks so keen,
Our threads now async—no blocking in between!
DoS attacks scatter as fast as we flee! 🚀

🚥 Pre-merge checks | ✅ 1 | ❌ 2
❌ Failed checks (1 warning, 1 inconclusive)
  • Docstring Coverage — ⚠️ Warning: docstring coverage is 71.43%, below the required 80.00% threshold. Resolution: write docstrings for the functions missing them.
  • Title Check — ❓ Inconclusive: the title "Fix/fix rate limiter" is vague and does not convey what the changeset accomplishes. Consider a more specific title such as "Implement velocity-based rate limiting with async optimizations" or "Add behavioral DoS protection and async offloading".
✅ Passed checks (1 passed)
  • Description Check — ✅ Passed: check skipped because CodeRabbit's high-level summary is enabled.



@greptile-apps greptile-apps bot left a comment


9 files reviewed, 4 comments


Comment on src/main.py, lines +200 to +203
# OPTIMIZED: Add security middleware first to block bots before heavy logic
from src.config.redis_config import get_async_redis_client
try:
    redis_client = get_async_redis_client()

Broken Redis client import

create_app() imports and calls get_async_redis_client() (from src.config.redis_config import get_async_redis_client), but src/config/redis_config.py only exposes get_redis_client() (no async variant). This will raise ImportError during app startup, preventing the service from booting.

Fix by importing the correct function (or adding the async client factory in redis_config.py and updating callers accordingly).


Comment on src/middleware/security_middleware.py, lines +67 to +72
async def _get_client_ip(self, request: Request) -> str:
    """Extract client IP with support for proxies."""
    forwarded_for = request.headers.get("X-Forwarded-For")
    if forwarded_for:
        return forwarded_for.split(",")[0].strip()
    return request.client.host if request.client else "unknown"

Trusts spoofable client IP

_get_client_ip() unconditionally trusts X-Forwarded-For and uses the first hop as the client IP. If this service is reachable directly (or behind a proxy that doesn’t strip/overwrite XFF), any client can spoof their IP to bypass per-IP limits and also poison the Redis keyspace with arbitrary ip:* buckets.

This should only honor XFF when you can verify the request came through a trusted proxy (or rely on Starlette/FastAPI proxy header middleware configured with known proxy IPs).
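A hedged sketch of that hardening — honor X-Forwarded-For only when the direct TCP peer is a trusted proxy. The allowlist and helper name are illustrative, not part of the PR:

```python
import ipaddress
from typing import Optional

# Hypothetical allowlist; in a real deployment this comes from configuration.
TRUSTED_PROXIES = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("127.0.0.0/8"),
]

def get_client_ip(peer_ip: str, xff_header: Optional[str]) -> str:
    """Use X-Forwarded-For only when the TCP peer is a trusted proxy."""
    try:
        peer = ipaddress.ip_address(peer_ip)
    except ValueError:
        return peer_ip  # unparseable peer; never trust forwarded headers
    if xff_header and any(peer in net for net in TRUSTED_PROXIES):
        # The left-most entry is the client as reported by the proxy chain.
        return xff_header.split(",")[0].strip()
    return peer_ip
```

A direct (untrusted) connection then gets rate-limited by its real socket address regardless of what headers it sends.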


Comment on src/middleware/security_middleware.py, lines +193 to +202
# Local in-memory fallback
if full_key not in self._local_cache:
    self._local_cache[full_key] = 0
# Periodically clean old keys
if now - self._last_cleanup > 300:
    self._local_cache = {k: v for k, v in self._local_cache.items() if k.startswith(f"sec_rl:{key}:")}
    self._last_cleanup = now

self._local_cache[full_key] += 1
return self._local_cache[full_key] <= limit

Local cache cleanup bug

In the local fallback, the periodic cleanup filters with k.startswith(f"sec_rl:{key}:"), which only retains keys for the current key and drops all other rate-limit buckets. When cleanup runs, it effectively resets limits for every other IP/fingerprint, allowing bursts until those buckets refill.

Cleanup should remove expired buckets across the whole cache (e.g., based on bucket timestamp), not wipe unrelated keys.
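One way to do that, sketched with a (count, last_seen) tuple per bucket so the periodic sweep evicts by age across the whole cache. The 300-second interval mirrors the snippet; the class itself is illustrative:

```python
import time

class LocalRateLimiter:
    """In-memory fallback storing (count, last_seen) per bucket (sketch)."""

    def __init__(self, cleanup_interval: float = 300.0):
        self._cache = {}  # full_key -> (count, last_seen)
        self._last_cleanup = 0.0
        self._cleanup_interval = cleanup_interval

    def hit(self, full_key: str, limit: int) -> bool:
        now = time.monotonic()
        if now - self._last_cleanup > self._cleanup_interval:
            # Evict buckets idle longer than the interval, regardless of key,
            # instead of wiping every bucket that doesn't match one prefix.
            cutoff = now - self._cleanup_interval
            self._cache = {k: v for k, v in self._cache.items() if v[1] >= cutoff}
            self._last_cleanup = now
        count, _ = self._cache.get(full_key, (0, now))
        count += 1
        self._cache[full_key] = (count, now)
        return count <= limit
```

Active buckets keep their counts through a sweep because their last_seen timestamps stay fresh; only genuinely idle buckets are dropped.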


Comment on src/services/redis_metrics.py, lines +156 to 158
# Execute all operations atomically in a thread to avoid blocking the event loop
await asyncio.to_thread(pipe.execute)


Threaded pipeline breaks safety

record_request() builds a Redis pipe in the event loop thread, then executes it in a worker thread via await asyncio.to_thread(pipe.execute). The underlying redis client/connection objects (and pipeline state) are not guaranteed to be thread-safe when moved across threads; this can cause intermittent failures or corrupt pipeline execution under concurrency.

If you need to avoid blocking the loop, keep all redis interactions on the same thread (either use an async redis client for metrics, or run the entire metrics write path in to_thread, including pipeline creation).
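A sketch of the second option — the whole write path, pipeline creation included, runs inside the worker thread, so no redis objects cross thread boundaries mid-flight. The function and key names are illustrative:

```python
import asyncio

async def record_request_metrics(redis_client, provider: str, latency_ms: float) -> None:
    """Run the whole pipeline (creation and execution) in one worker thread."""
    def _write() -> None:
        # All redis objects are created and used on this thread only.
        pipe = redis_client.pipeline()
        pipe.incr(f"metrics:{provider}:requests")
        pipe.lpush(f"metrics:{provider}:latency", latency_ms)
        pipe.ltrim(f"metrics:{provider}:latency", 0, 999)
        pipe.execute()
    await asyncio.to_thread(_write)
```

The closure captures only plain Python values from the event-loop side; the pipeline's connection checkout and release both happen on the worker thread.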


Comment on src/middleware/security_middleware.py, lines +185 to +188
count = await self.redis.incr(full_key)
if count == 1:
    await self.redis.expire(full_key, window * 2)
return count <= limit

Bug: The security middleware awaits synchronous Redis methods (e.g. self.redis.incr()), which are not awaitable and would raise TypeError at runtime. The ImportError for the non-existent get_async_redis_client is also silently caught by a broad except Exception in main.py, so the middleware falls back to in-memory mode without warning and Redis integration never activates.
Severity: CRITICAL

Suggested Fix

Create a get_async_redis_client function that returns a true async Redis client. Alternatively, if using the synchronous client is intended, wrap all Redis calls within the middleware (e.g., self.redis.incr) with asyncio.to_thread() to prevent TypeError exceptions. The exception handling in main.py should also be made more specific to avoid masking import errors.
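If the synchronous client is kept, the middleware's check could offload each call instead of awaiting it directly. A sketch under that assumption — the helper name is illustrative, and `window * 2` mirrors the expiry in the snippet above:

```python
import asyncio

async def check_limit(redis_client, full_key: str, limit: int, window: int) -> bool:
    """Offload each blocking call on a synchronous redis client."""
    count = await asyncio.to_thread(redis_client.incr, full_key)
    if count == 1:
        # First hit creates the key, so attach the expiry to start the window.
        await asyncio.to_thread(redis_client.expire, full_key, window * 2)
    return count <= limit
```

Each check costs two thread-pool hops in the worst case, which is why a native async Redis client is the cleaner long-term fix.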


@Jay-tech456 Jay-tech456 closed this Feb 7, 2026

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
src/services/pricing.py (1)

614-619: ⚠️ Potential issue | 🟠 Major

Race condition: missing _pricing_cache_lock when writing to _pricing_cache.

Step 4 (cache fallback) writes to _pricing_cache without acquiring the lock, unlike Step 3 (lines 600–605) and the synchronous get_model_pricing (lines 435–439) which both correctly use with _pricing_cache_lock. This is a data race on the shared dict.

Proposed fix
         cache_pricing = await asyncio.to_thread(_get_pricing_from_cache_fallback, model_id, candidate_ids)
         if cache_pricing:
             # Cache the fallback result
-            _pricing_cache[model_id] = {
-                "data": cache_pricing,
-                "timestamp": time.time()
-            }
+            with _pricing_cache_lock:
+                _pricing_cache[model_id] = {
+                    "data": cache_pricing,
+                    "timestamp": time.time()
+                }
             return cache_pricing
src/routes/admin.py (2)

114-119: ⚠️ Potential issue | 🟡 Minor

updated_user can be None, causing a TypeError on line 119.

get_user returns None when the user is not found. If the user is deleted between the add_credits_to_user call and the re-fetch, updated_user["credits"] will raise TypeError: 'NoneType' object is not subscriptable.

Suggested fix
         updated_user = await asyncio.to_thread(get_user, req.api_key)
+        if not updated_user:
+            raise HTTPException(status_code=404, detail="User not found after credit update")

         return {
             "status": "success",

188-209: ⚠️ Potential issue | 🟠 Major

Use asyncio.to_thread() to wrap the synchronous get_user_rate_limits call.

get_user_rate_limits (line 193) is a synchronous function that performs blocking Supabase database calls, but it's being called without await or asyncio.to_thread() in this async handler. This blocks the event loop and violates the guideline to use async/await for all I/O operations. The pattern already exists correctly in set_user_rate_limits (line 191), which wraps its synchronous operations with asyncio.to_thread().

Suggested fix
         await set_user_rate_limits(req.api_key, req.rate_limits.model_dump())

-        rate_limits = get_user_rate_limits(req.api_key)
+        rate_limits = await asyncio.to_thread(get_user_rate_limits, req.api_key)
🤖 Fix all issues with AI agents
In `@src/main.py`:
- Around line 200-209: The import of get_async_redis_client is incorrect —
replace the import and usage with the synchronous get_redis_client from the
redis_config module: change the imported symbol get_async_redis_client to
get_redis_client and call get_redis_client() to obtain redis_client before
calling app.add_middleware(SecurityMiddleware, redis_client=redis_client); keep
the existing try/except fallback that calls
app.add_middleware(SecurityMiddleware) when redis_client is None or an exception
occurs so the middleware initialization doesn't crash.

In `@src/middleware/security_middleware.py`:
- Around line 52-64: The middleware currently keeps state in-memory in __init__
via _request_log, _local_cache, _velocity_mode_until and so will diverge across
workers; refactor to make the middleware stateless by moving shared counters and
velocity state into Redis (use the redis_client parameter) or a dedicated
service class (e.g., VelocityService) and delegate all business logic
(sliding-window tracking, fallback limits, velocity-mode toggling) to that
service; add concrete type hints for redis_client, _local_cache, _request_log,
_velocity_mode_until, and _velocity_mode_triggered_count using Python 3.12+
typing syntax and ensure middleware only performs non-blocking calls to the
service (no in-process state mutations).
- Around line 193-199: The cleanup currently filters _local_cache to only keys
with the current prefix (full_key/key) which deletes counters for other clients;
change the strategy to evict stale entries by age instead: modify _local_cache
storage (or add a companion dict) so each entry records a last_seen timestamp
(e.g., store (count, last_seen) or maintain _local_cache_timestamps), update the
increment logic in the same code path to set last_seen = now when touching
full_key, and replace the cleanup block that references _last_cleanup with a
loop that removes entries whose last_seen < now - 300 (rather than keeping only
keys starting with f"sec_rl:{key}:"); keep using _last_cleanup to avoid running
the sweep too often.
- Around line 183-191: In the Redis exception handler inside the rate limiting
logic (the try/except that uses self.redis.incr(full_key) and returns count <=
limit) add a Sentry capture by importing capture_cache_error from
src.utils.auto_sentry and calling it with the caught exception,
operation="rate_limit_check", cache_type="redis", key=full_key, and details
containing {"limit": limit, "endpoint": key}; keep the existing logger.error
call and then call capture_cache_error(e, operation=..., cache_type=...,
key=full_key, details=...) before falling back to local limiting.
🧹 Nitpick comments (11)
src/routes/api_models.py (1)

116-116: get_cached_providers() is still called synchronously in this async handler.

Lines 100 and 148 were correctly wrapped with asyncio.to_thread, but get_cached_providers() on line 116 remains a synchronous blocking call in the same async handler. For consistency and to fully prevent event loop blocking, this should also be offloaded.

Proposed fix
-        openrouter_providers = get_cached_providers()
+        openrouter_providers = await asyncio.to_thread(get_cached_providers)
src/services/redis_metrics.py (1)

198-203: Other async methods still perform synchronous Redis I/O directly.

get_provider_health (line 199), get_recent_errors (line 221), get_hourly_stats (line 255), get_latency_percentiles (line 295), update_circuit_breaker (line 344), get_all_provider_health (line 362), and cleanup_old_data (lines 397/406) all call self.redis.* synchronously within async methods. While not part of this PR's changes, they'll block the event loop the same way record_request used to. Consider wrapping these in a follow-up.

src/routes/catalog.py (2)

989-1149: Individual provider fetches in get_models remain synchronous.

When gateway != "all" (or when the aggregated cache is empty), lines 992–1149 call get_cached_models(...) synchronously ~25 times. These are the same blocking calls that were wrapped with asyncio.to_thread on lines 563, 761, 782, and 952. Under load with a single-gateway request, these will still block the event loop.

Consider wrapping these in a follow-up pass for consistency—or, since they're behind a not (gateway_value == "all" and all_models_list) guard, document that the hot path (gateway=all) is covered and these are low-traffic fallbacks.


1610-1610: get_cached_providers wrapped, but get_cached_models on lines 1628 and 1649 are not.

In get_specific_model, get_cached_providers was wrapped in asyncio.to_thread (line 1610), but the two get_cached_models calls on lines 1628 and 1649 remain synchronous. This is inconsistent within the same handler.

src/routes/admin.py (6)

59-68: send_welcome_email is a synchronous call that can block the event loop.

This was not converted as part of this PR, but it sits on the same request path as the newly-async create_enhanced_user. If the email service is slow or times out, the event loop stalls.

Suggested fix
         try:
-            enhanced_notification_service.send_welcome_email(
+            await asyncio.to_thread(
+                enhanced_notification_service.send_welcome_email,
                 user_id=user_data["user_id"],
                 username=user_data["username"],
                 email=user_data["email"],
                 credits=user_data["credits"],
             )

316-318: fetch_huggingface_model is a synchronous HTTP call not offloaded to a thread.

This is an external HTTP fetch that can block the event loop, inconsistent with the other conversions in this PR.

Suggested fix
-        hf_data = fetch_huggingface_model(hugging_face_id)
+        hf_data = await asyncio.to_thread(fetch_huggingface_model, hugging_face_id)

462-470: get_trial_analytics is a synchronous DB call not offloaded.

Same pattern as the other conversions in this PR — should be wrapped for consistency. As per coding guidelines, **/*.py: "Use async/await for all I/O operations (database calls, HTTP requests, file operations) to maintain non-blocking execution and high concurrency."

Suggested fix
-        analytics = get_trial_analytics()
+        analytics = await asyncio.to_thread(get_trial_analytics)

1155-1171: get_all_transactions and get_transaction_summary (line 1216) are synchronous DB calls not offloaded.

These are blocking I/O operations in an async handler, consistent with the pattern this PR is fixing elsewhere. As per coding guidelines: "Use async/await for all I/O operations."

Suggested fix
-        transactions = get_all_transactions(
+        transactions = await asyncio.to_thread(
+            get_all_transactions,
             limit=limit,
             ...
         )

And similarly for get_transaction_summary at line 1216.


2267-2273: Redis get call is synchronous, blocking the event loop.

In get_chat_requests_summary_admin, the Redis client calls (redis_client.get, redis_client.setex) on lines 2267 and 2304 are synchronous. Since this PR's goal is to offload blocking I/O, these should also be wrapped or use an async Redis client.

Suggested fix
-                cached_data = redis_client.get(cache_key)
+                cached_data = await asyncio.to_thread(redis_client.get, cache_key)
-                redis_client.setex(
+                await asyncio.to_thread(
+                    redis_client.setex,
                     cache_key,
                     60,
                     json.dumps(response, default=str)
                 )

2279-2285: get_chat_completion_summary_by_filters is a synchronous DB call not offloaded.

Suggested fix
-        summary = get_chat_completion_summary_by_filters(
+        summary = await asyncio.to_thread(
+            get_chat_completion_summary_by_filters,
             model_id=model_id,
             ...
         )
src/routes/chat.py (1)

1073-1125: Avoid duplicate cost computation in anonymous streaming background path

calculate_cost_async is already awaited for tracing; the anonymous branch recomputes it again for metrics. Reuse the earlier cost to avoid extra pricing lookups and latency.

♻️ Suggested diff
-                cost = await calculate_cost_async(model, prompt_tokens, completion_tokens)
                 await _record_inference_metrics_and_health(
                     provider=provider,
                     model=model,
                     elapsed_seconds=elapsed,
                     prompt_tokens=prompt_tokens,
                     completion_tokens=completion_tokens,
                     cost=cost,
                     success=True,
                     error_message=None,
                 )

Comment on lines +200 to +209
# OPTIMIZED: Add security middleware first to block bots before heavy logic
from src.config.redis_config import get_async_redis_client
try:
    redis_client = get_async_redis_client()
    app.add_middleware(SecurityMiddleware, redis_client=redis_client)
    logger.info(" 🛡️ Security middleware enabled (IP tiering & fingerprinting)")
except Exception as e:
    # Fallback to in-memory limiting if redis is unavailable
    app.add_middleware(SecurityMiddleware)
    logger.warning(f" 🛡️ Security middleware enabled with LOCAL fallback (Redis error: {e})")
⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check the signature and implementation of get_async_redis_client
rg -n "async def get_async_redis_client|def get_async_redis_client" src/config/redis_config.py -A 10

Repository: Alpaca-Network/gatewayz-backend



🏁 Script executed:

#!/bin/bash
# Find redis_config.py and check its location
find . -name "redis_config.py" -type f 2>/dev/null

# Also search for get_async_redis_client across the codebase
rg "get_async_redis_client" -n 2>/dev/null | head -20

# List the config directory structure
ls -la src/config/ 2>/dev/null || echo "src/config/ not found"



🏁 Script executed:

#!/bin/bash
# Read the redis_config.py file
cat -n src/config/redis_config.py



🏁 Script executed:

#!/bin/bash
# Check the exact import in src/main.py around lines 200-209
sed -n '195,215p' src/main.py



🏁 Script executed:

#!/bin/bash
# Check if there's another redis_config.py in src/
cat -n src/redis_config.py 2>/dev/null || echo "File does not exist or is empty"



Fix incorrect import: get_async_redis_client does not exist

The function being imported at line 201 does not exist in src.config.redis_config. The module only provides get_redis_client() (a synchronous function returning redis.Redis | None). Change the import to:

from src.config.redis_config import get_redis_client

And update line 203:

redis_client = get_redis_client()

Left as-is, the bad import raises an ImportError at startup; since the import statement sits outside the try block, middleware initialization fails before the exception handler can execute.

🤖 Prompt for AI Agents
In `@src/main.py` around lines 200 - 209, The import of get_async_redis_client is
incorrect — replace the import and usage with the synchronous get_redis_client
from the redis_config module: change the imported symbol get_async_redis_client
to get_redis_client and call get_redis_client() to obtain redis_client before
calling app.add_middleware(SecurityMiddleware, redis_client=redis_client); keep
the existing try/except fallback that calls
app.add_middleware(SecurityMiddleware) when redis_client is None or an exception
occurs so the middleware initialization doesn't crash.

Comment on lines +52 to +64
def __init__(self, app: ASGIApp, redis_client=None):
    super().__init__(app)
    self.redis = redis_client
    # In-memory fallback if redis is missing
    self._local_cache = {}
    self._last_cleanup = time.time()

    # Global Velocity Mode tracking
    # Tracks (timestamp, is_error) tuples for sliding window analysis
    self._request_log: deque = deque(maxlen=10000)  # Cap memory usage
    self._velocity_mode_until: float = 0  # Unix timestamp when velocity mode expires
    self._velocity_mode_triggered_count: int = 0  # For metrics

⚠️ Potential issue | 🟠 Major

Stateful in‑memory tracking makes velocity mode per‑process

The middleware keeps _request_log, _local_cache, and _velocity_mode_until in memory, so "global" velocity mode and fallback limits will diverge across workers. Consider moving these counters/state into Redis (or a dedicated service) and annotating the stored fields and Redis client with concrete types.
As per coding guidelines: middleware should be stateless, non-blocking, and focused on cross-cutting concerns (logging, authentication, rate limiting, tracing); avoid business logic in middleware and delegate to services instead; use type hints throughout (function parameters, return types, class attributes), targeting Python 3.12+ syntax.

🤖 Prompt for AI Agents
In `@src/middleware/security_middleware.py` around lines 52 - 64, The middleware
currently keeps state in-memory in __init__ via _request_log, _local_cache,
_velocity_mode_until and so will diverge across workers; refactor to make the
middleware stateless by moving shared counters and velocity state into Redis
(use the redis_client parameter) or a dedicated service class (e.g.,
VelocityService) and delegate all business logic (sliding-window tracking,
fallback limits, velocity-mode toggling) to that service; add concrete type
hints for redis_client, _local_cache, _request_log, _velocity_mode_until, and
_velocity_mode_triggered_count using Python 3.12+ typing syntax and ensure
middleware only performs non-blocking calls to the service (no in-process state
mutations).

Comment on lines +183 to +191
try:
    # Use Redis for distributed limiting
    count = await self.redis.incr(full_key)
    if count == 1:
        await self.redis.expire(full_key, window * 2)
    return count <= limit
except Exception as e:
    logger.error(f"Redis security limit error: {e}")
    # Fallback to local
⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

# Check if auto_sentry.py exists and inspect its contents
if [ -f "src/utils/auto_sentry.py" ]; then
  echo "=== auto_sentry.py exists ==="
  wc -l src/utils/auto_sentry.py
  echo ""
  echo "=== Function signatures in auto_sentry.py ==="
  rg -n "^def |^async def" src/utils/auto_sentry.py
else
  echo "auto_sentry.py NOT found"
fi



🏁 Script executed:

# Inspect the security_middleware.py file around the specified lines
if [ -f "src/middleware/security_middleware.py" ]; then
  echo "=== Lines 175-200 from security_middleware.py ==="
  sed -n '175,200p' src/middleware/security_middleware.py
  echo ""
  echo "=== Search for Sentry imports in security_middleware.py ==="
  rg -n "sentry|Sentry" src/middleware/security_middleware.py
else
  echo "security_middleware.py NOT found"
fi



🏁 Script executed:

# Inspect the auto_capture_errors decorator and _auto_capture_exception function
echo "=== auto_capture_errors decorator (lines 54-135) ==="
sed -n '54,135p' src/utils/auto_sentry.py



🏁 Script executed:

# Inspect _auto_capture_exception function
echo "=== _auto_capture_exception function ==="
sed -n '136,237p' src/utils/auto_sentry.py



🏁 Script executed:

# Search for how Sentry is used elsewhere in middleware and services
echo "=== Sentry usage patterns in middleware ==="
rg -n "sentry|capture" src/middleware/ -A 2 | head -40
echo ""
echo "=== Sentry usage patterns in services ==="
rg -n "capture_exception|_auto_capture_exception" src/services/ -A 2 | head -40



Add Sentry capture for Redis rate limiter failures

The Redis exception handler only logs the error. Add Sentry capture with the rate limit key and threshold as context so limiter outages are visible in monitoring. Use capture_cache_error() from src/utils/auto_sentry.py to track the cache operation, key, and exception:

Example:
except Exception as e:
    logger.error(f"Redis security limit error: {e}")
    from src.utils.auto_sentry import capture_cache_error
    capture_cache_error(
        e,
        operation="rate_limit_check",
        cache_type="redis",
        key=full_key,
        details={"limit": limit, "endpoint": key}
    )
🤖 Prompt for AI Agents
In `@src/middleware/security_middleware.py` around lines 183 - 191, In the Redis
exception handler inside the rate limiting logic (the try/except that uses
self.redis.incr(full_key) and returns count <= limit) add a Sentry capture by
importing capture_cache_error from src.utils.auto_sentry and calling it with the
caught exception, operation="rate_limit_check", cache_type="redis",
key=full_key, and details containing {"limit": limit, "endpoint": key}; keep the
existing logger.error call and then call capture_cache_error(e, operation=...,
cache_type=..., key=full_key, details=...) before falling back to local
limiting.

Comment on lines +193 to +199
# Local in-memory fallback
if full_key not in self._local_cache:
    self._local_cache[full_key] = 0
# Periodically clean old keys
if now - self._last_cleanup > 300:
    self._local_cache = {k: v for k, v in self._local_cache.items() if k.startswith(f"sec_rl:{key}:")}
    self._last_cleanup = now
⚠️ Potential issue | 🟠 Major

Local fallback cleanup drops unrelated keys, weakening rate limits

The cleanup keeps only keys matching the current key prefix, which wipes counters for other IPs/fingerprints every cleanup cycle. That effectively resets their limits and weakens protection.

🐛 Suggested fix
-            if now - self._last_cleanup > 300:
-                self._local_cache = {k: v for k, v in self._local_cache.items() if k.startswith(f"sec_rl:{key}:")}
-                self._last_cleanup = now
+            if now - self._last_cleanup > 300:
+                current_bucket = bucket
+                self._local_cache = {
+                    k: v
+                    for k, v in self._local_cache.items()
+                    if k.rsplit(":", 1)[-1].isdigit()
+                    and int(k.rsplit(":", 1)[-1]) >= current_bucket - 1
+                }
+                self._last_cleanup = now
🤖 Prompt for AI Agents
In `@src/middleware/security_middleware.py` around lines 193 - 199, The cleanup
currently filters _local_cache to only keys with the current prefix
(full_key/key) which deletes counters for other clients; change the strategy to
evict stale entries by age instead: modify _local_cache storage (or add a
companion dict) so each entry records a last_seen timestamp (e.g., store (count,
last_seen) or maintain _local_cache_timestamps), update the increment logic in
the same code path to set last_seen = now when touching full_key, and replace
the cleanup block that references _last_cleanup with a loop that removes entries
whose last_seen < now - 300 (rather than keeping only keys starting with
f"sec_rl:{key}:"); keep using _last_cleanup to avoid running the sweep too
often.
