Audit API credit deductions with retries, metrics, and reconciliation #1026
Conversation
…ductions

- Introduce retry logic with exponential backoff for credit deduction failures to improve billing reliability.
- Add Prometheus metrics to monitor credit deduction attempts, successes, failures, latencies, and retry counts.
- Implement critical Sentry alerts for failed credit deductions after retries are exhausted.
- Create a new database table and migration (credit_deduction_failures) to log failed streaming credit deduction attempts for manual reconciliation.
- Implement a fallback background task wrapper to safely handle credit deduction failures for streaming requests without impacting user experience.
- Enhance existing credit handling to support streaming requests with the new retry and fallback mechanisms.
- Add detailed unit tests covering retry behavior, metrics recording, error handling, and fallback logic.

This improves billing accuracy and operational monitoring, reducing risk of lost revenue due to transient failures.

Co-authored-by: gatewayz-ai-inbox[bot] <gatewayz-ai-inbox[bot]@users.noreply.github.com>
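For a concrete sense of the metrics wiring this describes, here is a minimal Prometheus sketch; `credit_deduction_total` and `credit_deduction_amount_usd` are names that appear later in this review, while the remaining metric names and label sets are assumptions.

```python
from prometheus_client import Counter, Histogram

# credit_deduction_total and credit_deduction_amount_usd are referenced in the
# review below; the other names and the label sets are illustrative guesses.
credit_deduction_total = Counter(
    "credit_deduction_total",
    "Credit deduction attempts by outcome",
    ["status", "endpoint", "is_streaming"],
)
credit_deduction_amount_usd = Counter(
    "credit_deduction_amount_usd",
    "Total USD successfully deducted",
    ["endpoint", "is_streaming"],
)
credit_deduction_latency_seconds = Histogram(
    "credit_deduction_latency_seconds",
    "Latency of the deduction path, including retries",
    ["endpoint"],
)
credit_deduction_retries_total = Counter(
    "credit_deduction_retries_total",
    "Retry attempts made before a deduction settled",
    ["endpoint"],
)
```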
…fails

Move record_usage and update_rate_limit_usage calls outside the retry loop to prevent duplicate deductions if these secondary operations fail after deduct_credits succeeds. The fix ensures:

1. deduct_credits is marked successful immediately after it succeeds
2. record_usage and update_rate_limit_usage are called AFTER the retry loop
3. Failures in secondary operations are logged but don't trigger retry
4. Added tests to verify no duplicate deductions occur

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
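A minimal sketch of the pattern this commit describes, keeping only the deduction inside the retry loop; deduct_credits, record_usage, and update_rate_limit_usage are the names from the commit message, while the signatures and this wrapper itself are assumptions.

```python
import asyncio
import logging

logger = logging.getLogger(__name__)

MAX_ATTEMPTS = 3  # 0.5s, 1s, 2s backoff per the PR description


async def deduct_then_record(deduct_credits, record_usage, update_rate_limit_usage,
                             user_id: int, api_key: str, cost: float) -> None:
    """Sketch: only deduct_credits is retried; bookkeeping runs once, afterwards."""
    for attempt in range(1, MAX_ATTEMPTS + 1):
        try:
            await asyncio.to_thread(deduct_credits, user_id, cost)
            break  # deduction succeeded; never re-enter the loop
        except Exception as exc:
            logger.warning("deduct_credits attempt %d failed: %s", attempt, exc)
            if attempt == MAX_ATTEMPTS:
                raise RuntimeError("credit deduction failed after retries") from exc
            await asyncio.sleep(0.5 * 2 ** (attempt - 1))

    # Secondary operations sit outside the retry loop: if they fail we log the
    # error, but deduct_credits is never called again (no duplicate deduction).
    try:
        await asyncio.to_thread(record_usage, api_key, cost)
        await asyncio.to_thread(update_rate_limit_usage, api_key)
    except Exception as exc:
        logger.error("post-deduction bookkeeping failed (not retried): %s", exc)
```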
…ck paths

- Fix import of track_trial_usage -> track_trial_usage_for_key in credit_handler.py
- Update function call to match the correct signature (no model_id, prompt_tokens, completion_tokens kwargs)
- Fix test mock paths to patch functions at their source modules instead of the credit_handler module:
  - src.services.prometheus_metrics.* for Prometheus metrics
  - src.db.users.* for user database functions
  - src.db.rate_limits.* for rate limit functions
  - src.db.trials.* for trial functions
  - src.services.pricing.* for pricing functions
  - sentry_sdk.capture_message for the Sentry SDK
- All 21 credit_handler tests now pass

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
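A hedged sketch of the patching approach described here, with mocks targeting the source modules rather than credit_handler; the test name, the exact patch targets, and the handler call are illustrative, not the repo's actual tests.

```python
from unittest.mock import patch

import pytest


@pytest.mark.asyncio
@patch("src.db.rate_limits.update_rate_limit_usage")
@patch("src.db.users.record_usage")
@patch("src.db.users.deduct_credits")
async def test_deducts_once_even_if_bookkeeping_fails(mock_deduct, mock_record, mock_update):
    """Illustrative only: patch functions where they are defined (src.db.*)."""
    mock_record.side_effect = RuntimeError("usage logging down")

    from src.services.credit_handler import handle_credits_and_usage

    await handle_credits_and_usage(...)  # real arguments elided in this sketch
    assert mock_deduct.call_count == 1  # a secondary failure must not re-deduct
```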
…hat adapters

- Remove unused `time`, `List`, `Union`, and `InternalUsage` imports
- Replace `Dict` with `dict` for type annotations (UP006)
- Apply fixes to ai_sdk.py, anthropic.py, base.py, and openai.py

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…nd secure RLS

- Add pre-create cleanup to drop existing policies/triggers for idempotency
- Restrict RLS policies to service_role for INSERT/UPDATE operations
- Allow authenticated admin users to SELECT for reconciliation
- Add documented DOWN migration section for rollback
- Add security comments explaining access restrictions

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Wrap DROP POLICY/TRIGGER statements in a DO block to check if the table exists first, preventing errors when the migration runs on a fresh database.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
auth.uid() returns a UUID, so we need to compare against the users.auth_id column instead of users.id (which is BIGINT).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…ng table

Credit deduction failures is a sensitive billing table that should only be accessed via the API backend (using service_role). Admin access for reconciliation is handled through API endpoints, not direct database access.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add is_streaming=False explicitly to non-streaming endpoints for clarity
- Change log level from DEBUG to WARNING for reconciliation table errors
- Replace redundant sentry_sdk.capture_message with a breadcrumb for billing alerts
- Update tests to reflect the new breadcrumb-based alerting approach
- Apply code formatting via ruff

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
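For reference, a small sketch of the breadcrumb-based approach; only the switch from capture_message to add_breadcrumb is taken from the change itself, while the category and data fields are illustrative.

```python
import sentry_sdk


def add_billing_breadcrumb(endpoint: str, cost_usd: float, is_streaming: bool) -> None:
    """Attach billing context to whatever Sentry event is eventually captured,
    instead of emitting a separate capture_message for every failure."""
    sentry_sdk.add_breadcrumb(
        category="billing",
        message="Credit deduction failed after retries",
        level="warning",
        data={"endpoint": endpoint, "cost_usd": cost_usd, "is_streaming": is_streaming},
    )
```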
This pull request has been ignored for the connected project Preview Branches by Supabase.
📝 Walkthrough

This PR modernizes type hints to Python 3.9+ style by replacing `Dict` with `dict`.

Changes
Sequence Diagram(s)

sequenceDiagram
participant Stream as Stream Handler
participant Route as Chat Route
participant Credit as Credit Handler
participant Retry as Retry Loop
participant Sentry as Sentry
participant DB as Supabase
participant Prometheus as Prometheus
Stream->>Route: stream completion event
Route->>Credit: await handle_credits_and_usage_with_fallback()
Credit->>Retry: attempt deduction (attempt 1)
alt Deduction Success
Retry-->>Credit: success
Credit->>Prometheus: record success metric
Credit-->>Route: (cost, True)
else Transient Failure
Retry->>Retry: exponential backoff
Retry->>Credit: retry (attempt 2..N)
alt After Max Retries
Retry-->>Credit: failure
Credit->>Sentry: send_critical_billing_alert()
Credit->>DB: log_failed_deduction_for_reconciliation()
Credit->>Prometheus: record missed deduction
Credit-->>Route: (fallback_cost, False)
Route->>Route: log reconciliation warning
else Succeeds on Retry
Retry-->>Credit: success
Credit->>Prometheus: record success + retry metrics
Credit-->>Route: (cost, True)
end
end
Estimated Code Review Effort: 🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly Related PRs
Poem
🚥 Pre-merge checks | ✅ Passed checks (3 passed)
Schema Sync Check Results ✅

All schema validation checks passed!

Schema Summary
Checks Performed
This check runs automatically on PRs that modify database migrations.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ae685cf885
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
    # Record success metrics
    latency = time.monotonic() - start_time
    status = "success" if attempt == 1 else "retried"
    _record_credit_metrics(status, cost, endpoint, is_streaming, latency, attempt)
Count successful retries as success in metrics
When a deduction succeeds after one or more retries, the status is set to retried and passed into _record_credit_metrics, so the success counters never increment for those requests. That means success rates and amounts derived from credit_deduction_total/credit_deduction_amount_usd will undercount real successes whenever the first attempt failed, which can mislead alerting and dashboards that track billing reliability. Consider marking any successful deduction as success and rely on attempt_number/retry metrics to capture retry behavior.
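A sketch of what that adjustment could look like; the metrics recorder mirrors the _record_credit_metrics call in the quoted snippet and is passed in here only to keep the example self-contained.

```python
import time


def record_deduction_success(record_metrics, cost: float, endpoint: str,
                             is_streaming: bool, start_time: float, attempt: int) -> None:
    # Every successful deduction is labelled "success"; whether retries were
    # needed stays visible through the attempt number or a dedicated retry metric.
    latency = time.monotonic() - start_time
    record_metrics("success", cost, endpoint, is_streaming, latency, attempt)
```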
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/services/credit_handler.py (1)
244-252: ⚠️ Potential issue | 🟠 Major: Potential double-count of trial usage.
handle_credits_and_usage now records trial usage via track_trial_usage_for_key, but the chat routes still track trial usage after non-streaming responses. This can consume trial quotas twice for the same request. Consider centralizing trial tracking in one place (or adding a flag to skip when already recorded).
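One possible shape for such a flag, sketched with assumed names; the tracker is passed in only to keep the snippet self-contained.

```python
def maybe_track_trial_usage(track_trial_usage_for_key, api_key: str, tokens: int,
                            *, already_recorded: bool = False) -> None:
    # Centralized guard: a caller that has already tracked trial usage for this
    # request passes already_recorded=True so the quota is never consumed twice.
    if not already_recorded:
        track_trial_usage_for_key(api_key, tokens)
```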
🤖 Fix all issues with AI agents
In `@src/routes/chat.py`:
- Around line 2618-2631: The block that manually increments
model_inference_requests, observes model_inference_duration, and increments
tokens_used inside the unified handler duplicates metrics already emitted by
_record_inference_metrics_and_health(); remove this manual recording block from
the unified handler (the calls to model_inference_requests.labels(...).inc(),
model_inference_duration.labels(...).observe(...), and
tokens_used.labels(...).inc(...)) so only _record_inference_metrics_and_health()
records these inference metrics, or alternatively move them to a distinct metric
namespace if you truly need separate "handler-only" metrics.
In `@src/services/credit_handler.py`:
- Around line 515-555: Extract the Supabase insertion out of
_log_failed_deduction_for_reconciliation into a new sync DB helper (e.g.,
src/db/reconciliation.py → log_failed_deduction_for_reconciliation or
src/db/credit_transactions.py → log_failed_deduction) that mirrors the pattern
in log_credit_transaction: build the record dict, obtain get_supabase_client()
inside the DB helper (so client creation/I/O happens in the sync context),
implement the insert as a do_insert(client) callable and call
execute_with_retry(do_insert, ...). Then update
_log_failed_deduction_for_reconciliation to only call that DB helper via await
asyncio.to_thread(db.log_failed_deduction_for_reconciliation, ...) and remove
any direct get_supabase_client() / client.table(...).execute() usage from the
service.
🧹 Nitpick comments (5)
tests/services/test_credit_handler.py (2)
12-12: Unused import: asyncio. The asyncio module is imported but not directly used. The pytest.mark.asyncio decorator handles async test execution without requiring explicit asyncio imports.

🧹 Proposed fix

-import asyncio
 from unittest.mock import AsyncMock, MagicMock, patch, call
199-218: Docstring is inconsistent with test behavior. The docstring says "Test that Sentry breadcrumb and alert are sent for all costs" but the method name is test_skips_message_for_small_cost, implying something should be skipped. The test actually verifies that both capture_payment_error and add_breadcrumb are called regardless of cost, which contradicts the method name. Consider renaming the test method to match the actual behavior being tested.

📝 Proposed fix

-    def test_skips_message_for_small_cost(self, mock_capture, mock_add_breadcrumb):
-        """Test that Sentry breadcrumb and alert are sent for all costs."""
+    def test_sends_alert_for_small_cost(self, mock_capture, mock_add_breadcrumb):
+        """Test that Sentry alert and breadcrumb are sent even for small costs."""

supabase/migrations/20260202000001_create_credit_deduction_failures_table.sql (1)
136-150: Down migration is commented out rather than executable. Per coding guidelines, migrations should "include both up and down migrations." The down migration is provided as comments for manual execution, which is a common pattern but doesn't allow automated rollback.
Consider whether automated rollback capability is needed for this table. If the table may contain critical audit data, manual rollback (as currently designed) may actually be preferable to prevent accidental data loss.
src/routes/chat.py (1)
913-947: Add parameter type hints to _handle_credits_and_usage_with_fallback. The new helper omits parameter annotations, which breaks the repo-wide typing requirement and makes static checks less effective.

As per coding guidelines: Use type hints throughout all Python code (function parameters, return types, class attributes) targeting Python 3.12+ syntax.

💡 Suggested fix

-async def _handle_credits_and_usage_with_fallback(
-    api_key,
-    user,
-    model,
-    trial,
-    total_tokens,
-    prompt_tokens,
-    completion_tokens,
-    elapsed_ms,
-) -> tuple[float, bool]:
+async def _handle_credits_and_usage_with_fallback(
+    api_key: str,
+    user: dict,
+    model: str,
+    trial: dict,
+    total_tokens: int,
+    prompt_tokens: int,
+    completion_tokens: int,
+    elapsed_ms: int,
+) -> tuple[float, bool]:

src/services/credit_handler.py (1)
165-199: Add OpenTelemetry spans around credit deduction retries. This is a critical business-logic path; adding spans (e.g., around the retry loop and reconciliation logging) would align with tracing requirements and help correlate billing failures with request traces.
As per coding guidelines: Use OpenTelemetry tracing from src/config/opentelemetry_config.py in all provider client calls and critical business logic paths to enable distributed tracing via Tempo integration.
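A minimal illustration of what such spans could look like; the span and attribute names below are assumptions, not the repo's conventions.

```python
from opentelemetry import trace

tracer = trace.get_tracer(__name__)


def record_deduction_attempt_span(user_id: int, cost_usd: float, attempt: int, succeeded: bool) -> None:
    # Wrap each deduction attempt in a span so billing failures can be
    # correlated with the surrounding request trace in Tempo.
    with tracer.start_as_current_span("credit_deduction.attempt") as span:
        span.set_attribute("billing.user_id", user_id)
        span.set_attribute("billing.cost_usd", cost_usd)
        span.set_attribute("billing.attempt", attempt)
        span.set_attribute("billing.succeeded", succeeded)
```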
    model_inference_requests.labels(
        provider=provider, model=model, status="success"
    ).inc()
    model_inference_duration.labels(provider=provider, model=model).observe(
        inference_duration
    )
    if input_tokens > 0:
        tokens_used.labels(provider=provider, model=model, token_type="input").inc(input_tokens)
        tokens_used.labels(provider=provider, model=model, token_type="input").inc(
            input_tokens
        )
    if output_tokens > 0:
        tokens_used.labels(provider=provider, model=model, token_type="output").inc(output_tokens)
        tokens_used.labels(provider=provider, model=model, token_type="output").inc(
            output_tokens
        )
🧩 Analysis chain
🏁 Script executed:
# First, let's explore the file structure
find . -type f -name "*.py" | grep -E "(routes|chat)" | head -20
Repository: Alpaca-Network/gatewayz-backend
Length of output: 669
🏁 Script executed:
# Check file size to determine how to read it
wc -l src/routes/chat.py
Repository: Alpaca-Network/gatewayz-backend
Length of output: 98
🏁 Script executed:
# Read the context around lines 2618-2631
sed -n '2600,2650p' src/routes/chat.py | cat -n
Repository: Alpaca-Network/gatewayz-backend
Length of output: 2967
🏁 Script executed:
# Search for _record_inference_metrics_and_health function
rg "_record_inference_metrics_and_health" -A 20 src/routes/chat.pyRepository: Alpaca-Network/gatewayz-backend
Length of output: 3943
🏁 Script executed:
# Also search for where this function is called to understand flow
rg "_record_inference_metrics_and_health\(" src/routes/chat.pyRepository: Alpaca-Network/gatewayz-backend
Length of output: 342
🏁 Script executed:
# Get more complete view of _record_inference_metrics_and_health
rg "_record_inference_metrics_and_health" -B 5 -A 30 src/routes/chat.py | head -80Repository: Alpaca-Network/gatewayz-backend
Length of output: 3032
🏁 Script executed:
# Get larger context around lines 2618-2631 to see execution flow
sed -n '2550,2700p' src/routes/chat.py | cat -n
Repository: Alpaca-Network/gatewayz-backend
Length of output: 9086
🏁 Script executed:
# Search for all places where these specific metrics are recorded
rg "model_inference_requests|model_inference_duration|tokens_used" src/routes/chat.py | head -30Repository: Alpaca-Network/gatewayz-backend
Length of output: 1260
🏁 Script executed:
# Get context AFTER the unified handler to see what happens next
sed -n '2700,2850p' src/routes/chat.py | cat -n
Repository: Alpaca-Network/gatewayz-backend
Length of output: 8861
🏁 Script executed:
# Search for the full endpoint handler to understand the complete flow
rg "async def chat_completions" -A 300 src/routes/chat.py | head -350Repository: Alpaca-Network/gatewayz-backend
Length of output: 13752
🏁 Script executed:
# Look for where _record_inference_metrics_and_health is called after the unified handler
sed -n '2800,3000p' src/routes/chat.py | cat -n
Repository: Alpaca-Network/gatewayz-backend
Length of output: 10330
🏁 Script executed:
# Search specifically for calls to _record_inference_metrics_and_health in the main flow (not in streaming)
rg "await _record_inference_metrics_and_health" -B 10 -A 2 src/routes/chat.py | head -150Repository: Alpaca-Network/gatewayz-backend
Length of output: 2281
Remove duplicate metric recording from unified handler.
The manual recording at lines 2618-2631 (model_inference_requests, model_inference_duration, tokens_used) duplicates metrics already recorded by _record_inference_metrics_and_health() called at line ~2800. For unified handler requests, this double-counts all inference metrics. Remove the manual recording block or move it to a distinct metric namespace if separate "handler-only" tracking is needed.
Suggested fix
- inference_duration = time.time() - inference_start
- model_inference_requests.labels(
- provider=provider, model=model, status="success"
- ).inc()
- model_inference_duration.labels(provider=provider, model=model).observe(
- inference_duration
- )
- if input_tokens > 0:
- tokens_used.labels(provider=provider, model=model, token_type="input").inc(
- input_tokens
- )
- if output_tokens > 0:
- tokens_used.labels(provider=provider, model=model, token_type="output").inc(
- output_tokens
-    )

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
    model_inference_requests.labels(
        provider=provider, model=model, status="success"
    ).inc()
    model_inference_duration.labels(provider=provider, model=model).observe(
        inference_duration
    )
    if input_tokens > 0:
        tokens_used.labels(provider=provider, model=model, token_type="input").inc(input_tokens)
        tokens_used.labels(provider=provider, model=model, token_type="input").inc(
            input_tokens
        )
    if output_tokens > 0:
        tokens_used.labels(provider=provider, model=model, token_type="output").inc(output_tokens)
        tokens_used.labels(provider=provider, model=model, token_type="output").inc(
            output_tokens
        )
🤖 Prompt for AI Agents
In `@src/routes/chat.py` around lines 2618 - 2631, The block that manually
increments model_inference_requests, observes model_inference_duration, and
increments tokens_used inside the unified handler duplicates metrics already
emitted by _record_inference_metrics_and_health(); remove this manual recording
block from the unified handler (the calls to
model_inference_requests.labels(...).inc(),
model_inference_duration.labels(...).observe(...), and
tokens_used.labels(...).inc(...)) so only _record_inference_metrics_and_health()
records these inference metrics, or alternatively move them to a distinct metric
namespace if you truly need separate "handler-only" metrics.
async def _log_failed_deduction_for_reconciliation(
    user_id: int | None,
    api_key: str,
    model: str,
    cost: float,
    total_tokens: int,
    prompt_tokens: int,
    completion_tokens: int,
    endpoint: str,
    error: str,
) -> None:
    """
    Log failed credit deductions to database for later reconciliation.

    This creates an audit trail of missed deductions that can be processed
    manually or by a background job.
    """
    try:
        from src.config.supabase_config import get_supabase_client

        client = get_supabase_client()

        # Try to insert into a reconciliation table
        # If the table doesn't exist, this will fail silently
        await asyncio.to_thread(
            lambda: client.table("credit_deduction_failures")
            .insert(
                {
                    "user_id": user_id,
                    "api_key_prefix": api_key[:10] + "..." if api_key else None,
                    "model": model,
                    "tokens": total_tokens,
                    "cost_usd": cost,
                    "api_key": api_key[:10] + "..." if api_key else None,
                    "total_tokens": total_tokens,
                    "prompt_tokens": prompt_tokens,
                    "completion_tokens": completion_tokens,
                    "endpoint": endpoint,
                },
                "error_message": error[:1000],
                "status": "pending",
            }
        )
        raise  # Re-raise to ensure billing errors are not silently ignored
            .execute()
🧩 Analysis chain
🏁 Script executed:
# First, let's examine the file structure and the specific lines in question
find . -type f -name "*.py" | grep -E "(credit_handler|supabase_config|db)" | head -20
Repository: Alpaca-Network/gatewayz-backend
Length of output: 659
🏁 Script executed:
# Read the specific section of credit_handler.py
if [ -f "src/services/credit_handler.py" ]; then
wc -l src/services/credit_handler.py
sed -n '515,555p' src/services/credit_handler.py
fi
Repository: Alpaca-Network/gatewayz-backend
Length of output: 1469
🏁 Script executed:
# Check the implementation of get_supabase_client
if [ -f "src/config/supabase_config.py" ]; then
cat src/config/supabase_config.py
fi
Repository: Alpaca-Network/gatewayz-backend
Length of output: 18229
🏁 Script executed:
# Explore the database module structure
find src/db -type f -name "*.py" 2>/dev/null | head -20
Repository: Alpaca-Network/gatewayz-backend
Length of output: 541
🏁 Script executed:
# Look for async patterns with to_thread usage in the codebase
rg "asyncio\.to_thread" --type py -B 3 -A 3 | head -50Repository: Alpaca-Network/gatewayz-backend
Length of output: 4004
🏁 Script executed:
# Check credit_transactions.py to see if reconciliation should be there
if [ -f "src/db/credit_transactions.py" ]; then
head -100 src/db/credit_transactions.py
fi
Repository: Alpaca-Network/gatewayz-backend
Length of output: 3648
🏁 Script executed:
# Search for reconciliation or failure logging patterns in db modules
rg "reconcil|failure|failed" --type py src/db/ -i -A 3Repository: Alpaca-Network/gatewayz-backend
Length of output: 50387
🏁 Script executed:
# Look at how other db modules handle inserts with proper async patterns
rg "async def.*insert" --type py src/db/ -A 10 | head -50Repository: Alpaca-Network/gatewayz-backend
Length of output: 57
🏁 Script executed:
# Check if any db module already has logging/audit functionality
ls -la src/db/ | grep -E "(log|audit|reconcil)"
Repository: Alpaca-Network/gatewayz-backend
Length of output: 146
Move credit deduction failure logging to a database module.
The blocking I/O concern is valid: get_supabase_client() can perform I/O on cold start (connection testing executes a database query), and calling it directly before asyncio.to_thread() blocks the event loop. However, the core issue is architectural: this function should not directly access the Supabase client or insert records from a service layer.
Per the codebase guidelines, src/services/ modules must be stateless and delegate all database operations to src/db/ modules. Create a helper function in src/db/credit_transactions.py (or a new src/db/reconciliation.py) to log failed deductions, following the pattern used in credit_transactions.py which leverages execute_with_retry() for resilience. Call this db module function from the service.
Example pattern from credit_transactions.py
def log_credit_transaction(...) -> dict[str, Any] | None:
    """Log a credit transaction to the audit trail"""
    transaction_data = {...}

    def do_insert(client):
        """Execute the insert operation with the given client."""
        result = client.table("credit_transactions").insert(transaction_data).execute()
        return result

    try:
        result = execute_with_retry(
            do_insert,
            max_retries=2,
            retry_delay=0.5,
            operation_name=f"log_credit_transaction(...)",
        )

🤖 Prompt for AI Agents
In `@src/services/credit_handler.py` around lines 515 - 555, Extract the Supabase
insertion out of _log_failed_deduction_for_reconciliation into a new sync DB
helper (e.g., src/db/reconciliation.py → log_failed_deduction_for_reconciliation
or src/db/credit_transactions.py → log_failed_deduction) that mirrors the
pattern in log_credit_transaction: build the record dict, obtain
get_supabase_client() inside the DB helper (so client creation/I/O happens in
the sync context), implement the insert as a do_insert(client) callable and call
execute_with_retry(do_insert, ...). Then update
_log_failed_deduction_for_reconciliation to only call that DB helper via await
asyncio.to_thread(db.log_failed_deduction_for_reconciliation, ...) and remove
any direct get_supabase_client() / client.table(...).execute() usage from the
service.
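A sketch of what the service side could reduce to under that suggestion; the module path src/db/reconciliation.py, the helper name, and its keyword arguments are hypothetical, taken from the review's proposal rather than the codebase.

```python
import asyncio

from src.db import reconciliation  # hypothetical module proposed by the review


async def _log_failed_deduction_for_reconciliation(user_id: int | None, api_key: str,
                                                   model: str, cost: float,
                                                   total_tokens: int, prompt_tokens: int,
                                                   completion_tokens: int, endpoint: str,
                                                   error: str) -> None:
    """The service only delegates; all Supabase access (including
    get_supabase_client) happens inside the sync DB helper, off the event loop."""
    await asyncio.to_thread(
        reconciliation.log_failed_deduction,
        user_id=user_id,
        api_key_prefix=(api_key[:10] + "...") if api_key else None,
        model=model,
        cost_usd=cost,
        total_tokens=total_tokens,
        prompt_tokens=prompt_tokens,
        completion_tokens=completion_tokens,
        endpoint=endpoint,
        error_message=error[:1000],
    )
```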
Summary
Audits and hardens the credit deduction flow across streaming and non-streaming chat completions by centralizing handling, adding retry logic, telemetry, and reconciliation for missed deductions. Also introduces a dedicated reconciliation table and metrics to improve auditing and reliability.
Changes
Core credit handling
Streaming support and fallback
Metrics and alerts
Reconciliation table and data auditing
Tests
Migration
Validation plan
Migration notes
Risk and mitigation
Documentation
🌿 Generated by Terry
ℹ️ Tag @terragon-labs to ask questions and address PR feedback
📎 Task: https://terragon-www-production.up.railway.app/task/05f1ac47-a16a-417c-a098-79a473ccd909
Summary by CodeRabbit
New Features
Bug Fixes
Chores
Greptile Overview
Greptile Summary
This PR introduces a robust credit deduction auditing system that significantly improves billing reliability and observability. The implementation centralizes credit handling logic with retry mechanisms, comprehensive metrics, and reconciliation tracking for failed deductions.
Key improvements:
- credit_handler.py with exponential backoff retry logic (3 attempts: 0.5s, 1s, 2s delays)
- handle_credits_and_usage for synchronous paths and handle_credits_and_usage_with_fallback for streaming requests
- credit_deduction_failures table for reconciliation
- is_trial flags when users have active subscriptions

Test coverage:
638 lines of comprehensive tests covering retry logic, metrics recording, duplicate deduction prevention, trial vs paid handling, and all failure scenarios.
Architecture:
The streaming path uses a non-blocking fallback handler that ensures responses complete successfully even if billing temporarily fails, with all failures tracked for manual reconciliation. This prevents both lost revenue and degraded user experience.
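A minimal sketch of that fallback contract, assuming the (cost, deduction_succeeded) tuple return described in this PR; the wrapper name and the error handling details are illustrative.

```python
import logging

logger = logging.getLogger(__name__)


async def with_billing_fallback(handle_credits_and_usage, /, *args, **kwargs) -> tuple[float, bool]:
    """Sketch of the streaming fallback: a billing failure never breaks the
    response; the caller gets (cost, deduction_succeeded) instead."""
    try:
        cost = await handle_credits_and_usage(*args, **kwargs)
        return cost, True
    except Exception as exc:
        logger.error("streaming credit deduction failed; queued for reconciliation: %s", exc)
        # reconciliation insert, missed-deduction metric, and Sentry alert go here
        return 0.0, False
```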
Confidence Score: 4/5
Important Files Changed
Sequence Diagram
sequenceDiagram
    participant Client
    participant ChatRoute as /v1/chat/completions
    participant CreditHandler as credit_handler.py
    participant Database as Supabase
    participant Metrics as Prometheus
    participant Sentry as Sentry Alerts
    participant ReconTable as credit_deduction_failures

    alt Non-Streaming Request
        Client->>ChatRoute: POST /v1/chat/completions (stream=false)
        ChatRoute->>ChatRoute: Process inference
        ChatRoute->>CreditHandler: handle_credits_and_usage(is_streaming=False)
        alt Trial User
            CreditHandler->>Database: log_api_usage_transaction($0)
            CreditHandler->>Metrics: Record success metrics
            CreditHandler-->>ChatRoute: cost
        else Paid User
            loop Max 3 Retries (exponential backoff: 0.5s, 1s, 2s)
                CreditHandler->>Database: deduct_credits(cost)
                alt Success
                    CreditHandler->>Metrics: Record success/retry metrics
                    CreditHandler->>Database: record_usage()
                    CreditHandler->>Database: update_rate_limit_usage()
                    CreditHandler-->>ChatRoute: cost
                else Validation Error (Insufficient Credits)
                    Database-->>CreditHandler: ValueError
                    CreditHandler->>Metrics: Record failed metrics
                    CreditHandler->>Sentry: capture_payment_error
                    CreditHandler-->>ChatRoute: raise ValueError
                else Transient Error
                    Database-->>CreditHandler: Exception
                    CreditHandler->>Metrics: Record retry metrics
                    CreditHandler->>CreditHandler: sleep(exponential backoff)
                end
            end
            alt All Retries Exhausted
                CreditHandler->>Metrics: Record failed + missed_deduction
                CreditHandler->>Sentry: _send_critical_billing_alert
                CreditHandler-->>ChatRoute: raise RuntimeError
            end
        end
        ChatRoute-->>Client: JSON Response
    else Streaming Request
        Client->>ChatRoute: POST /v1/chat/completions (stream=true)
        ChatRoute->>ChatRoute: Start streaming response
        ChatRoute-->>Client: SSE Stream (starts immediately)
        Note over ChatRoute,Client: Response streaming in progress...
        ChatRoute->>ChatRoute: Background Task: _process_stream_completion_background
        ChatRoute->>CreditHandler: handle_credits_and_usage_with_fallback(is_streaming=True)
        alt Credit Deduction Succeeds
            CreditHandler->>CreditHandler: handle_credits_and_usage (with retries)
            CreditHandler->>Database: deduct_credits (with retry logic)
            CreditHandler->>Metrics: Record success metrics
            CreditHandler-->>ChatRoute: (cost, True)
        else Credit Deduction Fails After Retries
            CreditHandler->>CreditHandler: Calculate cost (even if deduction failed)
            CreditHandler->>Metrics: Record background_task_failure
            CreditHandler->>Metrics: Record missed_deduction
            CreditHandler->>ReconTable: INSERT failed deduction record
            Note over CreditHandler,ReconTable: status='pending' for manual reconciliation
            CreditHandler->>Sentry: Log critical billing alert
            CreditHandler-->>ChatRoute: (cost, False)
            ChatRoute->>ChatRoute: Log warning about failed deduction
            Note over ChatRoute: Request completes successfully<br/>User gets response<br/>Billing reconciliation needed
        end
        Note over Client,ChatRoute: Stream completes normally
    end