
Audit API credit deductions with retries, metrics, and reconciliation#1026

Open
vdimarco wants to merge 9 commits into main from
gatewayz-code/audit-api-credit-deductions-ydd619

Conversation

@vdimarco
Contributor

@vdimarco vdimarco commented Feb 2, 2026

Summary

Audits and hardens the credit deduction flow across streaming and non-streaming chat completions by centralizing handling and adding retry logic, telemetry, and reconciliation for missed deductions. Also introduces a dedicated reconciliation table and metrics to improve auditing and reliability.

Changes

Core credit handling

  • Centralized credit deduction flow in handle_credits_and_usage for all chat endpoints with retry logic and detailed telemetry.
  • Added is_streaming flag to influence retry behavior and metrics.
  • Implemented exponential backoff retries for paid-user credit deductions and structured error handling.
  • Ensured non-blocking behavior for streaming paths; critical billing errors are logged for reconciliation without crashing the request path.
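The retry behavior described above can be sketched as follows. This is a minimal illustration, not the actual handler: `deduct` stands in for the real database call, and the assumed 0.5s base delay doubling per attempt matches the 0.5s/1s/2s schedule mentioned elsewhere in this PR.

```python
import asyncio

MAX_RETRIES = 3
BASE_DELAY_S = 0.5  # assumed base delay; doubles each attempt (0.5s, 1s, 2s)

async def deduct_with_retries(deduct, cost: float) -> int:
    """Attempt a credit deduction, retrying transient failures with
    exponential backoff. Returns the attempt number that succeeded."""
    last_error: Exception | None = None
    for attempt in range(1, MAX_RETRIES + 1):
        try:
            await deduct(cost)
            return attempt
        except ValueError:
            # Validation error (e.g. insufficient credits): never retried
            raise
        except Exception as exc:
            # Transient failure: back off, then retry
            last_error = exc
            if attempt < MAX_RETRIES:
                await asyncio.sleep(BASE_DELAY_S * 2 ** (attempt - 1))
    raise RuntimeError(f"credit deduction failed after {MAX_RETRIES} attempts") from last_error
```

Note that validation errors (insufficient credits) bypass the retry loop entirely, which matches the sequence diagram later in this PR where a `ValueError` is raised back to the route immediately.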

Streaming support and fallback

  • Added handle_credits_and_usage_with_fallback for streaming/background tasks.
    • Tries credit deduction with full retry logic.
    • On failure, returns (cost, False) for reconciliation and logs for manual processing.
    • Attempts to log failed deductions for reconciliation via a dedicated audit path.

Metrics and alerts

  • Added Prometheus metrics to monitor billing reliability:
    • gatewayz_credit_deduction_total (status, endpoint, is_streaming)
    • gatewayz_credit_deduction_amount_usd (cost by status)
    • gatewayz_credit_deduction_latency (latency by endpoint and streaming)
    • gatewayz_credit_deduction_retry_total (retry attempts)
    • gatewayz_streaming_background_task_failures (failure type, endpoint)
    • gatewayz_missed_credit_deductions_usd (missed deductions by reason)
  • Introduced _send_critical_billing_alert to emit Sentry alerts on persistent credit deduction failures with rich context.
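Declared with `prometheus_client`, these metrics might look roughly as follows. This is a sketch: the label sets are inferred from the bullets above and may differ from the actual definitions in src/services/prometheus_metrics.py.

```python
from prometheus_client import Counter, Histogram

credit_deduction_total = Counter(
    "gatewayz_credit_deduction_total",
    "Credit deduction attempts by outcome",
    ["status", "endpoint", "is_streaming"],
)
credit_deduction_amount_usd = Counter(
    "gatewayz_credit_deduction_amount_usd",
    "USD amount of credit deductions by status",
    ["status"],
)
credit_deduction_latency = Histogram(
    "gatewayz_credit_deduction_latency",
    "Credit deduction latency in seconds",
    ["endpoint", "is_streaming"],
)
credit_deduction_retry_total = Counter(
    "gatewayz_credit_deduction_retry_total",
    "Credit deduction retry attempts",
    ["endpoint"],
)
streaming_background_task_failures = Counter(
    "gatewayz_streaming_background_task_failures",
    "Streaming background task failures",
    ["failure_type", "endpoint"],
)
missed_credit_deductions_usd = Counter(
    "gatewayz_missed_credit_deductions_usd",
    "USD value of missed credit deductions by reason",
    ["reason"],
)
```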

Reconciliation table and data auditing

  • Added a new PostgreSQL migration to create credit_deduction_failures table for audit/reconciliation of streaming deduction failures.
  • Table supports tracking user, model, cost, tokens, endpoint, error messages, and status (pending/resolved/written_off).
  • Included indexes and row-level security policies to limit access to service/admin roles.
  • Included a trigger to maintain updated_at timestamps.
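A row in the table might be shaped as in this sketch; the field names are inferred from the bullet list above, not taken from the actual schema.

```python
from datetime import datetime, timezone

def build_failure_record(
    user_id: int, model: str, cost: float,
    total_tokens: int, endpoint: str, error: str,
) -> dict:
    """Build a reconciliation row for a failed streaming deduction.
    Column names here are illustrative."""
    return {
        "user_id": user_id,
        "model": model,
        "cost_usd": cost,
        "total_tokens": total_tokens,
        "endpoint": endpoint,
        "error_message": error,
        "status": "pending",  # lifecycle: pending -> resolved / written_off
        "created_at": datetime.now(timezone.utc).isoformat(),
    }
```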

Tests

  • Added tests/services/test_credit_handler.py covering:
    • Metrics recording for success and retries
    • Recording of missed deductions
    • Background task failure metrics and alerting
    • Sentry alerting behavior for critical billing failures
    • Core credit handling for trial vs paid users, including retry logic, and fallback paths
    • Streaming fallback behavior and reconciliation logging
    • Cost estimation fallback when pricing lookup fails
    • Retry configuration sanity checks
  • Tests validate that there are no duplicate deductions if usage logging fails and that retries stop after max attempts

Migration

  • Supabase SQL migration file: supabase/migrations/20260202000001_create_credit_deduction_failures_table.sql
    • Creates credit_deduction_failures with appropriate fields, constraints, indexes, and policies
    • Includes an idempotent setup for policies and triggers

Validation plan

  • Run unit tests: pytest tests/services/test_credit_handler.py
  • Validate streaming path: ensure _handle_credits_and_usage_with_fallback returns (cost, True) on success and (cost, False) with reconciliation logging on failure
  • Validate retry behavior: ensure exponential backoff delays are applied and max retries are respected
  • Verify metrics are emitted correctly under various success/failure scenarios
  • Verify reconciliation table is created and accessible with proper RBAC policies

Migration notes

  • Apply the new credit_deduction_failures table migration to enable audit/reconciliation workflows.
  • Ensure database permissions align with service_role for inserts/selects/updates, as documented in the migration script.

Risk and mitigation

  • Risk: a larger surface area for billing logic may affect latency in edge cases. Mitigation: the streaming path uses a non-blocking fallback and logs failures for reconciliation without interrupting response delivery.
  • Risk: Reconciliation table access control is strict; ensure service/admin roles are configured correctly.

Documentation

  • Internal billing and reconciliation workflow updated to reflect new background task auditing and metrics.
  • Sentry alerting context includes model, cost, endpoint, and attempt details to aid debugging.

🌿 Generated by Terry


ℹ️ Tag @terragon-labs to ask questions and address PR feedback

📎 Task: https://terragon-www-production.up.railway.app/task/05f1ac47-a16a-417c-a098-79a473ccd909

Summary by CodeRabbit

  • New Features

    • Added automatic retry mechanism for credit deductions with graceful fallback handling during streaming requests.
    • Implemented reconciliation tracking for failed billing operations to ensure accurate accounting.
  • Bug Fixes

    • Improved stability of streaming chat requests by preventing billing failures from interrupting service.
  • Chores

    • Updated Python type annotations to modern standards for improved code consistency.
    • Enhanced monitoring and observability for billing operations.

Greptile Overview

Greptile Summary

This PR introduces a robust credit deduction auditing system that significantly improves billing reliability and observability. The implementation centralizes credit handling logic with retry mechanisms, comprehensive metrics, and reconciliation tracking for failed deductions.

Key improvements:

  • Centralized credit deduction in credit_handler.py with exponential backoff retry logic (3 attempts: 0.5s, 1s, 2s delays)
  • Dual handlers: handle_credits_and_usage for synchronous paths and handle_credits_and_usage_with_fallback for streaming requests
  • Critical billing protection: deduction marked successful immediately after database commit, preventing duplicate charges if subsequent operations fail
  • Streaming-safe design: failed deductions don't crash user-facing streams; instead they're logged to credit_deduction_failures table for reconciliation
  • Comprehensive Prometheus metrics: tracks success/failure rates, retry attempts, latency, and missed deductions by endpoint and streaming status
  • Sentry integration: critical billing alerts sent when retries exhausted, with rich context (model, cost, user_id, attempt counts)
  • Trial override logic: protects against stale is_trial flags when users have active subscriptions
  • New reconciliation table with RLS policies (service_role only), proper indexes, and idempotent migration

Test coverage:
638 lines of comprehensive tests covering retry logic, metrics recording, duplicate deduction prevention, trial vs paid handling, and all failure scenarios.

Architecture:
The streaming path uses a non-blocking fallback handler that ensures responses complete successfully even if billing temporarily fails, with all failures tracked for manual reconciliation. This prevents both lost revenue and degraded user experience.

Confidence Score: 4/5

  • Safe to merge with excellent test coverage and thoughtful error handling; no critical issues found
  • Score reflects comprehensive testing (638 lines), well-designed retry logic with exponential backoff, proper critical section protection preventing duplicate charges, and thorough observability. The streaming fallback handler gracefully degrades on billing failures while ensuring reconciliation. Minor deduction due to complexity increase and need to verify reconciliation workflow is established.
  • No files require special attention - all changes are well-tested and follow established patterns

Important Files Changed

  • src/services/credit_handler.py: Centralized credit deduction logic with retry mechanism, exponential backoff, Sentry alerts, and Prometheus metrics; includes a fallback handler for streaming requests with reconciliation logging
  • tests/services/test_credit_handler.py: Comprehensive test coverage for the credit handler, including retry logic, metrics recording, Sentry alerts, trial vs paid handling, and duplicate deduction prevention
  • supabase/migrations/20260202000001_create_credit_deduction_failures_table.sql: Creates the reconciliation table for failed streaming credit deductions with proper RLS policies (service_role only), indexes for efficient queries, and an idempotent migration structure
  • src/routes/chat.py: Integrates the centralized credit handler for both streaming and non-streaming requests; adds the is_streaming flag to credit deduction calls; uses the fallback handler for streaming background tasks
  • src/services/prometheus_metrics.py: Adds comprehensive billing reliability metrics: credit deduction attempts, USD amounts, retry counts, latency histograms, background task failures, and missed deductions

Sequence Diagram

sequenceDiagram
    participant Client
    participant ChatRoute as /v1/chat/completions
    participant CreditHandler as credit_handler.py
    participant Database as Supabase
    participant Metrics as Prometheus
    participant Sentry as Sentry Alerts
    participant ReconTable as credit_deduction_failures

    alt Non-Streaming Request
        Client->>ChatRoute: POST /v1/chat/completions (stream=false)
        ChatRoute->>ChatRoute: Process inference
        ChatRoute->>CreditHandler: handle_credits_and_usage(is_streaming=False)
        
        alt Trial User
            CreditHandler->>Database: log_api_usage_transaction($0)
            CreditHandler->>Metrics: Record success metrics
            CreditHandler-->>ChatRoute: cost
        else Paid User
            loop Max 3 Retries (exponential backoff: 0.5s, 1s, 2s)
                CreditHandler->>Database: deduct_credits(cost)
                alt Success
                    CreditHandler->>Metrics: Record success/retry metrics
                    CreditHandler->>Database: record_usage()
                    CreditHandler->>Database: update_rate_limit_usage()
                    CreditHandler-->>ChatRoute: cost
                else Validation Error (Insufficient Credits)
                    Database-->>CreditHandler: ValueError
                    CreditHandler->>Metrics: Record failed metrics
                    CreditHandler->>Sentry: capture_payment_error
                    CreditHandler-->>ChatRoute: raise ValueError
                else Transient Error
                    Database-->>CreditHandler: Exception
                    CreditHandler->>Metrics: Record retry metrics
                    CreditHandler->>CreditHandler: sleep(exponential backoff)
                end
            end
            
            alt All Retries Exhausted
                CreditHandler->>Metrics: Record failed + missed_deduction
                CreditHandler->>Sentry: _send_critical_billing_alert
                CreditHandler-->>ChatRoute: raise RuntimeError
            end
        end
        
        ChatRoute-->>Client: JSON Response
    else Streaming Request
        Client->>ChatRoute: POST /v1/chat/completions (stream=true)
        ChatRoute->>ChatRoute: Start streaming response
        ChatRoute-->>Client: SSE Stream (starts immediately)
        
        Note over ChatRoute,Client: Response streaming in progress...
        
        ChatRoute->>ChatRoute: Background Task: _process_stream_completion_background
        ChatRoute->>CreditHandler: handle_credits_and_usage_with_fallback(is_streaming=True)
        
        alt Credit Deduction Succeeds
            CreditHandler->>CreditHandler: handle_credits_and_usage (with retries)
            CreditHandler->>Database: deduct_credits (with retry logic)
            CreditHandler->>Metrics: Record success metrics
            CreditHandler-->>ChatRoute: (cost, True)
        else Credit Deduction Fails After Retries
            CreditHandler->>CreditHandler: Calculate cost (even if deduction failed)
            CreditHandler->>Metrics: Record background_task_failure
            CreditHandler->>Metrics: Record missed_deduction
            CreditHandler->>ReconTable: INSERT failed deduction record
            Note over CreditHandler,ReconTable: status='pending' for manual reconciliation
            CreditHandler->>Sentry: Log critical billing alert
            CreditHandler-->>ChatRoute: (cost, False)
            
            ChatRoute->>ChatRoute: Log warning about failed deduction
            Note over ChatRoute: Request completes successfully<br/>User gets response<br/>Billing reconciliation needed
        end
        
        Note over Client,ChatRoute: Stream completes normally
    end

vdimarco and others added 9 commits February 2, 2026 19:51
…ductions

- Introduce retry logic with exponential backoff for credit deduction failures to improve billing reliability.
- Add Prometheus metrics to monitor credit deduction attempts, successes, failures, latencies, and retry counts.
- Implement critical Sentry alerts for failed credit deductions after retries are exhausted.
- Create a new database table and migration (credit_deduction_failures) to log failed streaming credit deduction attempts for manual reconciliation.
- Implement a fallback background task wrapper to safely handle credit deduction failures for streaming requests without impacting user experience.
- Enhance existing credit handling to support streaming requests with the new retry and fallback mechanisms.
- Add detailed unit tests covering retry behavior, metrics recording, error handling, and fallback logic.

This improves billing accuracy and operational monitoring, reducing risk of lost revenue due to transient failures.

Co-authored-by: gatewayz-ai-inbox[bot] <gatewayz-ai-inbox[bot]@users.noreply.github.com>
…fails

Move record_usage and update_rate_limit_usage calls outside the retry loop
to prevent duplicate deductions if these secondary operations fail after
deduct_credits succeeds.

The fix ensures:
1. deduct_credits is marked successful immediately after it succeeds
2. record_usage and update_rate_limit_usage are called AFTER the retry loop
3. Failures in secondary operations are logged but don't trigger retry
4. Added tests to verify no duplicate deductions occur

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…ck paths

- Fix import of track_trial_usage -> track_trial_usage_for_key in credit_handler.py
- Update function call to match the correct signature (no model_id, prompt_tokens,
  completion_tokens kwargs)
- Fix test mock paths to patch functions at their source modules instead of
  credit_handler module:
  - src.services.prometheus_metrics.* for Prometheus metrics
  - src.db.users.* for user database functions
  - src.db.rate_limits.* for rate limit functions
  - src.db.trials.* for trial functions
  - src.services.pricing.* for pricing functions
  - sentry_sdk.capture_message for Sentry SDK
- All 21 credit_handler tests now pass

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…hat adapters

- Remove unused `time`, `List`, `Union`, and `InternalUsage` imports
- Replace `Dict` with `dict` for type annotations (UP006)
- Apply fixes to ai_sdk.py, anthropic.py, base.py, and openai.py

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…nd secure RLS

- Add pre-create cleanup to drop existing policies/triggers for idempotency
- Restrict RLS policies to service_role for INSERT/UPDATE operations
- Allow authenticated admin users to SELECT for reconciliation
- Add documented DOWN migration section for rollback
- Add security comments explaining access restrictions

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Wrap DROP POLICY/TRIGGER statements in DO block to check if table exists
first, preventing errors when migration runs on fresh database.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
auth.uid() returns UUID, so we need to compare against users.auth_id column
instead of users.id (which is BIGINT).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…ng table

Credit deduction failures is a sensitive billing table that should only be
accessed via the API backend (using service_role). Admin access for
reconciliation is handled through API endpoints, not direct database access.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add is_streaming=False explicitly to non-streaming endpoints for clarity
- Change log level from DEBUG to WARNING for reconciliation table errors
- Replace redundant sentry_sdk.capture_message with breadcrumb for billing alerts
- Update tests to reflect new breadcrumb-based alerting approach
- Apply code formatting via ruff

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@supabase

supabase bot commented Feb 2, 2026

This pull request has been ignored for the connected project ynleroehyrmaafkgjgmr because there are no changes detected in supabase/supabase directory. You can change this behaviour in Project Integrations Settings ↗︎.


Preview Branches by Supabase.
Learn more about Supabase Branching ↗︎.

@coderabbitai

coderabbitai bot commented Feb 2, 2026

📝 Walkthrough

Walkthrough

This PR modernizes type hints to Python 3.9+ style by replacing typing.Dict with dict across chat adapters, and introduces resilient credit handling for streaming requests with retry logic, exponential backoff, Sentry alerting, and database reconciliation.

Changes

  • Type Annotation Modernization (src/adapters/chat/ai_sdk.py, src/adapters/chat/anthropic.py, src/adapters/chat/base.py, src/adapters/chat/openai.py): Updated method signatures from Dict[str, Any] to dict[str, Any] for Python 3.9+ compliance; removed Dict from typing imports.
  • Streaming Credit Handling (src/routes/chat.py): Added _handle_credits_and_usage_with_fallback() for non-blocking streaming credit deduction with a fallback retry path; updated the streaming flow to use the new fallback handler and log reconciliation failures.
  • Credit Handler Service (src/services/credit_handler.py): Introduced retry logic with exponential backoff, Prometheus metrics recording, Sentry critical billing alerts, and reconciliation logging; added handle_credits_and_usage_with_fallback() wrapper for streaming contexts.
  • Prometheus Metrics (src/services/prometheus_metrics.py): Added six new metrics: credit_deduction_total, credit_deduction_amount_usd, credit_deduction_retry_count, credit_deduction_latency, streaming_background_task_failures, missed_credit_deductions_usd.
  • Database Schema (supabase/migrations/20260202000001_create_credit_deduction_failures_table.sql): Created the credit_deduction_failures table with an audit trail for failed deductions, RLS policies for service_role access, and indexes for query optimization.
  • Test Coverage (tests/services/test_credit_handler.py): Added a comprehensive test suite covering metrics recording, retry behavior, Sentry alerting, trial handling, the fallback wrapper, and failure scenarios.

Sequence Diagram(s)

sequenceDiagram
    participant Stream as Stream Handler
    participant Route as Chat Route
    participant Credit as Credit Handler
    participant Retry as Retry Loop
    participant Sentry as Sentry
    participant DB as Supabase
    participant Prometheus as Prometheus

    Stream->>Route: stream completion event
    Route->>Credit: await handle_credits_and_usage_with_fallback()
    Credit->>Retry: attempt deduction (attempt 1)
    alt Deduction Success
        Retry-->>Credit: success
        Credit->>Prometheus: record success metric
        Credit-->>Route: (cost, True)
    else Transient Failure
        Retry->>Retry: exponential backoff
        Retry->>Credit: retry (attempt 2..N)
        alt After Max Retries
            Retry-->>Credit: failure
            Credit->>Sentry: send_critical_billing_alert()
            Credit->>DB: log_failed_deduction_for_reconciliation()
            Credit->>Prometheus: record missed deduction
            Credit-->>Route: (fallback_cost, False)
            Route->>Route: log reconciliation warning
        else Succeeds on Retry
            Retry-->>Credit: success
            Credit->>Prometheus: record success + retry metrics
            Credit-->>Route: (cost, True)
        end
    end

Estimated Code Review Effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 With type hints fresh and streams now safe,
Retries dance in exponential grace,
Failed credits find their refuge trace,
In database logs with loving embrace—
A rabbit hops through billing's careful space! 🎯💳

🚥 Pre-merge checks (3 passed)
  • Description Check ✅ Passed: check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check ✅ Passed: the PR title accurately summarizes the main changes: introducing audit mechanisms (reconciliation table), retry logic, Prometheus metrics, and Sentry alerting for API credit deductions across streaming and non-streaming contexts.
  • Docstring Coverage ✅ Passed: docstring coverage is 88.89%, which is sufficient (required threshold: 80.00%).


@github-actions

github-actions bot commented Feb 2, 2026

Schema Sync Check Results ✅

All schema validation checks passed!

Schema Summary

  • Tables: 22
  • Functions: 0
  • Indexes: 0

Checks Performed

  • ✅ Migration file naming convention
  • ✅ Migration timestamp conflicts
  • ✅ SQL syntax validation
  • ✅ Schema applied successfully to local Supabase
  • ✅ Schema validation against expected tables
  • ✅ Seed data inserted successfully

This check runs automatically on PRs that modify database migrations.


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ae685cf885


Comment on lines +321 to +324
# Record success metrics
latency = time.monotonic() - start_time
status = "success" if attempt == 1 else "retried"
_record_credit_metrics(status, cost, endpoint, is_streaming, latency, attempt)


P2: Count successful retries as success in metrics

When a deduction succeeds after one or more retries, the status is set to retried and passed into _record_credit_metrics, so the success counters never increment for those requests. That means success rates and amounts derived from credit_deduction_total/credit_deduction_amount_usd will undercount real successes whenever the first attempt failed, which can mislead alerting and dashboards that track billing reliability. Consider marking any successful deduction as success and rely on attempt_number/retry metrics to capture retry behavior.
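The suggested fix could look roughly like this, as an in-memory sketch with plain counters standing in for the Prometheus metrics:

```python
from collections import Counter

deduction_total = Counter()  # stand-in for gatewayz_credit_deduction_total
retry_total = Counter()      # stand-in for gatewayz_credit_deduction_retry_total

def record_deduction(endpoint: str, attempt: int, succeeded: bool) -> None:
    """Any successful deduction counts as "success", even after retries;
    retry behavior is exposed through the separate retry counter instead
    of a distinct "retried" status."""
    status = "success" if succeeded else "failed"
    deduction_total[(status, endpoint)] += 1
    if attempt > 1:
        retry_total[endpoint] += attempt - 1
```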



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/services/credit_handler.py (1)

244-252: ⚠️ Potential issue | 🟠 Major

Potential double-count of trial usage.

handle_credits_and_usage now records trial usage via track_trial_usage_for_key, but the chat routes still track trial usage after non-streaming responses. This can consume trial quotas twice for the same request. Consider centralizing trial tracking in one place (or adding a flag to skip when already recorded).

🤖 Fix all issues with AI agents
In `@src/routes/chat.py`:
- Around line 2618-2631: The block that manually increments
model_inference_requests, observes model_inference_duration, and increments
tokens_used inside the unified handler duplicates metrics already emitted by
_record_inference_metrics_and_health(); remove this manual recording block from
the unified handler (the calls to model_inference_requests.labels(...).inc(),
model_inference_duration.labels(...).observe(...), and
tokens_used.labels(...).inc(...)) so only _record_inference_metrics_and_health()
records these inference metrics, or alternatively move them to a distinct metric
namespace if you truly need separate "handler-only" metrics.

In `@src/services/credit_handler.py`:
- Around line 515-555: Extract the Supabase insertion out of
_log_failed_deduction_for_reconciliation into a new sync DB helper (e.g.,
src/db/reconciliation.py → log_failed_deduction_for_reconciliation or
src/db/credit_transactions.py → log_failed_deduction) that mirrors the pattern
in log_credit_transaction: build the record dict, obtain get_supabase_client()
inside the DB helper (so client creation/I/O happens in the sync context),
implement the insert as a do_insert(client) callable and call
execute_with_retry(do_insert, ...). Then update
_log_failed_deduction_for_reconciliation to only call that DB helper via await
asyncio.to_thread(db.log_failed_deduction_for_reconciliation, ...) and remove
any direct get_supabase_client() / client.table(...).execute() usage from the
service.
🧹 Nitpick comments (5)
tests/services/test_credit_handler.py (2)

12-12: Unused import: asyncio.

The asyncio module is imported but not directly used. The pytest.mark.asyncio decorator handles async test execution without requiring explicit asyncio imports.

🧹 Proposed fix
-import asyncio
 from unittest.mock import AsyncMock, MagicMock, patch, call

199-218: Docstring is inconsistent with test behavior.

The docstring says "Test that Sentry breadcrumb and alert are sent for all costs" but the method name is test_skips_message_for_small_cost, implying something should be skipped. The test actually verifies that both capture_payment_error and add_breadcrumb are called regardless of cost, which contradicts the method name.

Consider renaming the test method to match the actual behavior being tested.

📝 Proposed fix
-    def test_skips_message_for_small_cost(self, mock_capture, mock_add_breadcrumb):
-        """Test that Sentry breadcrumb and alert are sent for all costs."""
+    def test_sends_alert_for_small_cost(self, mock_capture, mock_add_breadcrumb):
+        """Test that Sentry alert and breadcrumb are sent even for small costs."""
supabase/migrations/20260202000001_create_credit_deduction_failures_table.sql (1)

136-150: Down migration is commented out rather than executable.

Per coding guidelines, migrations should "include both up and down migrations." The down migration is provided as comments for manual execution, which is a common pattern but doesn't allow automated rollback.

Consider whether automated rollback capability is needed for this table. If the table may contain critical audit data, manual rollback (as currently designed) may actually be preferable to prevent accidental data loss.

src/routes/chat.py (1)

913-947: Add parameter type hints to _handle_credits_and_usage_with_fallback.

The new helper omits parameter annotations, which breaks the repo-wide typing requirement and makes static checks less effective.

💡 Suggested fix
-async def _handle_credits_and_usage_with_fallback(
-    api_key,
-    user,
-    model,
-    trial,
-    total_tokens,
-    prompt_tokens,
-    completion_tokens,
-    elapsed_ms,
-) -> tuple[float, bool]:
+async def _handle_credits_and_usage_with_fallback(
+    api_key: str,
+    user: dict,
+    model: str,
+    trial: dict,
+    total_tokens: int,
+    prompt_tokens: int,
+    completion_tokens: int,
+    elapsed_ms: int,
+) -> tuple[float, bool]:
As per coding guidelines: Use type hints throughout all Python code (Function parameters, return types, class attributes) targeting Python 3.12+ syntax.
src/services/credit_handler.py (1)

165-199: Add OpenTelemetry spans around credit deduction retries.

This is a critical business-logic path; adding spans (e.g., around the retry loop and reconciliation logging) would align with tracing requirements and help correlate billing failures with request traces.
As per coding guidelines: Use OpenTelemetry tracing from src/config/opentelemetry_config.py in all provider client calls and critical business logic paths to enable distributed tracing via Tempo integration.

Comment on lines +2618 to +2631
model_inference_requests.labels(
provider=provider, model=model, status="success"
).inc()
model_inference_duration.labels(provider=provider, model=model).observe(
inference_duration
)
if input_tokens > 0:
    tokens_used.labels(provider=provider, model=model, token_type="input").inc(
        input_tokens
    )
if output_tokens > 0:
    tokens_used.labels(provider=provider, model=model, token_type="output").inc(
        output_tokens
    )


⚠️ Potential issue | 🟠 Major

Remove duplicate metric recording from unified handler.

The manual recording at lines 2618-2631 (model_inference_requests, model_inference_duration, tokens_used) duplicates metrics already recorded by _record_inference_metrics_and_health() called at line ~2800. For unified handler requests, this double-counts all inference metrics. Remove the manual recording block or move it to a distinct metric namespace if separate "handler-only" tracking is needed.

Suggested fix
-                inference_duration = time.time() - inference_start
-                model_inference_requests.labels(
-                    provider=provider, model=model, status="success"
-                ).inc()
-                model_inference_duration.labels(provider=provider, model=model).observe(
-                    inference_duration
-                )
-                if input_tokens > 0:
-                    tokens_used.labels(provider=provider, model=model, token_type="input").inc(
-                        input_tokens
-                    )
-                if output_tokens > 0:
-                    tokens_used.labels(provider=provider, model=model, token_type="output").inc(
-                        output_tokens
-                    )
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

                    provider=provider, model=model, status="success"
                ).inc()
                model_inference_duration.labels(provider=provider, model=model).observe(
                    inference_duration
                )
                if input_tokens > 0:
                    tokens_used.labels(provider=provider, model=model, token_type="input").inc(
                        input_tokens
                    )
                if output_tokens > 0:
                    tokens_used.labels(provider=provider, model=model, token_type="output").inc(
                        output_tokens
                    )
🤖 Prompt for AI Agents
In `@src/routes/chat.py` around lines 2618-2631, the block that manually
increments model_inference_requests, observes model_inference_duration, and
increments tokens_used inside the unified handler duplicates metrics already
emitted by _record_inference_metrics_and_health(). Remove this manual recording
block from the unified handler (the calls to
model_inference_requests.labels(...).inc(),
model_inference_duration.labels(...).observe(...), and
tokens_used.labels(...).inc(...)) so that only _record_inference_metrics_and_health()
records these inference metrics, or alternatively move them to a distinct metric
namespace if you truly need separate "handler-only" metrics.

Comment on lines +515 to +555
async def _log_failed_deduction_for_reconciliation(
    user_id: int | None,
    api_key: str,
    model: str,
    cost: float,
    total_tokens: int,
    prompt_tokens: int,
    completion_tokens: int,
    endpoint: str,
    error: str,
) -> None:
    """
    Log failed credit deductions to database for later reconciliation.

    This creates an audit trail of missed deductions that can be processed
    manually or by a background job.
    """
    try:
        from src.config.supabase_config import get_supabase_client

        client = get_supabase_client()

        # Try to insert into a reconciliation table
        # If the table doesn't exist, this will fail silently
        await asyncio.to_thread(
            lambda: client.table("credit_deduction_failures")
            .insert(
                {
                    "user_id": user_id,
                    "api_key_prefix": api_key[:10] + "..." if api_key else None,
                    "model": model,
                    "cost_usd": cost,
                    "total_tokens": total_tokens,
                    "prompt_tokens": prompt_tokens,
                    "completion_tokens": completion_tokens,
                    "endpoint": endpoint,
                    "error_message": error[:1000],
                    "status": "pending",
                }
            )
            .execute()
        )
⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, let's examine the file structure and the specific lines in question
find . -type f -name "*.py" | grep -E "(credit_handler|supabase_config|db)" | head -20

Repository: Alpaca-Network/gatewayz-backend

Length of output: 659


🏁 Script executed:

# Read the specific section of credit_handler.py
if [ -f "src/services/credit_handler.py" ]; then
  wc -l src/services/credit_handler.py
  sed -n '515,555p' src/services/credit_handler.py
fi

Repository: Alpaca-Network/gatewayz-backend

Length of output: 1469


🏁 Script executed:

# Check the implementation of get_supabase_client
if [ -f "src/config/supabase_config.py" ]; then
  cat src/config/supabase_config.py
fi

Repository: Alpaca-Network/gatewayz-backend

Length of output: 18229


🏁 Script executed:

# Explore the database module structure
find src/db -type f -name "*.py" 2>/dev/null | head -20

Repository: Alpaca-Network/gatewayz-backend

Length of output: 541


🏁 Script executed:

# Look for async patterns with to_thread usage in the codebase
rg "asyncio\.to_thread" --type py -B 3 -A 3 | head -50

Repository: Alpaca-Network/gatewayz-backend

Length of output: 4004


🏁 Script executed:

# Check credit_transactions.py to see if reconciliation should be there
if [ -f "src/db/credit_transactions.py" ]; then
  head -100 src/db/credit_transactions.py
fi

Repository: Alpaca-Network/gatewayz-backend

Length of output: 3648


🏁 Script executed:

# Search for reconciliation or failure logging patterns in db modules
rg "reconcil|failure|failed" --type py src/db/ -i -A 3

Repository: Alpaca-Network/gatewayz-backend

Length of output: 50387


🏁 Script executed:

# Look at how other db modules handle inserts with proper async patterns
rg "async def.*insert" --type py src/db/ -A 10 | head -50

Repository: Alpaca-Network/gatewayz-backend

Length of output: 57


🏁 Script executed:

# Check if any db module already has logging/audit functionality
ls -la src/db/ | grep -E "(log|audit|reconcil)"

Repository: Alpaca-Network/gatewayz-backend

Length of output: 146


Move credit deduction failure logging to a database module.

The blocking I/O concern is valid: get_supabase_client() can perform I/O on cold start (connection testing executes a database query), and calling it directly before asyncio.to_thread() blocks the event loop. However, the core issue is architectural: this function should not directly access the Supabase client or insert records from a service layer.

Per the codebase guidelines, src/services/ modules must be stateless and delegate all database operations to src/db/ modules. Create a helper function in src/db/credit_transactions.py (or a new src/db/reconciliation.py) to log failed deductions, following the pattern used in credit_transactions.py which leverages execute_with_retry() for resilience. Call this db module function from the service.

Example pattern from credit_transactions.py
def log_credit_transaction(...) -> dict[str, Any] | None:
    """Log a credit transaction to the audit trail"""
    transaction_data = {...}
    
    def do_insert(client):
        """Execute the insert operation with the given client."""
        result = client.table("credit_transactions").insert(transaction_data).execute()
        return result

    try:
        result = execute_with_retry(
            do_insert,
            max_retries=2,
            retry_delay=0.5,
            operation_name=f"log_credit_transaction(...)",
        )
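The repository's actual `execute_with_retry` is not shown in the excerpt; a minimal, self-contained stand-in with exponential backoff might look like the sketch below. The signature, the flaky client, and `log_failed_deduction` naming are illustrative assumptions, not the real helper:

```python
import logging
import time

logger = logging.getLogger(__name__)

# Minimal stand-in for an execute_with_retry helper: runs a callable that
# takes a client, doubling the delay after each failed attempt.
def execute_with_retry(operation, client, max_retries=2, retry_delay=0.5, operation_name="db operation"):
    delay = retry_delay
    last_error = None
    for attempt in range(max_retries + 1):
        try:
            return operation(client)
        except Exception as e:
            last_error = e
            logger.warning(
                "%s failed (attempt %d/%d): %s",
                operation_name, attempt + 1, max_retries + 1, e,
            )
            if attempt < max_retries:
                time.sleep(delay)
                delay *= 2  # exponential backoff
    raise last_error

# Test double: fails once, then succeeds, to exercise the retry path.
class FlakyClient:
    def __init__(self):
        self.calls = 0

def flaky_insert(client):
    client.calls += 1
    if client.calls == 1:
        raise ConnectionError("transient network error")
    return {"status": "inserted"}

client = FlakyClient()
result = execute_with_retry(
    flaky_insert, client, retry_delay=0.01, operation_name="log_failed_deduction"
)
print(result)  # {'status': 'inserted'} after one retry
```

Keeping the retry policy inside the db module means every caller of the reconciliation helper gets the same resilience without re-implementing backoff in the service layer.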
🤖 Prompt for AI Agents
In `@src/services/credit_handler.py` around lines 515-555, extract the Supabase
insertion out of _log_failed_deduction_for_reconciliation into a new sync DB
helper (e.g., src/db/reconciliation.py → log_failed_deduction_for_reconciliation
or src/db/credit_transactions.py → log_failed_deduction) that mirrors the
pattern in log_credit_transaction: build the record dict, obtain
get_supabase_client() inside the DB helper (so client creation/I/O happens in
the sync context), implement the insert as a do_insert(client) callable, and call
execute_with_retry(do_insert, ...). Then update
_log_failed_deduction_for_reconciliation to only call that DB helper via await
asyncio.to_thread(db.log_failed_deduction_for_reconciliation, ...) and remove
any direct get_supabase_client() / client.table(...).execute() usage from the
service.
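The recommended service/db split can be sketched as follows. The helper name, table fields, and fake client are illustrative assumptions standing in for the repository's actual API and `get_supabase_client()`:

```python
import asyncio

# Sync DB helper (would live in src/db/reconciliation.py): all client I/O
# stays in this function, which the service runs in a worker thread.
def log_failed_deduction_for_reconciliation(client, record: dict) -> dict:
    return client.table("credit_deduction_failures").insert(record).execute()

# Test doubles replacing the Supabase client for this sketch.
class FakeTable:
    def __init__(self, store):
        self._store = store

    def insert(self, record):
        self._record = record
        return self

    def execute(self):
        self._store.append(self._record)
        return {"status": "pending"}

class FakeClient:
    def __init__(self):
        self.rows = []

    def table(self, name):
        return FakeTable(self.rows)

# Service-layer wrapper: no direct Supabase access, just a to_thread hop
# so the blocking insert never runs on the event loop.
async def _log_failed_deduction(client, user_id, cost: float, endpoint: str) -> None:
    record = {
        "user_id": user_id,
        "cost_usd": cost,
        "endpoint": endpoint,
        "status": "pending",
    }
    await asyncio.to_thread(log_failed_deduction_for_reconciliation, client, record)

client = FakeClient()
asyncio.run(_log_failed_deduction(client, 42, 0.013, "/v1/chat/completions"))
print(client.rows)  # one pending reconciliation row recorded
```

With this shape, the service stays stateless and the event loop is never blocked by client construction or the insert itself, which both happen inside the thread-pool call.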
