Audit and centralize credit deduction loop for models by vdimarco · Pull Request #1020 · Alpaca-Network/gatewayz-backend

vdimarco · 2026-02-02T01:42:36Z

Summary

Introduces a unified credit handling service to centralize credit deduction, trial handling, and usage logging across chat and message endpoints.
Adds robust pricing fallbacks and observability for default pricing scenarios, including high-value model alerts and Prometheus metrics.
Replaces ad-hoc billing logic in endpoints with a single, auditable path to improve consistency and auditing.
Adds integration tests to validate pricing lookups, cost calculation, and credit deduction behavior for multiple model families.

Changes

Core Functionality

New: src/services/credit_handler.py
- handle_credits_and_usage(api_key, user, model, trial, total_tokens, prompt_tokens, completion_tokens, elapsed_ms, endpoint) -> float
- Centralizes pricing calculation, trial override logic, trial usage tracking, credit deduction, usage recording, and rate-limit updates.
- Applies a defense-in-depth override of is_trial when the user has an active subscription.
- Logs trial transactions or credit deductions with rich metadata (including endpoint).
- Re-raises billing errors to avoid silent failures.
Updated: src/routes/chat.py
- Replaced inline billing logic with a thin wrapper that delegates to handle_credits_and_usage, maintaining backward compatibility and consistent billing.
Updated: src/routes/messages.py
- Replaced inline billing logic with a call to the unified handle_credits_and_usage for consistent billing across endpoints.
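
The defense-in-depth override listed above could look roughly like the following. This is a minimal sketch, not the code from the PR: the helper name `resolve_is_trial`, the `subscription_status` field, and its `"active"` value are assumptions for illustration. Only the `is_trial` flag itself is named in the description.

```python
from typing import Any, Mapping


def resolve_is_trial(user: Mapping[str, Any], trial: Mapping[str, Any]) -> bool:
    """Decide whether a request should follow the trial billing path.

    Hypothetical helper: `subscription_status` is an assumed field name.
    """
    is_trial = bool(trial.get("is_trial", False))
    # A user with an active subscription is always billed as paid,
    # even if a stale is_trial flag is still set.
    if is_trial and user.get("subscription_status") == "active":
        return False
    return is_trial
```

Keeping the decision in one small function makes the override easy to test separately from the database calls that follow it.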

Pricing and Cost Calculation

Updated: src/services/pricing.py
- Added async pricing support via calculate_cost_async(model_id, prompt_tokens, completion_tokens).
- Introduced _default_pricing_tracker and _track_default_pricing_usage(model_id, error=None) to monitor models falling back to default pricing.
- Default pricing alerts via Sentry for high-value models and Prometheus metric increments (gatewayz_default_pricing_usage_total).
- get_model_pricing and get_model_pricing_async now log default pricing usage when data is missing and call into _track_default_pricing_usage.
Updated: src/services/prometheus_metrics.py
- Added default_pricing_usage_counter (labels: model) to track when default pricing is used.

Observability and Alerts

When a high-value model uses default pricing, a Sentry alert is emitted to help identify missing pricing data (and potential under-billing).
Warning logs and Prometheus metrics capture token estimation scenarios when provider omits usage data, with a counter per provider/model.

Tests

Added: tests/integration/test_credit_deduction_models.py
- Integration tests for credit deduction across OpenAI and Anthropic models, including streaming vs non-streaming paths, trial handling, and paid-user edge cases.
- Verifies cost calculation, credit deduction calls, and transaction logging behavior via the centralized credit handler.
- Covers alias resolution for models and pricing lookups with mocked pricing data.
- Validates async cost calculation path and default-pricing fallback behavior.

Why this change

Reduces duplication and potential inconsistencies across endpoints by consolidating all credit and usage handling into a single service.
Improves reliability and observability of billing, including fallbacks for missing pricing data and alerts for high-value models.
Enables easier auditing and future enhancements to pricing and billing rules.

QA/Tests

Run unit and integration tests:
- pytest tests/integration/test_credit_deduction_models.py
Verify behavior for:
- Paid users with active subscriptions (override trial path to paid)
- Trial users (no credit deduction, trial logging only)
- Unknown models falling back to default pricing with alerts
- Async cost calculation path returns correct costs
Check observability hooks:
- gatewayz_default_pricing_usage_total metric increments for default pricing
- Sentry alerts fire for high-value models using default pricing

Migration notes

This PR introduces a new credit handling service and adjusts two endpoints to route through a centralized path. Existing logic is removed from chat and messages handlers in favor of the unified handler. No database migrations required.

Verification steps

Trigger a chat or message request with known pricing data and verify that costs are computed via calculate_cost_async and that credits are deducted via the unified path.
Simulate a model without pricing data and confirm default pricing is used, metrics are incremented, and a warning is logged.
Validate trial vs. paid user flows, including trial usage tracking and transaction logging.
Confirm that high-value models using default pricing trigger Sentry alerts as configured.

🌿 Generated by Terry

📎 Task: https://terragon-www-production.up.railway.app/task/5262f87c-6761-4bd0-869e-751d81ccffa6

Greptile Overview

Greptile Summary

This PR successfully centralizes credit deduction logic into a unified service (credit_handler.py), eliminating ~150 lines of duplicated billing code across chat and messages endpoints. The refactor improves consistency, auditability, and observability of billing operations.

Key Changes

Centralized billing: New handle_credits_and_usage() service consolidates pricing calculation, trial handling, credit deduction, and usage logging
Enhanced observability: Added default pricing tracking with Sentry alerts for high-value models and Prometheus metrics (gatewayz_default_pricing_usage_total)
Trial override protection: Defense-in-depth logic prevents stale is_trial flags from causing free usage for paid subscribers
Async pricing support: New calculate_cost_async() and get_model_pricing_async() functions enable live pricing lookups from async contexts
Token estimation warnings: Added logging and metrics when providers don't return usage data

Review Notes

The refactor is well-executed with comprehensive test coverage. A few areas for consideration:

Trial usage tracking failures are logged but don't prevent execution - verify this is acceptable if trial quota enforcement is critical
Token estimation (1 token ≈ 4 chars) in streaming responses could lead to billing inaccuracies for non-English content
The unused _to_thread helper in credit_handler.py should be removed for clarity

The addition of Sentry alerts for high-value models using default pricing is excellent for preventing under-billing. The sequence diagram shows the complete flow from request to billing.

Confidence Score: 4/5

Safe to merge with minor observability improvements recommended
The refactor centralizes billing logic effectively with comprehensive tests and excellent observability additions. Score reflects well-tested changes with minor style issues (unused helper, silent metric failures) that don't affect core functionality.
Files needing attention: src/services/credit_handler.py (remove unused helper), src/routes/chat.py (verify that token estimation accuracy is acceptable)

Important Files Changed

| Filename | Overview |
| --- | --- |
| src/services/credit_handler.py | New centralized credit handling service with trial override logic and comprehensive error tracking |
| src/routes/chat.py | Refactored to delegate billing to centralized handler; added token estimation warnings and metrics |
| src/routes/messages.py | Simplified billing flow by replacing inline logic with unified credit handler call |
| src/services/pricing.py | Added async pricing support, default pricing tracking with Sentry alerts, and Prometheus metrics for observability |

Sequence Diagram

sequenceDiagram
    participant Client
    participant Endpoint as Chat/Messages Endpoint
    participant CreditHandler as credit_handler.py
    participant Pricing as pricing.py
    participant Database as Supabase
    participant Metrics as Prometheus/Sentry

    Client->>Endpoint: POST /v1/chat/completions
    Endpoint->>Endpoint: Authenticate & validate request
    Endpoint->>Provider: Forward request to AI provider
    Provider-->>Endpoint: Stream/return response with tokens
    
    Note over Endpoint: Extract usage data
    Endpoint->>CreditHandler: handle_credits_and_usage()
    
    CreditHandler->>Pricing: calculate_cost_async(model, tokens)
    Pricing->>Pricing: get_model_pricing_async(model_id)
    
    alt Pricing found in cache/DB
        Pricing-->>CreditHandler: Return pricing data
    else No pricing data
        Pricing->>Metrics: Track default pricing usage
        Pricing->>Metrics: Send Sentry alert (if high-value model)
        Pricing-->>CreditHandler: Return default pricing ($0.00002/token)
    end
    
    CreditHandler->>CreditHandler: Calculate cost = prompt*rate + completion*rate
    
    alt User has active subscription (defense-in-depth)
        CreditHandler->>CreditHandler: Override is_trial=FALSE if stale flag detected
    end
    
    alt is_trial=TRUE (legitimate trial user)
        CreditHandler->>Database: track_trial_usage()
        CreditHandler->>Database: log_api_usage_transaction(cost=0, is_trial=TRUE)
    else is_trial=FALSE (paid user)
        CreditHandler->>Database: deduct_credits(api_key, cost)
        CreditHandler->>Database: record_usage()
        CreditHandler->>Database: update_rate_limit_usage()
    end
    
    alt Billing error
        CreditHandler->>Metrics: capture_payment_error()
        CreditHandler-->>Endpoint: Raise exception
    end
    
    CreditHandler-->>Endpoint: Return cost
    Endpoint-->>Client: Return response with headers
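
The flow in the diagram can be sketched in a few lines of Python. This is a simplified illustration, not the PR's implementation. The database helpers are passed in through a hypothetical `BillingDeps` container so the example runs on its own, and their argument lists are assumed. A single per-token rate is used for both prompt and completion. Returning the computed cost in the trial branch is an assumption, and `update_rate_limit_usage()` is omitted for brevity.

```python
import asyncio
import logging
from dataclasses import dataclass
from typing import Any, Callable

logger = logging.getLogger("credit_handler_sketch")


@dataclass
class BillingDeps:
    """Hypothetical bundle of the sync helpers named in the diagram."""
    track_trial_usage: Callable[..., Any]
    log_api_usage_transaction: Callable[..., Any]
    deduct_credits: Callable[..., Any]
    record_usage: Callable[..., Any]
    capture_payment_error: Callable[..., Any]


async def handle_credits_sketch(deps, api_key, is_trial, has_subscription,
                                prompt_tokens, completion_tokens, rate_per_token):
    """Simplified billing flow; helper argument lists are assumed."""
    total_tokens = prompt_tokens + completion_tokens
    # cost = prompt * rate + completion * rate
    cost = prompt_tokens * rate_per_token + completion_tokens * rate_per_token

    # Defense-in-depth: an active subscription overrides a stale trial flag.
    if is_trial and has_subscription:
        is_trial = False

    if is_trial:
        # Trial tracking failures are logged but do not stop the request.
        try:
            await asyncio.to_thread(deps.track_trial_usage, api_key, total_tokens)
        except Exception as exc:
            logger.warning("Failed to track trial usage: %s", exc)
        # Trial transactions are logged with a cost of 0.
        await asyncio.to_thread(deps.log_api_usage_transaction, api_key, 0.0, True)
        return cost

    try:
        await asyncio.to_thread(deps.deduct_credits, api_key, cost)
        await asyncio.to_thread(deps.record_usage, api_key, total_tokens, cost)
    except Exception as exc:
        # Report the billing error, then re-raise so it cannot fail silently.
        deps.capture_payment_error(exc)
        raise
    return cost
```

With the default rate from the diagram, a request with 1000 prompt tokens and 500 completion tokens costs 1000 × 0.00002 + 500 × 0.00002 = $0.03.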

- Introduced a unified credit_handler service consolidating credit deduction, trial tracking, and usage logging for all chat-related API routes (OpenAI, Anthropic, AI SDK).
- Updated `chat.py` and `messages.py` routes to delegate billing logic to the centralized handler, eliminating duplicated code and ensuring consistent billing.
- Added async cost calculation support in pricing module to enable live pricing fetch in async contexts.
- Implemented monitoring and alerts for default pricing fallback usage to detect missing or stale pricing data in the system.
- Enhanced token usage logging with warnings for estimated token counts when providers do not return usage.
- Added extensive integration tests covering credit deduction scenarios across multiple models and providers.

This refactor improves billing consistency, observability, and test coverage across the platform.

Co-authored-by: gatewayz-ai-inbox[bot] <gatewayz-ai-inbox[bot]@users.noreply.github.com>

- Remove redundant `import time` in pricing.py (already imported at top)
- Fix loop variable redefinition in pricing.py (use `_` and `new_loop`)
- Remove unused `Any` import from credit_handler.py
- Remove unused imports from test file (asyncio, AsyncMock, MagicMock)
- Add cost assertions to test cases to use the cost variable

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

chatgpt-codex-connector · 2026-02-02T01:42:40Z

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.

coderabbitai · 2026-02-02T01:42:43Z

Warning

Rate limit exceeded

@vdimarco has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 1 minutes and 2 seconds before requesting another review.


supabase · 2026-02-02T01:42:46Z

This pull request has been ignored for the connected project ynleroehyrmaafkgjgmr because there are no changes detected in the supabase/supabase directory. You can change this behaviour in Project Integrations Settings.

greptile-apps

4 files reviewed, 5 comments

greptile-apps · 2026-02-02T01:47:23Z

src/services/credit_handler.py

+    # Helper to run sync functions in thread pool
+    async def _to_thread(func, *args, **kwargs):
+        return await asyncio.to_thread(func, *args, **kwargs)


Helper function _to_thread is defined but asyncio.to_thread is already imported and used directly in the function body. This helper adds no value and creates confusion.

Suggested change

-    # Helper to run sync functions in thread pool
-    async def _to_thread(func, *args, **kwargs):
-        return await asyncio.to_thread(func, *args, **kwargs)
+    # asyncio.to_thread is used directly below - no helper needed


greptile-apps · 2026-02-02T01:47:24Z

src/services/credit_handler.py

+    if is_trial and not trial.get("is_expired"):
+        try:
+            await _to_thread(
+                track_trial_usage,
+                api_key,
+                total_tokens,
+                1,
+                model_id=model,
+                prompt_tokens=prompt_tokens,
+                completion_tokens=completion_tokens,
+            )
+        except Exception as e:
+            logger.warning("Failed to track trial usage: %s", e)
+


Trial usage tracking failure is logged as warning but doesn't prevent execution. If trial usage tracking is critical for quota enforcement, silent failures here could allow trial users to exceed limits without detection.


greptile-apps · 2026-02-02T01:47:28Z

src/routes/chat.py

@@ -1548,6 +1460,28 @@ async def iterate_stream():
            prompt_tokens = max(1, prompt_chars // 4)
            total_tokens = prompt_tokens + completion_tokens


Token estimation using character count (1 token ≈ 4 characters) is rough and could lead to significant billing inaccuracies, especially for non-English languages or code. The warning is good, but verify that this estimation error is acceptable for business requirements.
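
To make the concern concrete, here is a small sketch of the character-based estimate. The `max(1, chars // 4)` rule is taken from the diff excerpt above. The helper names are hypothetical, and applying the same rule to completion text is an assumption, since the excerpt only shows the prompt side.

```python
def estimate_tokens(text: str) -> int:
    """Character-based estimate: about one token per four characters, never below 1."""
    return max(1, len(text) // 4)


def estimate_usage(prompt_text: str, completion_text: str) -> dict:
    """Estimate usage when the provider returns no usage data."""
    prompt_tokens = estimate_tokens(prompt_text)
    completion_tokens = estimate_tokens(completion_text)
    return {
        "prompt_tokens": prompt_tokens,
        "completion_tokens": completion_tokens,
        "total_tokens": prompt_tokens + completion_tokens,
    }


# 13 characters -> 3 estimated tokens
print(estimate_tokens("Hello, world!"))
# 5 characters -> 1 estimated token
print(estimate_tokens("你好，世界"))
```

The second call shows the failure mode: five characters yield an estimate of a single token, which is likely an undercount for scripts where a token covers fewer than four characters.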


greptile-apps · 2026-02-02T01:47:29Z

src/routes/chat.py

+            # Log warning about token estimation (potential billing inaccuracy)
+            logger.warning(
+                f"[TOKEN_ESTIMATION] Provider {provider} did not return usage data for model {model}. "
+                f"Using character-based estimation: prompt_tokens={prompt_tokens}, "
+                f"completion_tokens={completion_tokens}, total_tokens={total_tokens}. "
+                f"Content length: {len(accumulated_content)} chars. "
+                f"This may result in inaccurate billing."
+            )
+
+            # Track metric for monitoring
+            try:
+                from src.services.prometheus_metrics import get_or_create_metric, Counter
+                token_estimation_counter = get_or_create_metric(
+                    Counter,
+                    "gatewayz_token_estimation_total",
+                    "Count of requests where token usage was estimated (not provided by provider)",
+                    ["provider", "model"],
+                )
+                token_estimation_counter.labels(provider=provider, model=model).inc()
+            except Exception:
+                pass  # Metrics not available


Metrics increment wrapped in try/except that silently passes on failure. While graceful degradation is good for non-critical metrics, completely silent failures could hide issues with the metrics infrastructure.
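
One way to keep the graceful degradation while making failures visible is to log them instead of passing silently. The sketch below is an illustration only: `increment_metric_safely` is a hypothetical helper, and the choice of log level is an assumption rather than something the review prescribes.

```python
import logging

logger = logging.getLogger("metrics_sketch")


def increment_metric_safely(increment, metric_name):
    """Run a metric increment without letting a failure affect the request.

    Returns True on success and False if the increment raised.
    """
    try:
        increment()
        return True
    except Exception as exc:
        # Still non-fatal, but the failure now shows up in the logs.
        logger.debug("Metric %s could not be incremented: %s", metric_name, exc)
        return False
```

A caller would pass a zero-argument callable, for example a lambda wrapping the counter increment for `gatewayz_token_estimation_total`.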


greptile-apps · 2026-02-02T01:47:32Z

Additional Comments (1)

src/services/pricing.py
Complex async event loop handling in sync function creates fragility. When called from async context, the live pricing fetch is silently skipped with only a debug log. This could lead to stale pricing being used without clear visibility.

Check that callers are using get_model_pricing_async() from async contexts.
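
The behaviour described here can be reproduced with the standard library alone. The sketch below is not the code in pricing.py. `get_pricing_sync`, its `fetch_live` coroutine function, and the `cached` dictionary are hypothetical stand-ins that show why a sync function has to skip the live fetch when an event loop is already running.

```python
import asyncio


def in_async_context() -> bool:
    """Return True when the current thread already has a running event loop."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        return False
    return True


def get_pricing_sync(model_id, fetch_live, cached):
    """Hypothetical sync lookup that skips the live fetch inside a running loop."""
    if in_async_context():
        # A sync function cannot block on a coroutine from inside the loop,
        # so only cached data is available here. Async callers should use
        # the async variant instead.
        return cached.get(model_id)
    # No loop is running, so it is safe to drive the coroutine to completion.
    return asyncio.run(fetch_live(model_id))
```

Called from plain sync code, the function returns live data. Called from inside a coroutine, it returns whatever is cached, which is the stale-pricing risk raised in the comment.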


vdimarco and others added 2 commits February 2, 2026 01:25

vdimarco wants to merge 2 commits into main from gatewayz-code/audit-credit-deduction-loop-dhv58q
