Audit and centralize credit deduction loop for models#1020

Open
vdimarco wants to merge 2 commits into main from gatewayz-code/audit-credit-deduction-loop-dhv58q

@vdimarco vdimarco commented Feb 2, 2026

Summary

  • Introduces a unified credit handling service to centralize credit deduction, trial handling, and usage logging across chat and message endpoints.
  • Adds robust pricing fallbacks and observability for default pricing scenarios, including high-value model alerts and Prometheus metrics.
  • Replaces ad-hoc billing logic in endpoints with a single, auditable path to improve consistency and auditing.
  • Adds integration tests to validate pricing lookups, cost calculation, and credit deduction behavior for multiple model families.

Changes

Core Functionality

  • New: src/services/credit_handler.py
    • handle_credits_and_usage(api_key, user, model, trial, total_tokens, prompt_tokens, completion_tokens, elapsed_ms, endpoint) -> float
    • Centralizes pricing calculation, trial override logic, trial usage tracking, credit deduction, usage recording, and rate-limit updates.
    • Applies a defense-in-depth override of is_trial when the user has an active subscription.
    • Logs trial transactions, or deducted credits with rich metadata (including endpoint).
    • Re-raises billing errors to avoid silent failures.
  • Updated: src/routes/chat.py
    • Replaced inline billing logic with a thin wrapper that delegates to handle_credits_and_usage, maintaining backward compatibility and consistent billing.
  • Updated: src/routes/messages.py
    • Replaced inline billing logic with a call to the unified handle_credits_and_usage handler for consistent billing across endpoints.
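
The flow described for handle_credits_and_usage can be sketched roughly as follows. This is a minimal, synchronous illustration and not the code from credit_handler.py: `Ledger`, `settle_request`, the `subscription_status` field, and the choice to return 0.0 for trial requests are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Ledger:
    """In-memory stand-in for the database layer (hypothetical)."""

    balance: float
    transactions: list = field(default_factory=list)

    def deduct(self, amount: float) -> None:
        if amount > self.balance:
            raise ValueError("insufficient credits")
        self.balance -= amount


def settle_request(ledger, user, trial, cost, endpoint):
    """Return the amount charged for one request (hypothetical helper)."""
    is_trial = bool(trial.get("is_trial")) and not trial.get("is_expired")

    # Defense-in-depth: an active subscription overrides a stale trial flag.
    # The field name "subscription_status" is an assumption.
    if is_trial and user.get("subscription_status") == "active":
        is_trial = False

    if is_trial:
        # Trial path: log the transaction, deduct nothing.
        ledger.transactions.append(
            {"cost": 0.0, "is_trial": True, "endpoint": endpoint}
        )
        return 0.0

    # Paid path: a billing error propagates to the caller instead of
    # being swallowed.
    ledger.deduct(cost)
    ledger.transactions.append(
        {"cost": cost, "is_trial": False, "endpoint": endpoint}
    )
    return cost
```

Keeping the override ahead of the trial branch means a paid subscriber with a stale flag is always charged through the same path as any other paid user.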

Pricing and Cost Calculation

  • Updated: src/services/pricing.py
    • Added async pricing support via calculate_cost_async(model_id, prompt_tokens, completion_tokens).
    • Introduced _default_pricing_tracker and _track_default_pricing_usage(model_id, error=None) to monitor models falling back to default pricing.
    • Default pricing alerts via Sentry for high-value models and Prometheus metric increments (gatewayz_default_pricing_usage_total).
    • get_model_pricing and get_model_pricing_async now log default pricing usage when data is missing and call into _track_default_pricing_usage.
  • Updated: src/services/prometheus_metrics.py
    • Added default_pricing_usage_counter (labels: model) to track when default pricing is used.
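
A rough sketch of the fallback-and-track idea, assuming a plain dict as the pricing table. `lookup_rates`, `HIGH_VALUE_PREFIXES`, and the in-memory `default_pricing_hits` and `alerts` stand-ins (for the Prometheus counter and Sentry) are hypothetical; only the $0.00002/token default comes from the sequence diagram in this PR.

```python
from collections import Counter

DEFAULT_RATE = 0.00002  # USD per token, the fallback shown in the sequence diagram

# Hypothetical prefixes used to decide which models count as high-value.
HIGH_VALUE_PREFIXES = ("openai/gpt-4", "anthropic/claude")

default_pricing_hits = Counter()  # stand-in for the Prometheus counter
alerts = []  # stand-in for Sentry alerts


def lookup_rates(model_id, table):
    """Return (prompt_rate, completion_rate, used_default)."""
    entry = table.get(model_id)
    if entry is not None:
        return entry["prompt"], entry["completion"], False

    # No pricing data: record the fallback so it is visible in monitoring.
    default_pricing_hits[model_id] += 1
    if model_id.startswith(HIGH_VALUE_PREFIXES):
        alerts.append(f"default pricing used for high-value model {model_id}")
    return DEFAULT_RATE, DEFAULT_RATE, True
```

Counting every fallback while alerting only on high-value models keeps alert volume low without losing the long tail in the metric.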

Observability and Alerts

  • When a high-value model uses default pricing, a Sentry alert is emitted to help identify missing pricing data (and potential under-billing).
  • Warning logs and Prometheus metrics capture token estimation scenarios when provider omits usage data, with a counter per provider/model.

Tests

  • Added: tests/integration/test_credit_deduction_models.py
    • Integration tests for credit deduction across OpenAI and Anthropic models, including streaming vs non-streaming paths, trial handling, and paid-user edge cases.
    • Verifies cost calculation, credit deduction calls, and transaction logging behavior via the centralized credit handler.
    • Covers alias resolution for models and pricing lookups with mocked pricing data.
    • Validates async cost calculation path and default-pricing fallback behavior.

Why this change

  • Reduces duplication and potential inconsistencies across endpoints by consolidating all credit and usage handling into a single service.
  • Improves reliability and observability of billing, including fallbacks for missing pricing data and alerts for high-value models.
  • Enables easier auditing and future enhancements to pricing and billing rules.

QA/Tests

  • Run the integration tests:
    • pytest tests/integration/test_credit_deduction_models.py
  • Verify behavior for:
    • Paid users with active subscriptions (override trial path to paid)
    • Trial users (no credit deduction, trial logging only)
    • Unknown models falling back to default pricing with alerts
    • Async cost calculation path returns correct costs
  • Check observability hooks:
    • gatewayz_default_pricing_usage_total metric increments for default pricing
    • Sentry alerts fire for high-value models using default pricing

Migration notes

  • This PR introduces a new credit handling service and adjusts two endpoints to route through a centralized path. Existing logic is removed from chat and messages handlers in favor of the unified handler. No database migrations required.

Verification steps

  • Trigger a chat or message request with known pricing data and verify that costs are computed via calculate_cost_async and that credits are deducted via the unified path.
  • Simulate a model without pricing data and confirm default pricing is used, metrics are incremented, and a warning is logged.
  • Validate trial vs. paid user flows, including trial usage tracking and transaction logging.
  • Confirm that high-value models using default pricing trigger Sentry alerts as configured.

🌿 Generated by Terry


ℹ️ Tag @terragon-labs to ask questions and address PR feedback

📎 Task: https://terragon-www-production.up.railway.app/task/5262f87c-6761-4bd0-869e-751d81ccffa6

Greptile Overview

Greptile Summary

This PR successfully centralizes credit deduction logic into a unified service (credit_handler.py), eliminating ~150 lines of duplicated billing code across chat and messages endpoints. The refactor improves consistency, auditability, and observability of billing operations.

Key Changes

  • Centralized billing: New handle_credits_and_usage() service consolidates pricing calculation, trial handling, credit deduction, and usage logging
  • Enhanced observability: Added default pricing tracking with Sentry alerts for high-value models and Prometheus metrics (gatewayz_default_pricing_usage_total)
  • Trial override protection: Defense-in-depth logic prevents stale is_trial flags from causing free usage for paid subscribers
  • Async pricing support: New calculate_cost_async() and get_model_pricing_async() functions enable live pricing lookups from async contexts
  • Token estimation warnings: Added logging and metrics when providers don't return usage data

Review Notes

The refactor is well-executed with comprehensive test coverage. A few areas for consideration:

  1. Trial usage tracking failures are logged but don't prevent execution - verify this is acceptable if trial quota enforcement is critical
  2. Token estimation (1 token ≈ 4 chars) in streaming responses could lead to billing inaccuracies for non-English content
  3. The unused _to_thread helper in credit_handler.py should be removed for clarity

The addition of Sentry alerts for high-value models using default pricing is excellent for preventing under-billing. The sequence diagram shows the complete flow from request to billing.

Confidence Score: 4/5

  • Safe to merge with minor observability improvements recommended
  • The refactor centralizes billing logic effectively with comprehensive tests and excellent observability additions. Score reflects well-tested changes with minor style issues (unused helper, silent metric failures) that don't affect core functionality.
  • src/services/credit_handler.py (remove unused helper), src/routes/chat.py (verify token estimation accuracy acceptable)

Important Files Changed

Filename Overview
src/services/credit_handler.py New centralized credit handling service with trial override logic and comprehensive error tracking
src/routes/chat.py Refactored to delegate billing to centralized handler; added token estimation warnings and metrics
src/routes/messages.py Simplified billing flow by replacing inline logic with unified credit handler call
src/services/pricing.py Added async pricing support, default pricing tracking with Sentry alerts, and Prometheus metrics for observability

Sequence Diagram

sequenceDiagram
    participant Client
    participant Endpoint as Chat/Messages Endpoint
    participant CreditHandler as credit_handler.py
    participant Pricing as pricing.py
    participant Database as Supabase
    participant Metrics as Prometheus/Sentry

    Client->>Endpoint: POST /v1/chat/completions
    Endpoint->>Endpoint: Authenticate & validate request
    Endpoint->>Provider: Forward request to AI provider
    Provider-->>Endpoint: Stream/return response with tokens
    
    Note over Endpoint: Extract usage data
    Endpoint->>CreditHandler: handle_credits_and_usage()
    
    CreditHandler->>Pricing: calculate_cost_async(model, tokens)
    Pricing->>Pricing: get_model_pricing_async(model_id)
    
    alt Pricing found in cache/DB
        Pricing-->>CreditHandler: Return pricing data
    else No pricing data
        Pricing->>Metrics: Track default pricing usage
        Pricing->>Metrics: Send Sentry alert (if high-value model)
        Pricing-->>CreditHandler: Return default pricing ($0.00002/token)
    end
    
    CreditHandler->>CreditHandler: Calculate cost = prompt*rate + completion*rate
    
    alt User has active subscription (defense-in-depth)
        CreditHandler->>CreditHandler: Override is_trial=FALSE if stale flag detected
    end
    
    alt is_trial=TRUE (legitimate trial user)
        CreditHandler->>Database: track_trial_usage()
        CreditHandler->>Database: log_api_usage_transaction(cost=0, is_trial=TRUE)
    else is_trial=FALSE (paid user)
        CreditHandler->>Database: deduct_credits(api_key, cost)
        CreditHandler->>Database: record_usage()
        CreditHandler->>Database: update_rate_limit_usage()
    end
    
    alt Billing error
        CreditHandler->>Metrics: capture_payment_error()
        CreditHandler-->>Endpoint: Raise exception
    end
    
    CreditHandler-->>Endpoint: Return cost
    Endpoint-->>Client: Return response with headers
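
The cost step in the diagram reduces to a two-term sum. The sketch below works one example with the default rate from the diagram; `request_cost` is a hypothetical name, and the real calculate_cost_async may round or scale differently.

```python
def request_cost(prompt_tokens, completion_tokens, prompt_rate, completion_rate):
    """cost = prompt_tokens * prompt_rate + completion_tokens * completion_rate"""
    return prompt_tokens * prompt_rate + completion_tokens * completion_rate


DEFAULT_RATE = 0.00002  # USD per token, taken from the diagram

# 1,000 prompt tokens and 500 completion tokens at the default rate
# come to roughly 0.03 USD (subject to floating-point rounding).
cost = request_cost(1_000, 500, DEFAULT_RATE, DEFAULT_RATE)
```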

vdimarco and others added 2 commits February 2, 2026 01:25
- Introduced a unified credit_handler service consolidating credit deduction, trial tracking, and usage logging for all chat-related API routes (OpenAI, Anthropic, AI SDK).
- Updated `chat.py` and `messages.py` routes to delegate billing logic to the centralized handler, eliminating duplicated code and ensuring consistent billing.
- Added async cost calculation support in pricing module to enable live pricing fetch in async contexts.
- Implemented monitoring and alerts for default pricing fallback usage to detect missing or stale pricing data in the system.
- Enhanced token usage logging with warnings for estimated token counts when providers do not return usage.
- Added extensive integration tests covering credit deduction scenarios across multiple models and providers.

This refactor improves billing consistency, observability, and test coverage across the platform.

Co-authored-by: gatewayz-ai-inbox[bot] <gatewayz-ai-inbox[bot]@users.noreply.github.com>

- Remove redundant `import time` in pricing.py (already imported at top)
- Fix loop variable redefinition in pricing.py (use `_` and `new_loop`)
- Remove unused `Any` import from credit_handler.py
- Remove unused imports from test file (asyncio, AsyncMock, MagicMock)
- Add cost assertions to test cases to use the cost variable

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

@greptile-apps greptile-apps bot left a comment

4 files reviewed, 5 comments

Comment on lines +64 to +66
# Helper to run sync functions in thread pool
async def _to_thread(func, *args, **kwargs):
    return await asyncio.to_thread(func, *args, **kwargs)

Helper function _to_thread is defined but asyncio.to_thread is already imported and used directly in the function body. This helper adds no value and creates confusion.

Suggested change
- # Helper to run sync functions in thread pool
- async def _to_thread(func, *args, **kwargs):
-     return await asyncio.to_thread(func, *args, **kwargs)
+ # asyncio.to_thread is used directly below - no helper needed
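
For reference, calling asyncio.to_thread directly looks like the sketch below. `track_trial_usage` here is a blocking stand-in with a hypothetical body and parameter names; only the call shape (positional arguments followed by keyword arguments) mirrors the snippet under review.

```python
import asyncio


def track_trial_usage(api_key, tokens_used, requests_used=1, **details):
    """Blocking stand-in for the real tracker (hypothetical body)."""
    return {
        "api_key": api_key,
        "tokens": tokens_used,
        "requests": requests_used,
        **details,
    }


async def main():
    # asyncio.to_thread (Python 3.9+) forwards *args and **kwargs to the
    # function and runs it in a worker thread, so no wrapper is required.
    return await asyncio.to_thread(
        track_trial_usage, "key-123", 42, 1, model_id="example-model"
    )


result = asyncio.run(main())
```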
Path: src/services/credit_handler.py
Line: 64:66

Comment on lines +93 to +106
if is_trial and not trial.get("is_expired"):
    try:
        await _to_thread(
            track_trial_usage,
            api_key,
            total_tokens,
            1,
            model_id=model,
            prompt_tokens=prompt_tokens,
            completion_tokens=completion_tokens,
        )
    except Exception as e:
        logger.warning("Failed to track trial usage: %s", e)

Trial usage tracking failure is logged as warning but doesn't prevent execution. If trial usage tracking is critical for quota enforcement, silent failures here could allow trial users to exceed limits without detection.
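
One possible way to make these failures visible without changing the default behavior is sketched below. It is not code from this PR: `track_with_visibility`, the `failures` counter, and the `fail_closed` switch are hypothetical names chosen for illustration.

```python
import logging

logger = logging.getLogger("credit_handler_sketch")

# Stand-in for a metrics counter so tracking failures can be alerted on.
failures = {"count": 0}


def track_with_visibility(track, *args, fail_closed=False, **kwargs):
    """Call `track`; count and log failures, optionally re-raising them."""
    try:
        track(*args, **kwargs)
        return True
    except Exception:
        failures["count"] += 1
        logger.exception("Failed to track trial usage")
        if fail_closed:
            # Quota enforcement is treated as critical: surface the error.
            raise
        return False
```

With `fail_closed=False` the request still succeeds as it does today, but the counter gives a signal to alert on. Setting it to `True` would block the request whenever usage cannot be recorded.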

Path: src/services/credit_handler.py
Line: 93:106

Comment on lines 1441 to 1461
@@ -1548,6 +1460,28 @@ async def iterate_stream():
    prompt_tokens = max(1, prompt_chars // 4)
    total_tokens = prompt_tokens + completion_tokens

Token estimation using character count (1 token ≈ 4 characters) is rough and could lead to significant billing inaccuracies, especially for non-English languages or code. The warning is good, but verify that this estimation error is acceptable for business requirements.
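
To make the concern concrete, the sketch below applies the same `max(1, chars // 4)` rule to an English sentence and to a Chinese sentence of similar meaning. `estimate_tokens` is a hypothetical helper, and no real tokenizer is invoked. The assumption, which holds for many tokenizers but is not measured here, is that CJK text uses close to one token per character or more.

```python
def estimate_tokens(text):
    """Character-based estimate: one token per four characters, minimum one."""
    return max(1, len(text) // 4)


english = "The quick brown fox jumps over the lazy dog."
chinese = "敏捷的棕色狐狸跳过了懒狗。"

english_estimate = estimate_tokens(english)
chinese_estimate = estimate_tokens(chinese)

# The Chinese estimate is far smaller than its character count, so if a
# tokenizer emits about one token per character, the request would be
# under-billed by a large factor.
print(english_estimate, chinese_estimate, len(chinese))
```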

Path: src/routes/chat.py
Line: 1441:1461

Comment on lines +1463 to +1483
# Log warning about token estimation (potential billing inaccuracy)
logger.warning(
    f"[TOKEN_ESTIMATION] Provider {provider} did not return usage data for model {model}. "
    f"Using character-based estimation: prompt_tokens={prompt_tokens}, "
    f"completion_tokens={completion_tokens}, total_tokens={total_tokens}. "
    f"Content length: {len(accumulated_content)} chars. "
    f"This may result in inaccurate billing."
)

# Track metric for monitoring
try:
    from src.services.prometheus_metrics import get_or_create_metric, Counter
    token_estimation_counter = get_or_create_metric(
        Counter,
        "gatewayz_token_estimation_total",
        "Count of requests where token usage was estimated (not provided by provider)",
        ["provider", "model"],
    )
    token_estimation_counter.labels(provider=provider, model=model).inc()
except Exception:
    pass  # Metrics not available

Metrics increment wrapped in try/except that silently passes on failure. While graceful degradation is good for non-critical metrics, completely silent failures could hide issues with the metrics infrastructure.

Path: src/routes/chat.py
Line: 1463:1483

greptile-apps bot commented Feb 2, 2026

Additional Comments (1)

src/services/pricing.py
Complex async event loop handling in sync function creates fragility. When called from async context, the live pricing fetch is silently skipped with only a debug log. This could lead to stale pricing being used without clear visibility.

Check that callers are using get_model_pricing_async() from async contexts.
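
A sync function can tell whether it is being called from async code by probing for a running event loop, which is one way to audit or guard such call sites. The sketch below shows only that check; `in_async_context` is a hypothetical helper, and how pricing.py detects the loop is not shown in this conversation.

```python
import asyncio


def in_async_context():
    """Return True when called while an event loop is running in this thread."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # No running loop: this is a plain synchronous call.
        return False
    return True


async def check_from_coroutine():
    return in_async_context()


called_from_sync = in_async_context()
called_from_async = asyncio.run(check_from_coroutine())
```

A sync pricing function could use a check like this to log at warning level, rather than debug, when it has to skip the live fetch.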

Path: src/services/pricing.py
Line: 408:446
