Audit and centralize credit deduction loop for models #1020
Conversation
- Introduced a unified credit_handler service consolidating credit deduction, trial tracking, and usage logging for all chat-related API routes (OpenAI, Anthropic, AI SDK).
- Updated `chat.py` and `messages.py` routes to delegate billing logic to the centralized handler, eliminating duplicated code and ensuring consistent billing.
- Added async cost calculation support in the pricing module to enable live pricing fetches in async contexts.
- Implemented monitoring and alerts for default pricing fallback usage to detect missing or stale pricing data in the system.
- Enhanced token usage logging with warnings for estimated token counts when providers do not return usage.
- Added extensive integration tests covering credit deduction scenarios across multiple models and providers.

This refactor improves billing consistency, observability, and test coverage across the platform.

Co-authored-by: gatewayz-ai-inbox[bot] <gatewayz-ai-inbox[bot]@users.noreply.github.com>
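A minimal sketch of what the consolidated flow looks like — only the handler name `handle_credits_and_usage` appears in this thread; the parameters, the per-token rate, and the injected `deduct`/`log_usage` callables are illustrative assumptions, not the PR's actual signature:

```python
# Hypothetical sketch of a centralized credit handler; parameter shapes,
# DEFAULT_RATE, and the injected callables are assumptions for illustration.
import asyncio

DEFAULT_RATE = 0.00002  # default per-token rate mentioned elsewhere in this thread

async def handle_credits_and_usage(api_key, model, prompt_tokens,
                                   completion_tokens, deduct, log_usage):
    """Single choke point for billing: compute cost once, then deduct and log."""
    cost = (prompt_tokens + completion_tokens) * DEFAULT_RATE
    # Sync DB helpers run off the event loop via asyncio.to_thread
    await asyncio.to_thread(deduct, api_key, cost)
    await asyncio.to_thread(log_usage, api_key, model, cost)
    return cost
```

Each chat route would then await this one coroutine instead of carrying its own deduction loop, which is what makes the billing path auditable in one place.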
- Remove redundant `import time` in pricing.py (already imported at top)
- Fix loop variable redefinition in pricing.py (use `_` and `new_loop`)
- Remove unused `Any` import from credit_handler.py
- Remove unused imports from test file (asyncio, AsyncMock, MagicMock)
- Add cost assertions to test cases to use the cost variable

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
This pull request has been ignored for the connected project Preview Branches by Supabase.
```python
# Helper to run sync functions in thread pool
async def _to_thread(func, *args, **kwargs):
    return await asyncio.to_thread(func, *args, **kwargs)
```
Helper function `_to_thread` merely wraps `asyncio.to_thread`, which is already imported and could be called directly. This helper adds no value and creates confusion.
```diff
-# Helper to run sync functions in thread pool
-async def _to_thread(func, *args, **kwargs):
-    return await asyncio.to_thread(func, *args, **kwargs)
+# asyncio.to_thread is used directly below - no helper needed
```
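For illustration, the direct call has exactly the shape the wrapper had — a standalone example assuming Python 3.9+ (where `asyncio.to_thread` was introduced):

```python
import asyncio
import time

def slow_sync_lookup(x):
    time.sleep(0.01)  # stand-in for a blocking DB/HTTP call
    return x * 2

async def main():
    # No wrapper needed: asyncio.to_thread already takes (func, *args, **kwargs)
    return await asyncio.to_thread(slow_sync_lookup, 21)

print(asyncio.run(main()))  # 42
```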
Path: src/services/credit_handler.py
Line: 64:66
```python
if is_trial and not trial.get("is_expired"):
    try:
        await _to_thread(
            track_trial_usage,
            api_key,
            total_tokens,
            1,
            model_id=model,
            prompt_tokens=prompt_tokens,
            completion_tokens=completion_tokens,
        )
    except Exception as e:
        logger.warning("Failed to track trial usage: %s", e)
```
Trial usage tracking failure is logged as warning but doesn't prevent execution. If trial usage tracking is critical for quota enforcement, silent failures here could allow trial users to exceed limits without detection.
Path: src/services/credit_handler.py
Line: 93:106
```diff
@@ -1548,6 +1460,28 @@ async def iterate_stream():
     prompt_tokens = max(1, prompt_chars // 4)
     total_tokens = prompt_tokens + completion_tokens
```
Token estimation using character count (1 token ≈ 4 characters) is rough and could lead to significant billing inaccuracies, especially for non-English languages or code. The warning is good, but verify that this estimation error is acceptable for business requirements.
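The heuristic under discussion, as a standalone sketch — the 4-characters-per-token ratio is an English-centric rule of thumb, and real tokenizers diverge substantially on CJK text and dense code, which is why it should stay a billing fallback rather than the primary path:

```python
# The chars//4 fallback from the diff above, isolated for illustration.
# The floor of 1 matches max(1, prompt_chars // 4) in the PR.
def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)

print(estimate_tokens("hello world, this is a test"))  # 27 chars -> 6
print(estimate_tokens(""))                             # floor of 1
```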
Path: src/routes/chat.py
Line: 1441:1461
```python
# Log warning about token estimation (potential billing inaccuracy)
logger.warning(
    f"[TOKEN_ESTIMATION] Provider {provider} did not return usage data for model {model}. "
    f"Using character-based estimation: prompt_tokens={prompt_tokens}, "
    f"completion_tokens={completion_tokens}, total_tokens={total_tokens}. "
    f"Content length: {len(accumulated_content)} chars. "
    f"This may result in inaccurate billing."
)

# Track metric for monitoring
try:
    from src.services.prometheus_metrics import get_or_create_metric, Counter
    token_estimation_counter = get_or_create_metric(
        Counter,
        "gatewayz_token_estimation_total",
        "Count of requests where token usage was estimated (not provided by provider)",
        ["provider", "model"],
    )
    token_estimation_counter.labels(provider=provider, model=model).inc()
except Exception:
    pass  # Metrics not available
```
Metrics increment wrapped in try/except that silently passes on failure. While graceful degradation is good for non-critical metrics, completely silent failures could hide issues with the metrics infrastructure.
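A middle ground, sketched here as an assumption rather than the PR's code: keep the graceful degradation, but log the failure at debug level instead of a bare `pass`, so a broken metrics pipeline remains diagnosable:

```python
# Illustrative helper (not in the PR): degrade gracefully but visibly.
import logging

logger = logging.getLogger(__name__)

def inc_estimation_counter(counter, provider, model):
    """Increment a Prometheus-style counter; return False if metrics are unavailable."""
    try:
        counter.labels(provider=provider, model=model).inc()
    except Exception as e:
        logger.debug("Metrics unavailable, skipping token-estimation counter: %s", e)
        return False
    return True

# With no counter wired up, the call degrades quietly but leaves a debug trail
print(inc_estimation_counter(None, "openai", "gpt-4o"))  # False
```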
Path: src/routes/chat.py
Line: 1463:1483
Additional Comments (1)
Path: src/services/pricing.py
Line: 408:446
Comment:
Complex async event loop handling in sync function creates fragility. When called from async context, the live pricing fetch is silently skipped with only a debug log. This could lead to stale pricing being used without clear visibility.
Check that callers are using `get_model_pricing_async()` from async contexts.
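A hedged sketch of the pattern this comment describes: the sync getter detects an already-running event loop and falls back to the default, while async callers use the async variant directly. The function names mirror the thread, but the bodies, the fallback dict, and the stand-in fetch are assumptions:

```python
# Illustrative only: mirrors the sync/async split discussed for pricing.py,
# with a fake fetch and an assumed default-pricing fallback.
import asyncio

_DEFAULT_PRICING = {"per_token": 0.00002}  # default rate cited in the sequence diagram

async def get_model_pricing_async(model_id):
    await asyncio.sleep(0)          # stand-in for a live HTTP/DB fetch
    return {"per_token": 0.00003}

def get_model_pricing(model_id):
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # No loop running: safe to drive the async fetch ourselves
        return asyncio.run(get_model_pricing_async(model_id))
    # Called from async context: cannot block the loop, so fall back
    return _DEFAULT_PRICING

print(get_model_pricing("gpt-4o"))  # live fetch succeeds from sync context
```

The reviewer's point is visible here: any async caller that reaches the sync getter silently receives the stale default, which is why audits should confirm async code paths call `get_model_pricing_async()` directly.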
Summary
Changes
Core Functionality
Pricing and Cost Calculation
Observability and Alerts
Tests
Why this change
QA/Tests
Migration notes
Verification steps
🌿 Generated by Terry
ℹ️ Tag @terragon-labs to ask questions and address PR feedback
📎 Task: https://terragon-www-production.up.railway.app/task/5262f87c-6761-4bd0-869e-751d81ccffa6
Greptile Overview
Greptile Summary
This PR successfully centralizes credit deduction logic into a unified service (`credit_handler.py`), eliminating ~150 lines of duplicated billing code across chat and messages endpoints. The refactor improves consistency, auditability, and observability of billing operations.

Key Changes
- New `handle_credits_and_usage()` service consolidates pricing calculation, trial handling, credit deduction, and usage logging
- Default pricing fallback usage is now tracked and alerted on (`gatewayz_default_pricing_usage_total`)
- A defense-in-depth check prevents stale `is_trial` flags from causing free usage for paid subscribers
- New `calculate_cost_async()` and `get_model_pricing_async()` functions enable live pricing lookups from async contexts
The refactor is well-executed with comprehensive test coverage. A few areas for consideration:
- The `_to_thread` helper in `credit_handler.py` should be removed for clarity

The addition of Sentry alerts for high-value models using default pricing is excellent for preventing under-billing. The sequence diagram shows the complete flow from request to billing.
Confidence Score: 4/5
Important Files Changed
Sequence Diagram
```mermaid
sequenceDiagram
    participant Client
    participant Endpoint as Chat/Messages Endpoint
    participant CreditHandler as credit_handler.py
    participant Pricing as pricing.py
    participant Database as Supabase
    participant Metrics as Prometheus/Sentry
    Client->>Endpoint: POST /v1/chat/completions
    Endpoint->>Endpoint: Authenticate & validate request
    Endpoint->>Provider: Forward request to AI provider
    Provider-->>Endpoint: Stream/return response with tokens
    Note over Endpoint: Extract usage data
    Endpoint->>CreditHandler: handle_credits_and_usage()
    CreditHandler->>Pricing: calculate_cost_async(model, tokens)
    Pricing->>Pricing: get_model_pricing_async(model_id)
    alt Pricing found in cache/DB
        Pricing-->>CreditHandler: Return pricing data
    else No pricing data
        Pricing->>Metrics: Track default pricing usage
        Pricing->>Metrics: Send Sentry alert (if high-value model)
        Pricing-->>CreditHandler: Return default pricing ($0.00002/token)
    end
    CreditHandler->>CreditHandler: Calculate cost = prompt*rate + completion*rate
    alt User has active subscription (defense-in-depth)
        CreditHandler->>CreditHandler: Override is_trial=FALSE if stale flag detected
    end
    alt is_trial=TRUE (legitimate trial user)
        CreditHandler->>Database: track_trial_usage()
        CreditHandler->>Database: log_api_usage_transaction(cost=0, is_trial=TRUE)
    else is_trial=FALSE (paid user)
        CreditHandler->>Database: deduct_credits(api_key, cost)
        CreditHandler->>Database: record_usage()
        CreditHandler->>Database: update_rate_limit_usage()
    end
    alt Billing error
        CreditHandler->>Metrics: capture_payment_error()
        CreditHandler-->>Endpoint: Raise exception
    end
    CreditHandler-->>Endpoint: Return cost
    Endpoint-->>Client: Return response with headers
```