feat: add generic tier-aware rate limiting framework#82

Open
Societus wants to merge 4 commits into repowise-dev:main from Societus:feat/generic-tier-framework

Conversation

Societus commented Apr 14, 2026

Summary

Add a provider-agnostic tier resolution framework to BaseProvider. Any LLM provider with subscription-based rate tiers (Z.AI, MiniMax, etc.) can adopt this instead of implementing tier logic from scratch.

This is the foundation for a PR stack addressing #68.

Changes

BaseProvider (base.py)

  • Add RATE_LIMIT_TIERS: dict[str, Any] = {} class attribute -- providers override with their tier configs
  • Add resolve_rate_limiter() static method with precedence: tier > explicit rate_limiter > None
  • Case-insensitive tier matching
  • ValueError on invalid tier with helpful message listing valid options
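The behavior described above can be sketched in a few lines. This is a hedged reconstruction from the PR description, not the actual patch: the RateLimiter class here is a stand-in for the real one in rate_limiter.py, and its rpm/tpm constructor signature is an assumption.

```python
from typing import Any, Optional


class RateLimiter:
    # Stand-in for rate_limiter.RateLimiter; the rpm/tpm constructor
    # signature is an assumption for this sketch.
    def __init__(self, rpm: int, tpm: int) -> None:
        self.rpm = rpm
        self.tpm = tpm


class BaseProvider:
    # Providers with subscription tiers override this mapping;
    # everyone else inherits the empty default.
    RATE_LIMIT_TIERS: dict[str, Any] = {}

    @staticmethod
    def resolve_rate_limiter(
        tier: Optional[str],
        tiers: dict[str, Any],
        rate_limiter: Optional[RateLimiter] = None,
    ) -> Optional[RateLimiter]:
        # Precedence: tier > explicit rate_limiter > None.
        if tier is not None:
            # Case-insensitive tier matching.
            config = {k.lower(): v for k, v in tiers.items()}.get(tier.lower())
            if config is None:
                valid = ", ".join(sorted(tiers)) or "(none)"
                raise ValueError(
                    f"Unknown tier {tier!r}; valid tiers: {valid}"
                )
            return RateLimiter(**config)
        return rate_limiter
```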

Tests (test_generic_tier_framework.py)

  • 8 tests covering: tier creation, precedence, case-insensitivity, invalid tiers, empty tiers, explicit limiter passthrough, and default empty tiers

Design Decisions

  1. Static method, not instance method. resolve_rate_limiter() has no side effects -- it takes inputs and returns a RateLimiter or None. This makes it trivially testable and reusable without instantiating a provider.

  2. Late import of RateLimiter. Avoids circular dependency between base.py and rate_limiter.py at module level.

  3. Empty default RATE_LIMIT_TIERS. Providers without tier support (Anthropic, OpenAI, etc.) inherit {} -- no changes needed for existing providers.

  4. String tier names, not numeric. Providers define their own tier names (lite/pro/max for Z.AI, starter/plus/max/ultra for MiniMax). A future iteration could add numeric indexing for cross-provider comparison.
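Putting the four decisions together, provider adoption might look like the sketch below. FooProvider and its tier table are invented for illustration, and BaseProvider/RateLimiter are compact stand-ins for the real classes, so treat this as a shape, not the shipped code:

```python
from typing import Any, Optional


class RateLimiter:
    # Minimal stand-in for rate_limiter.RateLimiter.
    def __init__(self, rpm: int, tpm: int) -> None:
        self.rpm, self.tpm = rpm, tpm


class BaseProvider:
    # Decision 3: empty default, so tier-less providers need no changes.
    RATE_LIMIT_TIERS: dict[str, Any] = {}

    @staticmethod
    def resolve_rate_limiter(tier, tiers, rate_limiter=None):
        # Decision 1: pure static method, precedence tier > explicit > None.
        if tier is not None:
            config = {k.lower(): v for k, v in tiers.items()}.get(tier.lower())
            if config is None:
                raise ValueError(f"Unknown tier {tier!r}")
            return RateLimiter(**config)
        return rate_limiter


class FooProvider(BaseProvider):
    # Decision 4: string tier names, chosen by the provider.
    # These names and limits are hypothetical.
    RATE_LIMIT_TIERS = {
        "lite": {"rpm": 5, "tpm": 25_000},
        "pro": {"rpm": 50, "tpm": 250_000},
    }

    def __init__(
        self,
        tier: Optional[str] = None,
        rate_limiter: Optional[RateLimiter] = None,
    ) -> None:
        # Providers adopt the framework with one call in their constructor.
        self.rate_limiter = self.resolve_rate_limiter(
            tier, self.RATE_LIMIT_TIERS, rate_limiter
        )
```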

Test Plan

```
uv run pytest tests/unit/test_providers/test_generic_tier_framework.py -v
# 8 passed
```

All existing provider tests continue to pass -- RATE_LIMIT_TIERS = {} is a no-op for providers that do not override it.

PR Stack

| # | PR | Description | Status |
|---|----|-------------|--------|
| 1 | #82 -- Generic tier framework (this PR) | BaseProvider + resolve_rate_limiter() | Ready for review |
| 2 | #83 -- Z.AI adopts the framework | RATE_LIMIT_TIERS + ZAI_TIER env var | Depends on this PR |
| 3 | #84 -- MiniMax provider | New provider using the framework | Depends on this PR |

vinit13792 and others added 4 commits April 13, 2026 12:29
- Add litellm to interactive provider selection menu
- Support LITELLM_BASE_URL for local proxy deployments (no API key required)
- Auto-add openai/ prefix when using api_base for proper LiteLLM routing
- Add dummy API key for local proxies (OpenAI SDK requirement)
- Add validation and tests for litellm provider configuration

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… false positives

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add first-class support for Z.AI with OpenAI-compatible API.

- New ZAIProvider with thinking disabled by default for GLM-5 family
- Plan selection: 'coding' (subscription) or 'general' (pay-as-you-go)
- Environment variables: ZAI_API_KEY, ZAI_PLAN, ZAI_BASE_URL, ZAI_THINKING
- Rate limit defaults and auto-detection in CLI helpers

Closes repowise-dev#68
Add RATE_LIMIT_TIERS class attribute and resolve_rate_limiter() static
method to BaseProvider. Any provider with subscription tiers can define
RATE_LIMIT_TIERS and pass tier + tiers to resolve_rate_limiter() to get
automatic tier-aware rate limiter creation.

Precedence: tier > explicit rate_limiter > None.
Tier matching is case-insensitive. Invalid tiers raise ValueError.

This is a provider-agnostic foundation -- no provider-specific code.
Providers adopt it by defining RATE_LIMIT_TIERS and calling
resolve_rate_limiter() in their constructor.

Ref: repowise-dev#68
Societus added a commit to Societus/repowise that referenced this pull request Apr 14, 2026
Add MiniMax as a built-in provider using the generic tier framework (repowise-dev#82).

MiniMax is an OpenAI-compatible API provider with the M2.x model family
(M2.7, M2.5, M2.1, M2) and published token plan rate tiers.

Changes:
- New MiniMaxProvider with RATE_LIMIT_TIERS (starter/plus/max/ultra)
  derived from published 5-hour rolling window limits
- Uses resolve_rate_limiter() from BaseProvider for tier resolution
- reasoning_split=True by default to separate thinking from content
- Bumped retry budget: 5 retries / 30s max for load-shedding tolerance
- Registered in provider registry with openai package dependency hint
- Conservative PROVIDER_DEFAULTS (Starter-tier: 5 RPM / 25K TPM)
- CLI env vars: MINIMAX_API_KEY, MINIMAX_BASE_URL,
  MINIMAX_REASONING_SPLIT, MINIMAX_TIER
- 30 unit tests (constructor, tiers, generate, stream_chat, registry)

Rate limit tiers (from https://platform.minimax.io/docs/token-plan/intro):
  Starter:  1,500 req/5hrs  ->  5 RPM /  25K TPM
  Plus:     4,500 req/5hrs  -> 15 RPM /  75K TPM
  Max:     15,000 req/5hrs  -> 50 RPM / 250K TPM
  Ultra:   30,000 req/5hrs  -> 100 RPM / 500K TPM

Highspeed variants (e.g., MiniMax-M2.7-highspeed) share the same rate
limits as their base plan -- the difference is faster inference, not quota.
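The per-minute limits in the table above follow arithmetically from the published 5-hour windows: requests divided by 300 minutes gives RPM, and every tier implies the same ~5K tokens-per-request budget (that budget is an inference from the numbers, not a documented MiniMax figure). A quick check:

```python
# Convert 5-hour rolling request quotas into the RPM/TPM pairs listed
# in the tier table: req / (5 * 60) minutes, TPM = RPM * 5_000 tokens.
FIVE_HOURS_MIN = 5 * 60

plans = {
    "starter": 1_500,
    "plus": 4_500,
    "max": 15_000,
    "ultra": 30_000,
}

limits = {
    name: {
        "rpm": requests // FIVE_HOURS_MIN,
        "tpm": requests // FIVE_HOURS_MIN * 5_000,
    }
    for name, requests in plans.items()
}
```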

This provider is structurally identical to Z.AI (repowise-dev#83) and was trivial
to implement because both use the generic tier framework. The framework
eliminated all per-provider boilerplate for tier resolution.

Depends on: repowise-dev#82 (generic tier framework)
Ref: repowise-dev#68