feat(zai): adopt tier framework for plan-aware rate limiting#83

Open
Societus wants to merge 5 commits into repowise-dev:main from Societus:feat/zai-adopts-tier-framework

Conversation


@Societus Societus commented Apr 14, 2026

Summary

Wire Z.AI provider into the generic tier framework from #82. Adds plan-aware rate limiting based on Z.AI subscription tier (Lite/Pro/Max) with environment variable configuration.

Depends on: #82 (generic tier framework -- merge that first)

Changes

Z.AI Provider (zai.py)

  • Define RATE_LIMIT_TIERS with Lite/Pro/Max configs derived from Z.AI support guidance (April 2026)
  • Use resolve_rate_limiter() from BaseProvider in constructor
  • Add tier parameter to constructor and docstring
  • Bump retry budget: 5 retries / 30s max wait (from 3/4s) for Z.AI load-shedding tolerance
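A rough sketch of the wiring described above, assuming the framework's semantics from #82 (the `RateLimitConfig` dataclass and field names are illustrative inventions; only `RATE_LIMIT_TIERS`, `resolve_rate_limiter()`, and the limit numbers come from this PR):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RateLimitConfig:
    """Hypothetical tier-config shape; the real framework from #82 may differ."""
    rpm: int
    tpm: int


class ZAIProvider:
    # Lite/Pro/Max limits derived from Z.AI support guidance (April 2026).
    RATE_LIMIT_TIERS = {
        "lite": RateLimitConfig(rpm=10, tpm=50_000),
        "pro": RateLimitConfig(rpm=30, tpm=150_000),
        "max": RateLimitConfig(rpm=60, tpm=300_000),
    }

    def __init__(self, tier: Optional[str] = None, rate_limiter=None):
        # Precedence per the framework: tier > explicit rate_limiter > None.
        self.tier = tier
        self.rate_limiter = self.resolve_rate_limiter(
            tier, rate_limiter, self.RATE_LIMIT_TIERS
        )

    @staticmethod
    def resolve_rate_limiter(tier, explicit_limiter, tiers):
        if tier is not None:
            config = tiers.get(tier.lower())  # case-insensitive matching
            if config is None:
                raise ValueError(
                    f"Unknown Z.AI tier {tier!r}; expected one of {sorted(tiers)}"
                )
            return config
        return explicit_limiter
```

The key property is that a provider only has to declare its tier table; resolution, case folding, and error handling live in the shared framework.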

Rate Limiter (rate_limiter.py)

  • Update PROVIDER_DEFAULTS["zai"] to conservative Lite-tier default (10 RPM / 50K TPM)
  • Add ZAI_TIER_DEFAULTS dict for reference and documentation
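A possible shape for the reference dict (the exact layout in rate_limiter.py is an assumption; the numbers are the ones quoted in this PR):

```python
# Hypothetical layout; the real rate_limiter.py may structure these differently.
ZAI_TIER_DEFAULTS = {
    "lite": {"rpm": 10, "tpm": 50_000},
    "pro": {"rpm": 30, "tpm": 150_000},
    "max": {"rpm": 60, "tpm": 300_000},
}

# Conservative provider default: users who set no tier get Lite-equivalent limits.
PROVIDER_DEFAULTS = {
    "zai": dict(ZAI_TIER_DEFAULTS["lite"]),
}
```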

CLI Helpers (helpers.py)

  • Add ZAI_TIER env var reading in both explicit and auto-detect provider resolution paths
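The env-var handling might look roughly like this (a stand-in helper, not the actual helpers.py code, which threads the value through two separate resolution paths):

```python
import os

VALID_ZAI_TIERS = {"lite", "pro", "max"}


def read_zai_tier():
    """Hypothetical helper mirroring the ZAI_TIER handling added to helpers.py."""
    raw = os.environ.get("ZAI_TIER")
    if raw is None or not raw.strip():
        # Provider then falls back to the conservative Lite-equivalent default.
        return None
    tier = raw.strip().lower()
    if tier not in VALID_ZAI_TIERS:
        raise ValueError(
            f"ZAI_TIER must be one of {sorted(VALID_ZAI_TIERS)}, got {raw!r}"
        )
    return tier
```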

Tests (test_zai_provider.py)

  • 13 new tests covering tier creation, per-tier limits (lite/pro/max), case-insensitive matching, tier precedence over an explicit limiter, the invalid-tier error, the no-tier edge case, an explicit limiter without a tier, the stored tier attribute, and a cross-provider empty-tiers check
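As a self-contained illustration of two of the behaviors under test (case-insensitive matching and the invalid-tier error), using a stand-in resolver rather than the project's code:

```python
def resolve_tier(tier, tiers):
    # Stand-in for the framework's lookup (the real code lives in BaseProvider).
    config = tiers.get(tier.lower())
    if config is None:
        raise ValueError(f"Unknown tier: {tier}")
    return config


TIERS = {"lite": {"rpm": 10}, "pro": {"rpm": 30}, "max": {"rpm": 60}}


def test_case_insensitive_matching():
    assert resolve_tier("PRO", TIERS) == resolve_tier("pro", TIERS)


def test_invalid_tier_raises():
    try:
        resolve_tier("ultra", TIERS)
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError")
```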

Rate Limit Context

Z.AI support provided the following guidance (April 2026):

  • Lite: 2-3 concurrent, lower tolerance -> 10 RPM / 50K TPM
  • Pro: 5-8 concurrent, moderate tolerance -> 30 RPM / 150K TPM
  • Max: 10-15 concurrent, highest tolerance -> 60 RPM / 300K TPM

Key facts:

  • Limits are aggregate across all models (not per-model)
  • Advanced models (GLM-5 family) consume 2-3x quota per prompt due to reasoning tokens
  • Limits are dynamic and load-dependent; these are conservative estimates
  • Ref: https://docs.z.ai/devpack/usage-policy

Configuration

# Required
export ZAI_API_KEY="***"

# Optional -- defaults to conservative (Lite-equivalent)
export ZAI_TIER="pro"           # lite | pro | max
export ZAI_PLAN="coding"        # coding | general
export ZAI_THINKING="disabled"  # enabled | disabled
export ZAI_BASE_URL="..."       # override plan-based URL

Test Plan

uv run pytest tests/unit/test_providers/test_zai_provider.py -v
# 34 passed (21 existing + 13 new tier tests)

PR Stack

  #  PR                                Description                             Status
  1  #82 -- Generic tier framework     BaseProvider + resolve_rate_limiter()   Ready for review
  2  #83 -- Z.AI adopts the framework  RATE_LIMIT_TIERS + ZAI_TIER env var     Depends on #82
     (this PR)
  3  #84 -- MiniMax provider           New provider using the framework        Depends on #82

vinit13792 and others added 5 commits April 13, 2026 12:29
- Add litellm to interactive provider selection menu
- Support LITELLM_BASE_URL for local proxy deployments (no API key required)
- Auto-add openai/ prefix when using api_base for proper LiteLLM routing
- Add dummy API key for local proxies (OpenAI SDK requirement)
- Add validation and tests for litellm provider configuration

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… false positives

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add first-class support for Z.AI with OpenAI-compatible API.

- New ZAIProvider with thinking disabled by default for GLM-5 family
- Plan selection: 'coding' (subscription) or 'general' (pay-as-you-go)
- Environment variables: ZAI_API_KEY, ZAI_PLAN, ZAI_BASE_URL, ZAI_THINKING
- Rate limit defaults and auto-detection in CLI helpers

Closes repowise-dev#68
Add RATE_LIMIT_TIERS class attribute and resolve_rate_limiter() static
method to BaseProvider. Any provider with subscription tiers can define
RATE_LIMIT_TIERS and pass tier + tiers to resolve_rate_limiter() to get
automatic tier-aware rate limiter creation.

Precedence: tier > explicit rate_limiter > None.
Tier matching is case-insensitive. Invalid tiers raise ValueError.

This is a provider-agnostic foundation -- no provider-specific code.
Providers adopt it by defining RATE_LIMIT_TIERS and calling
resolve_rate_limiter() in their constructor.

Ref: repowise-dev#68
Wire Z.AI provider into the BaseProvider tier framework (from PR #NN).

Changes:
- Define RATE_LIMIT_TIERS on ZAIProvider with Lite/Pro/Max configs
  derived from Z.AI support guidance (April 2026)
- Use resolve_rate_limiter() in constructor (tier > explicit > none)
- Add ZAI_TIER env var support in CLI helpers
- Add ZAI_TIER_DEFAULTS to rate_limiter.py for reference
- Update PROVIDER_DEFAULTS['zai'] to conservative Lite-tier default
- Bump retry budget: 5 retries / 30s max wait (from 3/4s) for Z.AI
  load-shedding tolerance
- Add tier parameter to constructor and docstring

Rate limit context:
- Z.AI concurrency limits are aggregate, dynamic, and load-dependent
- Advanced models (GLM-5 family) consume 2-3x quota per prompt
- Conservative defaults: Lite 10 RPM, Pro 30 RPM, Max 60 RPM
- Ref: https://docs.z.ai/devpack/usage-policy

Depends on: feat/generic-tier-framework
Supersedes: repowise-dev#80 (deprecates monolithic PR in favor of layered approach)
Ref: repowise-dev#68
Societus added a commit to Societus/repowise that referenced this pull request Apr 14, 2026
Add MiniMax as a built-in provider using the generic tier framework (repowise-dev#82).

MiniMax is an OpenAI-compatible API provider with the M2.x model family
(M2.7, M2.5, M2.1, M2) and published token plan rate tiers.

Changes:
- New MiniMaxProvider with RATE_LIMIT_TIERS (starter/plus/max/ultra)
  derived from published 5-hour rolling window limits
- Uses resolve_rate_limiter() from BaseProvider for tier resolution
- reasoning_split=True by default to separate thinking from content
- Bumped retry budget: 5 retries / 30s max for load-shedding tolerance
- Registered in provider registry with openai package dependency hint
- Conservative PROVIDER_DEFAULTS (Starter-tier: 5 RPM / 25K TPM)
- CLI env vars: MINIMAX_API_KEY, MINIMAX_BASE_URL,
  MINIMAX_REASONING_SPLIT, MINIMAX_TIER
- 30 unit tests (constructor, tiers, generate, stream_chat, registry)

Rate limit tiers (from https://platform.minimax.io/docs/token-plan/intro):
  Starter:  1,500 req/5hrs  ->  5 RPM /  25K TPM
  Plus:     4,500 req/5hrs  -> 15 RPM /  75K TPM
  Max:     15,000 req/5hrs  -> 50 RPM / 250K TPM
  Ultra:   30,000 req/5hrs  -> 100 RPM / 500K TPM

Highspeed variants (e.g., MiniMax-M2.7-highspeed) share the same rate
limits as their base plan -- the difference is faster inference, not quota.

This provider is structurally identical to Z.AI (repowise-dev#83) and was trivial
to implement because both use the generic tier framework. The framework
eliminated all per-provider boilerplate for tier resolution.

Depends on: repowise-dev#82 (generic tier framework)
Ref: repowise-dev#68