feat(zai): add Z.AI provider with tier-aware rate limiting #80
Closed
Societus wants to merge 4 commits into repowise-dev:main
Conversation
- Add litellm to interactive provider selection menu
- Support LITELLM_BASE_URL for local proxy deployments (no API key required)
- Auto-add openai/ prefix when using api_base for proper LiteLLM routing
- Add dummy API key for local proxies (OpenAI SDK requirement)
- Add validation and tests for litellm provider configuration

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… false positives Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add first-class support for Z.AI with an OpenAI-compatible API.

- New ZAIProvider with thinking disabled by default for the GLM-5 family
- Plan selection: 'coding' (subscription) or 'general' (pay-as-you-go)
- Environment variables: ZAI_API_KEY, ZAI_PLAN, ZAI_BASE_URL, ZAI_THINKING
- Rate limit defaults and auto-detection in CLI helpers

Closes repowise-dev#68
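The plan-selection logic described above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: the function name `resolve_base_url` and the endpoint URLs are placeholders (the real base URLs come from Z.AI's documentation, not from this PR).

```python
import os

# Placeholder endpoints -- the real URLs are defined by Z.AI, not this sketch.
PLAN_BASE_URLS = {
    "coding": "https://api.z.ai/coding/v1",   # subscription plan (placeholder)
    "general": "https://api.z.ai/v1",         # pay-as-you-go (placeholder)
}

def resolve_base_url() -> str:
    """Map ZAI_PLAN to a base URL; an explicit ZAI_BASE_URL overrides the plan."""
    override = os.environ.get("ZAI_BASE_URL")
    if override:
        return override
    plan = os.environ.get("ZAI_PLAN", "coding").strip().lower()
    if plan == "payg":  # the PR text uses both 'general' and 'payg' for pay-as-you-go
        plan = "general"
    if plan not in PLAN_BASE_URLS:
        raise ValueError(
            f"invalid ZAI_PLAN {plan!r}; expected one of {sorted(PLAN_BASE_URLS)}"
        )
    return PLAN_BASE_URLS[plan]
```

The override-first ordering matters: users pointing at a proxy or regional endpoint should never have it silently replaced by a plan default.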
Add ZAI_TIER environment variable support for plan-aware concurrency. Users can set ZAI_TIER=lite|pro|max to get appropriate rate limits derived from Z.AI support guidance.

Changes:
- rate_limiter.py: Add ZAI_TIER_DEFAULTS (lite/pro/max configs); update the provider default to a conservative 10 RPM / 50k TPM
- zai.py: Add tier parameter; tier takes precedence over the registry's default limiter. Bump retry budget to 5 retries / 30s max wait.
- helpers.py: Read the ZAI_TIER env var in both the explicit and auto-detect provider resolution paths
- tests: 10 new tests covering tier creation, precedence, case insensitivity, invalid tier handling, and edge cases

Ref: https://docs.z.ai/devpack/usage-policy
Related: repowise-dev#68
Societus added a commit to Societus/repowise that referenced this pull request on Apr 14, 2026
Wire Z.AI provider into the BaseProvider tier framework (from PR #NN).

Changes:
- Define RATE_LIMIT_TIERS on ZAIProvider with Lite/Pro/Max configs derived from Z.AI support guidance (April 2026)
- Use resolve_rate_limiter() in the constructor (tier > explicit > none)
- Add ZAI_TIER env var support in CLI helpers
- Add ZAI_TIER_DEFAULTS to rate_limiter.py for reference
- Update PROVIDER_DEFAULTS['zai'] to a conservative Lite-tier default
- Bump retry budget: 5 retries / 30s max wait (from 3/4s) for Z.AI load-shedding tolerance
- Add tier parameter to constructor and docstring

Rate limit context:
- Z.AI concurrency limits are aggregate, dynamic, and load-dependent
- Advanced models (GLM-5 family) consume 2-3x quota per prompt
- Conservative defaults: Lite 10 RPM, Pro 30 RPM, Max 60 RPM
- Ref: https://docs.z.ai/devpack/usage-policy

Depends on: feat/generic-tier-framework
Supersedes: repowise-dev#80 (deprecates monolithic PR in favor of layered approach)
Ref: repowise-dev#68
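The retry-budget bump (5 retries / 30s max wait, up from 3/4s) suggests capped exponential backoff. A minimal sketch, assuming a simple doubling schedule with a hard cap; the function name and base delay are illustrative, and a real limiter would typically add jitter:

```python
def backoff_delays(retries: int = 5, base: float = 2.0, cap: float = 30.0) -> list[float]:
    """Capped exponential backoff matching the PR's 5-retry / 30s-max budget.

    Each attempt doubles the previous delay until the cap is reached.
    Jitter is omitted here for determinism; production code would add it
    to avoid retry stampedes during Z.AI load shedding.
    """
    return [min(cap, base * 2 ** attempt) for attempt in range(retries)]
```

With the defaults this yields delays of 2, 4, 8, 16, and 30 seconds, so the final wait hits exactly the 30s ceiling rather than growing unbounded.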
Author
Superseded by layered PR stack

This monolithic PR has been split into two focused PRs that build on each other:
The layered approach is cleaner because:
Both PRs are ready for review with passing tests.
Summary
Adds Z.AI (Zhipu AI) as a supported LLM provider with plan-aware rate limiting based on subscription tier (Lite/Pro/Max).
This builds on the Z.AI provider implementation from #74 (thanks @vinit13792), rebased onto the latest main, with tier-aware rate limiting and improved retry behavior added on top.
Changes
Z.AI Provider (`zai.py`)

- Thinking toggle (`ZAI_THINKING=enabled|disabled`) for GLM-5 family reasoning tokens
- `ZAI_PLAN=coding|payg` maps to the correct base URL
- Tier selection via the `ZAI_TIER=lite|pro|max` environment variable
- Invalid tiers raise `ValueError` with the valid options listed

Rate Limiter (`rate_limiter.py`)

- `ZAI_TIER_DEFAULTS` with per-tier limits derived from Z.AI support guidance

CLI Helpers (`helpers.py`)

- `ZAI_TIER` env var read in both explicit and auto-detect provider resolution paths
- `ZAI_BASE_URL` for custom endpoint override
- `ZAI_PLAN` for plan selection

Tests (10 new)
Configuration
Rate Limit Context
Z.AI's concurrency limits are dynamic and not published as fixed numbers. The tier defaults are conservative starting points based on guidance from Z.AI support (April 2026):

- Lite: 10 RPM
- Pro: 30 RPM
- Max: 60 RPM
Ref: https://docs.z.ai/devpack/usage-policy
Related
Test Plan
```
uv run python -m pytest tests/unit/test_providers/test_zai_provider.py -v
# 32 passed (22 existing + 10 new tier tests)
```