
feat(zai): add Z.AI provider with tier-aware rate limiting#80

Closed
Societus wants to merge 4 commits into repowise-dev:main from Societus:feat/zai-plan-concurrency
Conversation

@Societus

Summary

Adds Z.AI (Zhipu AI) as a supported LLM provider with plan-aware rate limiting based on subscription tier (Lite/Pro/Max).

Built on top of the Z.AI provider implementation from #74 (thanks @vinit13792), rebased onto latest main with the addition of tier-aware rate limiting and improved retry behavior.

Changes

Z.AI Provider (zai.py)

  • Full Z.AI API integration (OpenAI-compatible) with support for GLM-5.1, GLM-5, GLM-5-Turbo, and GLM-4-Flash
  • Thinking mode toggle (ZAI_THINKING=enabled|disabled) for GLM-5 family reasoning tokens
  • Plan-aware endpoints: ZAI_PLAN=coding|payg maps to the correct base URL
  • Tier-aware rate limiting via ZAI_TIER=lite|pro|max environment variable
    • Tier takes precedence over registry-attached default rate limiter
    • Invalid tier raises ValueError with valid options listed
  • Improved retry budget: 5 retries with up to 30s backoff (up from 3 retries / 4s) to handle Z.AI load-shedding under concurrent use

Rate Limiter (rate_limiter.py)

  • Updated Z.AI default from placeholder (60 RPM / 150K TPM) to conservative 10 RPM / 50K TPM
  • Added ZAI_TIER_DEFAULTS with per-tier limits derived from Z.AI support guidance:
    • Lite: 10 RPM / 50K TPM (2-3 concurrent)
    • Pro: 30 RPM / 150K TPM (5-8 concurrent)
    • Max: 60 RPM / 300K TPM (10-15 concurrent)
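The tier table and the precedence rules described above (tier over registry default, case-insensitive matching, `ValueError` on an invalid tier) can be sketched roughly as follows. Names like `ZAI_TIER_DEFAULTS`, `RateLimitConfig`, and `resolve_tier` are illustrative assumptions, not necessarily the PR's actual identifiers.

```python
# Sketch only: identifiers are assumptions, not the PR's real API.
from dataclasses import dataclass


@dataclass(frozen=True)
class RateLimitConfig:
    rpm: int  # requests per minute
    tpm: int  # tokens per minute


# Per-tier defaults from the PR description (Z.AI support guidance).
ZAI_TIER_DEFAULTS = {
    "lite": RateLimitConfig(rpm=10, tpm=50_000),
    "pro": RateLimitConfig(rpm=30, tpm=150_000),
    "max": RateLimitConfig(rpm=60, tpm=300_000),
}


def resolve_tier(tier, registry_default):
    """An explicit tier takes precedence over a registry-attached default."""
    if tier is None:
        return registry_default
    key = tier.strip().lower()  # case-insensitive tier matching
    if key not in ZAI_TIER_DEFAULTS:
        raise ValueError(
            f"Invalid ZAI_TIER {tier!r}; valid options: {sorted(ZAI_TIER_DEFAULTS)}"
        )
    return ZAI_TIER_DEFAULTS[key]
```

With this shape, `resolve_tier("PRO", some_default)` returns the Pro limits regardless of the registry default, while `resolve_tier(None, some_default)` falls back to it.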

CLI Helpers (helpers.py)

  • ZAI_TIER env var read in both explicit and auto-detect provider resolution paths
  • ZAI_BASE_URL for custom endpoint override
  • ZAI_PLAN for plan selection
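A minimal sketch of how a CLI helper might gather these environment variables, with the defaults the configuration section below implies; the function name and return shape are assumptions for illustration.

```python
# Illustrative helper; read_zai_env is not necessarily the PR's real name.
import os


def read_zai_env(env=None):
    """Collect optional Z.AI settings, falling back to documented defaults."""
    env = os.environ if env is None else env
    return {
        "tier": env.get("ZAI_TIER"),            # lite | pro | max (None = conservative default)
        "plan": env.get("ZAI_PLAN", "coding"),  # coding | payg
        "thinking": env.get("ZAI_THINKING", "disabled"),  # enabled | disabled
        "base_url": env.get("ZAI_BASE_URL"),    # overrides plan-based URL when set
    }
```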

Tests (10 new)

  • Tier creation, per-tier limit verification (lite/pro/max)
  • Case-insensitive tier matching
  • Tier precedence over explicit rate limiter
  • Invalid tier error handling
  • No-tier / no-limiter edge cases

Configuration

# Required
export ZAI_API_KEY="your-key"

# Optional — defaults to conservative (Lite-equivalent)
export ZAI_TIER="pro"           # lite | pro | max
export ZAI_PLAN="coding"        # coding | payg
export ZAI_THINKING="disabled"  # enabled | disabled
export ZAI_BASE_URL="..."       # override plan-based URL

Rate Limit Context

Z.AI's concurrency limits are dynamic and not published as fixed numbers. The tier defaults above are conservative starting points based on guidance from Z.AI support (April 2026):

  • Limits are aggregate across all models (not per-model)
  • Advanced models (GLM-5 family) consume 2-3x quota per prompt due to reasoning tokens, so effective concurrency is lower with those models
  • Recommended: exponential backoff on 429 errors (already implemented in provider retry logic)
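The backoff recommendation above, combined with the 5-retry / 30s budget from the provider changes, can be sketched like this. `RateLimitError` and `with_backoff` are stand-ins for the provider's actual error type and request path.

```python
# Sketch of exponential backoff with jitter on 429-style errors,
# assuming the PR's stated budget of 5 retries capped at 30s.
import random
import time

MAX_RETRIES = 5
MAX_BACKOFF_S = 30.0


class RateLimitError(Exception):
    """Stand-in for the provider's 429 (rate limited) error."""


def with_backoff(call_fn, sleep=time.sleep):
    """Retry call_fn on rate-limit errors with capped exponential backoff."""
    for attempt in range(MAX_RETRIES + 1):
        try:
            return call_fn()
        except RateLimitError:
            if attempt == MAX_RETRIES:
                raise  # budget exhausted; surface the 429
            # 1s, 2s, 4s, ... plus jitter, never exceeding the 30s cap
            delay = min(MAX_BACKOFF_S, (2 ** attempt) + random.random())
            sleep(delay)
```

Injecting `sleep` keeps the retry loop testable without real waiting, which matters when Z.AI load-sheds under concurrent use.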

Ref: https://docs.z.ai/devpack/usage-policy

Related

Test Plan

uv run python -m pytest tests/unit/test_providers/test_zai_provider.py -v
# 32 passed (22 existing + 10 new tier tests)

vinit13792 and others added 4 commits April 13, 2026 12:29
- Add litellm to interactive provider selection menu
- Support LITELLM_BASE_URL for local proxy deployments (no API key required)
- Auto-add openai/ prefix when using api_base for proper LiteLLM routing
- Add dummy API key for local proxies (OpenAI SDK requirement)
- Add validation and tests for litellm provider configuration

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… false positives

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add first-class support for Z.AI with OpenAI-compatible API.

- New ZAIProvider with thinking disabled by default for GLM-5 family
- Plan selection: 'coding' (subscription) or 'general' (pay-as-you-go)
- Environment variables: ZAI_API_KEY, ZAI_PLAN, ZAI_BASE_URL, ZAI_THINKING
- Rate limit defaults and auto-detection in CLI helpers

Closes repowise-dev#68
Add ZAI_TIER environment variable support for plan-aware concurrency.
Users can set ZAI_TIER=lite|pro|max to get appropriate rate limits
derived from Z.AI support guidance.

Changes:
- rate_limiter.py: Add ZAI_TIER_DEFAULTS (lite/pro/max configs),
  update provider default to conservative 10 RPM/50k TPM
- zai.py: Add tier parameter, tier takes precedence over registry
  default limiter. Bump retry budget to 5 retries / 30s max wait.
- helpers.py: Read ZAI_TIER env var in both explicit and auto-detect
  provider resolution paths
- tests: 10 new tests covering tier creation, precedence, case
  insensitivity, invalid tier handling, and edge cases

Ref: https://docs.z.ai/devpack/usage-policy
Related: repowise-dev#68
Societus added a commit to Societus/repowise that referenced this pull request Apr 14, 2026
Wire Z.AI provider into the BaseProvider tier framework (from PR #NN).

Changes:
- Define RATE_LIMIT_TIERS on ZAIProvider with Lite/Pro/Max configs
  derived from Z.AI support guidance (April 2026)
- Use resolve_rate_limiter() in constructor (tier > explicit > none)
- Add ZAI_TIER env var support in CLI helpers
- Add ZAI_TIER_DEFAULTS to rate_limiter.py for reference
- Update PROVIDER_DEFAULTS['zai'] to conservative Lite-tier default
- Bump retry budget: 5 retries / 30s max wait (from 3/4s) for Z.AI
  load-shedding tolerance
- Add tier parameter to constructor and docstring

Rate limit context:
- Z.AI concurrency limits are aggregate, dynamic, and load-dependent
- Advanced models (GLM-5 family) consume 2-3x quota per prompt
- Conservative defaults: Lite 10 RPM, Pro 30 RPM, Max 60 RPM
- Ref: https://docs.z.ai/devpack/usage-policy

Depends on: feat/generic-tier-framework
Supersedes: repowise-dev#80 (deprecates monolithic PR in favor of layered approach)
Ref: repowise-dev#68
@Societus
Author

Superseded by layered PR stack

This monolithic PR has been split into two focused PRs that build on each other:

  1. feat: add generic tier-aware rate limiting framework #82 — Generic tier-aware rate limiting framework (provider-agnostic foundation in BaseProvider)
  2. feat(zai): adopt tier framework for plan-aware rate limiting #83 — Z.AI adopts the tier framework (Z.AI-specific tier configs, env vars, tests)

The layered approach is cleaner: the provider-agnostic framework can be reviewed and merged independently of the Z.AI-specific changes.

Both PRs are ready for review with passing tests.
