feat: add vLLM remote tokenizer with engine integration (#1328)
Add support for vLLM's remote tokenizer endpoint so that gateway
plugins can tokenize requests without loading models locally. This
feature lets the gateway delegate tokenization to vLLM engine
instances, reducing memory usage and improving scalability.
## Key Features
- Integrate vLLM's /tokenize endpoint for remote tokenization
- Implement TokenizerPool for managing per-model tokenizer connections
- Support health checking and automatic failover to a local tokenizer
- Add caching and connection pooling for performance
- Support both vLLM and other inference engines through pod label detection
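Pod label detection could look roughly like the following: a pod labeled as a vLLM engine gets a remote tokenizer URL built from the endpoint template (default `http://%s:8000`), while any other engine keeps the local path. The label key `example.com/inference-engine` and the value `vllm` are placeholders, not the labels the project actually uses, and pods are reduced to a plain label map so the sketch needs no Kubernetes client libraries.

```go
package main

import "fmt"

// engineLabelKey is a HYPOTHETICAL label key; the real key lives in the
// project's Kubernetes label constants and is not shown in this commit message.
const engineLabelKey = "example.com/inference-engine"

// remoteTokenizerEndpoint returns the remote tokenizer base URL for a vLLM pod.
// For any other engine it returns ok == false so the caller keeps the local tokenizer.
func remoteTokenizerEndpoint(labels map[string]string, podIP, template string) (endpoint string, ok bool) {
	if labels[engineLabelKey] != "vllm" { // indexing a nil map yields ""
		return "", false
	}
	return fmt.Sprintf(template, podIP), true
}

func main() {
	const endpointTemplate = "http://%s:8000" // default endpoint template from the commit
	pods := []struct {
		ip     string
		labels map[string]string
	}{
		{"10.0.0.7", map[string]string{engineLabelKey: "vllm"}},
		{"10.0.0.8", map[string]string{engineLabelKey: "other"}},
		{"10.0.0.9", nil}, // unlabeled pod
	}
	for _, p := range pods {
		ep, ok := remoteTokenizerEndpoint(p.labels, p.ip, endpointTemplate)
		fmt.Printf("%s remote=%v endpoint=%q\n", p.ip, ok, ep)
	}
}
```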
## Implementation Details
- New remote tokenizer client with retry logic and timeout handling
- TokenizerPool with concurrent access support and automatic cleanup
- Health monitoring with a 5-second timeout for tokenizer endpoints
- Fallback to a local character tokenizer when the remote tokenizer is unavailable
- Prometheus metrics for monitoring tokenizer pool status
## Configuration
- AIBRIX_ENABLE_VLLM_REMOTE_TOKENIZER: Feature flag (default: false)
- AIBRIX_VLLM_TOKENIZER_ENDPOINT_TEMPLATE: Endpoint URL template (default: "http://%s:8000")
- AIBRIX_TOKENIZER_HEALTH_CHECK_PERIOD: Health check interval (default: 30s)
- AIBRIX_TOKENIZER_TTL: Idle time after which an unused tokenizer is cleaned up (default: 5m)
- AIBRIX_MAX_TOKENIZERS_PER_POOL: Pool size limit (default: 100)
## Review Feedback Addressed
- Changed default to disabled for production safety
- Fixed race conditions in concurrent access
- Reduced lock contention with double-checked locking
- Added comprehensive test coverage including benchmarks
- Created centralized constants package for Kubernetes labels
Tested with vLLM v0.4.0 and includes backward compatibility support.
Signed-off-by: ae86zhizhi <550149470@qq.com>