
Conversation

@yeshsurya (Contributor) commented Jan 6, 2026

This pull request improves the vLLM-based rollout server infrastructure: better LoRA (Low-Rank Adaptation) support, compatibility with newer vLLM versions, more robust IPv6 handling, and improved modularity. The main updates introduce custom LoRA request and hijack logic, update dependencies, refactor the vLLM server class hierarchy, and ensure compatibility with vLLM 0.13.0+ and Ray 2.53.0.

Key changes:

LoRA (Low-Rank Adaptation) Support and vLLM Integration

  • Added a new utils.py module with a custom TensorLoRARequest and VLLMHijack class that allow loading LoRA adapters directly from tensors, working around vLLM's limitation of supporting only file-based LoRA adapters. This enables more flexible and efficient LoRA synchronization between actor models (see the sketches after this list).
  • Updated the vLLM HTTP server logic to inject LoRA requests once a LoRA adapter is loaded, and to set the LoRA-related engine arguments (enable_lora, max_loras, max_lora_rank) from the model configuration.
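A minimal sketch of the tensor-based request and hijack pattern, assuming vLLM's vllm.lora.request.LoRARequest and vllm.lora.worker_manager.WorkerLoRAManager; the field names (peft_config, lora_tensors) and the patched method are illustrative and may not match the PR's utils.py exactly:

```python
# Sketch only: field names and the patched method are assumptions,
# not necessarily what the PR's utils.py defines.
from typing import Any, Dict, Optional

import torch
from vllm.lora.request import LoRARequest


class TensorLoRARequest(LoRARequest):
    # Carry the adapter in memory instead of pointing at files on disk.
    peft_config: Optional[Dict[str, Any]] = None
    lora_tensors: Optional[Dict[str, torch.Tensor]] = None


class VLLMHijack:
    @staticmethod
    def hijack() -> None:
        """Monkey-patch vLLM's worker-side LoRA loading so a
        TensorLoRARequest is materialized from its in-memory tensors
        rather than read from lora_path on disk."""
        from vllm.lora.worker_manager import WorkerLoRAManager

        original_load = WorkerLoRAManager._load_adapter

        def _load_adapter(self, lora_request):
            if isinstance(lora_request, TensorLoRARequest):
                # The real patch would build the LoRA model from
                # lora_request.lora_tensors here (via vLLM's internal
                # from_lora_tensors-style constructor).
                raise NotImplementedError("sketch only")
            return original_load(self, lora_request)

        WorkerLoRAManager._load_adapter = _load_adapter
```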
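And a sketch of deriving the LoRA engine arguments from the model configuration. AsyncEngineArgs and its enable_lora / max_loras / max_lora_rank fields are real vLLM engine arguments; the lora_rank plumbing and the max_loras=1 choice are assumptions:

```python
from vllm.engine.arg_utils import AsyncEngineArgs


def build_engine_args(model_path: str, lora_rank: int) -> AsyncEngineArgs:
    """Enable LoRA in the engine only when the model config asks for it."""
    lora_kwargs = {}
    if lora_rank > 0:  # assumed signal that the rollout config enables LoRA
        lora_kwargs = {
            "enable_lora": True,
            "max_loras": 1,          # one adapter slot; illustrative choice
            "max_lora_rank": lora_rank,
        }
    return AsyncEngineArgs(model=model_path, **lora_kwargs)
```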

Compatibility and Dependency Updates

  • Upgraded Ray to version 2.53.0 in requirements.txt for improved compatibility and performance.
  • Updated the Dockerfile to install pinned versions of PyTorch, torchvision, and torchaudio, and reordered the flash-attn installation for CUDA 12.6 compatibility.
  • Switched from pickle to cloudpickle for serialization in vllm_async_server, improving compatibility with complex Python objects such as closures (a short example follows this list).
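A short illustration of why the swap matters: standard pickle stores functions by reference and cannot serialize closures or lambdas, which cloudpickle handles by value. The payload here is illustrative, not what vllm_async_server actually ships:

```python
import pickle

import cloudpickle


def make_postprocess(scale: float):
    return lambda logits: logits * scale  # closure over `scale`


fn = make_postprocess(0.5)

restored = cloudpickle.loads(cloudpickle.dumps(fn))  # round-trips by value
assert restored(10.0) == 5.0

try:
    pickle.dumps(fn)  # plain pickle references the function by module path
except (pickle.PicklingError, AttributeError) as exc:
    print(f"pickle rejects the closure: {exc}")
```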

vLLM Server Refactor and API Adjustments

  • Refactored the vLLM HTTP server into a base class (vLLMHttpServerBase) and a Ray-remote subclass (vLLMHttpServer), improving code modularity and clarity (see the sketch after this list).
  • Updated method signatures and internal logic for better type safety, reflecting that the vLLM HTTP server now accepts only RolloutConfig, not RewardModelConfig.
  • Updated imports for compatibility with vLLM 0.13.0+ (e.g., splitting the FlexibleArgumentParser and get_tcp_uri imports).
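A minimal sketch of the base-class / Ray-remote split; only the two class names come from the PR, while the constructor signature and method body are assumptions:

```python
import ray


class vLLMHttpServerBase:
    """Plain-Python server logic: engine setup, routes, LoRA injection.
    Keeping it Ray-free makes it importable and testable on its own."""

    def __init__(self, rollout_config):  # only RolloutConfig after this PR
        self.rollout_config = rollout_config

    async def launch(self) -> None:
        ...  # build engine args and start serving


@ray.remote
class vLLMHttpServer(vLLMHttpServerBase):
    """Thin Ray actor wrapper; all behavior lives in the base class."""
```

With this split, callers create the actor via vLLMHttpServer.remote(config), while tests can instantiate vLLMHttpServerBase directly without a Ray cluster.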

IPv6 and Networking Improvements

  • Improved ZeroMQ socket handling to support IPv6 addresses and updated the address formatting logic throughout the server and replica classes.
  • Added logic to detect whether an address is IPv6 and bracket it accordingly when setting up server addresses (see the sketch after this list).
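A standard-library sketch of the detection and formatting logic; the helper names are illustrative:

```python
import ipaddress

import zmq


def is_ipv6(host: str) -> bool:
    """True if host parses as an IPv6 literal (e.g. '::1')."""
    try:
        return isinstance(ipaddress.ip_address(host), ipaddress.IPv6Address)
    except ValueError:
        return False  # hostnames and malformed strings treated as non-IPv6


def format_tcp_address(host: str, port: int) -> str:
    """IPv6 literals must be bracketed inside a tcp:// URI."""
    return f"tcp://[{host}]:{port}" if is_ipv6(host) else f"tcp://{host}:{port}"


ctx = zmq.Context.instance()
sock = ctx.socket(zmq.PULL)
sock.setsockopt(zmq.IPV6, 1)  # ZeroMQ rejects IPv6 endpoints unless enabled
sock.bind(format_tcp_address("::1", 5555))
```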

Miscellaneous Enhancements

  • Ensured that results of non-blocking calls to the distributed executor are wrapped in Future objects, as required by vLLM 0.13.0+ (see the sketch after this list).
  • Added Prometheus model-name extraction logic for improved observability.
  • Various minor improvements and bug fixes, including logging enhancements and argument handling.
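A sketch of wrapping a synchronously computed result in a Future so callers expecting the non-blocking executor interface of vLLM 0.13.0+ can consume it uniformly; the method and helper names here are illustrative:

```python
from concurrent.futures import Future
from typing import Any


def as_future(result: Any) -> Future:
    """Wrap an already-computed value in a completed Future."""
    fut: Future = Future()
    fut.set_result(result)
    return fut


def collective_rpc(method: str, non_block: bool = False):
    result = f"ran {method}"  # stand-in for the real executor call
    # Newer vLLM expects a Future from non-blocking calls, even when the
    # underlying execution completed synchronously.
    return as_future(result) if non_block else result


print(collective_rpc("sleep", non_block=True).result())  # -> "ran sleep"
```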

These changes collectively improve the flexibility, maintainability, and compatibility of the rollout server infrastructure with the latest vLLM and Ray versions, while enabling advanced LoRA workflows.

@github-actions (bot) commented Jan 6, 2026

Test Results for assets-test

0 tests   0 ✅  0s ⏱️
0 suites  0 💤
0 files    0 ❌

Results for commit 58fcdf3.

♻️ This comment has been updated with latest results.

@yeshsurya force-pushed the yeshwanth/torch_and_dep_upgrade_in_environment branch from a52f490 to 2aba2e3 on January 7, 2026 at 04:11