build: fixes to enable vLLM slim runtime image #1058


Merged
merged 29 commits into main from tusharma/slim-runtime-vllm-build on May 29, 2025

Conversation

@nv-tusharma (Contributor) commented May 13, 2025

Overview:

OPS-41: This PR provides a minimal vLLM Dynamo runtime image that contains all the dependencies needed to run the Dynamo CLI with the vLLM backend. This includes support for:

  • NATS & ETCD
  • NIXL with UCX plugin support
  • All Dynamo CLI commands (dynamo-run, dynamo serve, dynamo deploy, etc.)

The resulting runtime container is around 12.2 GB, versus approximately 39.5 GB for the current vLLM devel image. Once this PR is approved and merged, the next steps are:

  1. Enable this image in the nightly CI build
  2. Provide the vLLM image to the dynamo deploy team for their deployments

Details:

container/Dockerfile.vllm

  • Build the NIXL wheel in the wheel_builder stage with the UCX backend
  • Copy the NIXL and UCX artifacts into the runtime stage and install NIXL via uv pip (see the sketch after this list)
  • Copy the bindings from the ci_minimum image into the runtime image
  • Copy nats-server & etcd into the runtime stage
  • Install build-essential along with python3-dev, since these are indirect dependencies of vLLM required for disaggregated serving
  • Install the common container requirements.txt: https://github.com/ai-dynamo/dynamo/blob/main/container/deps/requirements.txt
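
A minimal sketch of the wheel-build/runtime split described above. Stage names other than wheel_builder, the base image name, and the nats-server/etcd source are assumptions drawn from this description and the review comments, not verbatim from container/Dockerfile.vllm:

# Sketch only; dynamo_base is a placeholder for the PR's actual base stage.
FROM dynamo_base AS wheel_builder
# Build the UCX-enabled NIXL wheel once, in the builder stage
RUN cd /opt/nixl && uv build . --out-dir /workspace/wheels/nixl

FROM dynamo_base AS runtime
# Copy only the prebuilt artifacts into the slim runtime stage
COPY --from=wheel_builder /workspace/wheels/nixl /tmp/wheels/nixl
COPY --from=wheel_builder /usr/local/ucx /usr/local/ucx
# Install NIXL from the wheel, then drop the temporary wheel directory
RUN uv pip install /tmp/wheels/nixl/*.whl && rm -rf /tmp/wheels
# nats-server and etcd binaries for the runtime (source stage assumed)
COPY --from=dynamo_base /usr/bin/nats-server /usr/bin/nats-server
COPY --from=dynamo_base /usr/local/bin/etcd /usr/local/bin/etcd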

Steps for testing

  1. ./container/build.sh --target runtime
  2. ./container/run.sh -it --image dynamo:latest-vllm-runtime
  3. Run the aggregated and disaggregated examples from this folder: https://github.com/ai-dynamo/dynamo/tree/main/examples/llm
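
To sanity-check the size figures from the overview, the built images can be compared locally (tag names assume the build script's defaults):

docker images --format '{{.Repository}}:{{.Tag}}  {{.Size}}' | grep dynamo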

Where should the reviewer start?

container/Dockerfile.vllm

Summary by CodeRabbit

  • Chores
    • Improved the Docker image build process by separating build and installation steps for Python modules.
    • Added necessary system dependencies and runtime components for better performance.
    • Streamlined Python environment setup and package installation for enhanced reliability and maintainability.

coderabbitai bot (Contributor) commented May 28, 2025

"""

Walkthrough

The Dockerfile for the vLLM container has been updated to separate the build and installation steps for the NIXL Python module using wheel artifacts. Additional system dependencies and runtime binaries are included, and the Python environment setup is streamlined by adjusting how dependencies and executables are installed and managed.

Changes

File(s): container/Dockerfile.vllm
Change summary: Refactored the NIXL build/install to build wheels first and then install them; added system dependencies; copied runtime binaries and libraries; streamlined the Python venv setup and package installation.

Poem

In Docker’s warren, wheels now spin,
NIXL builds tidy, let the fun begin!
With binaries and libs, dependencies align,
Python’s venv sparkles, everything’s fine.
A rabbit hops by, gives a wink and a cheer—
“Your containers are faster, more nimble this year!” 🐇
"""


📜 Recent review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9cdc46b and 5bd4957.

📒 Files selected for processing (1)
  • container/Dockerfile.vllm (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • container/Dockerfile.vllm
⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: Build and Test - vllm


coderabbitai bot (Contributor) left a comment


Actionable comments posted: 0

♻️ Duplicate comments (1)
container/Dockerfile.vllm (1)

491-494: Reevaluate build-essential and python3-dev in the runtime image
Pulling in full build toolchains inflates the runtime container by hundreds of megabytes. Since all C/C++ components (UCX, NIXL) are precompiled and shipped as wheels, these packages may no longer be needed. Consider removing them to further slim the image.

🧹 Nitpick comments (8)
container/Dockerfile.vllm (8)

158-161: Validate uv build command quoting and duplication
The multi-line uv build invocation for ARM64 uses nested double quotes and a trailing semicolon, which can be fragile in shell parsing. Consider unifying the two branches to avoid duplication and simplify quoting, for example:

RUN cd /opt/nixl && \
    uv build . --out-dir /workspace/wheels/nixl \
    $( [ "$ARCH" = "arm64" ] && echo "--config-settings='setup-args=-Dgds_path=/usr/local/cuda/targets/sbsa-linux'" )  

This removes the inner double quotes, eliminates duplicate commands, and makes the conditional flag injection clearer.


164-166: Consider relocating the NIXL wheel installation
The RUN uv pip install /workspace/wheels/nixl/*.whl step in the base image duplicates wheel installation logic between stages. Moving this into the wheel_builder stage (as noted by the TODO) will speed up the runtime build and tighten layer caching.


496-498: Prune unnecessary files from /opt/dynamo/bindings
The copy from ci_minimum brings in wheels, headers, and other artifacts under /opt/dynamo/bindings. At runtime you only need the C API shared libraries (.so). Excluding include directories and wheels will reduce image size.
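
A hedged sketch of the narrower copy; the exact layout of the ci_minimum artifacts isn't shown here, so the lib/ path below is an assumption:

# Copy only the C API shared libraries; leave wheels and headers behind
COPY --from=ci_minimum /opt/dynamo/bindings/lib/*.so /opt/dynamo/bindings/lib/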


504-507: Optimize UCX and NIXL artifact copy
Currently, you copy the entire source trees at /usr/local/ucx and /usr/local/nixl, including headers and docs. For runtime you only need the .so files under lib/* and plugin directories. Restricting the copy to libraries will significantly shrink the final image.
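
For instance, restricted to the library directories referenced in the LD_LIBRARY_PATH suggestion below (the source stage name is assumed, and ARG ARCH_ALT must be in scope):

# Copy runtime libraries and NIXL plugins only; skip headers and docs
COPY --from=wheel_builder /usr/local/ucx/lib /usr/local/ucx/lib
COPY --from=wheel_builder /usr/local/nixl/lib/${ARCH_ALT}-linux-gnu /usr/local/nixl/lib/${ARCH_ALT}-linux-gnu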


508-511: Ensure LD_LIBRARY_PATH doesn’t override critical paths
By redefining LD_LIBRARY_PATH you may inadvertently mask CUDA or system libraries. It’s safer to append your custom paths:

ENV LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/nixl/lib/${ARCH_ALT}-linux-gnu:/usr/local/nixl/lib/${ARCH_ALT}-linux-gnu/plugins:/usr/local/ucx/lib

This preserves earlier defaults.


514-515: Remove redundant venv activation
You’ve already prepended the venv’s bin to PATH (line 489), so sourcing the activate script in .bashrc isn’t needed and could alter shell startup behavior.
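
A minimal sketch of relying on PATH alone (the venv location is an assumption):

# With the venv's bin directory on PATH, sourcing activate is unnecessary
ENV VIRTUAL_ENV=/opt/dynamo/venv
ENV PATH=$VIRTUAL_ENV/bin:$PATH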


518-520: Add --no-cache to pip install
To prevent uv from caching packages during the install and reduce layer size, include uv's --no-cache flag (the equivalent of pip's --no-cache-dir):

uv pip install --no-cache --requirement /tmp/requirements.txt

523-530: Verify wheelhouse installation and symlinks

  1. Confirm that the package name ai-dynamo[vllm] matches the wheel metadata (underscores vs. hyphens) to avoid install failures.
  2. Instead of symlinking all venv binaries into /usr/local/bin, consider targeting only the Dynamo CLI executables to prevent shadowing system tools.
ln -sf $VIRTUAL_ENV/bin/dynamo* /usr/local/bin/

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c12f61a and 77cc9ff.

📒 Files selected for processing (1)
  • container/Dockerfile.vllm (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: Build and Test - vllm
🔇 Additional comments (1)
container/Dockerfile.vllm (1)

499-503:

❓ Verification inconclusive

Verify nats-server and etcd binary dependencies
Copying these binaries directly may omit shared library dependencies required at runtime. Please run something like the script below to confirm there are no missing libraries in the slim image.


🏁 Script executed:

ldd /usr/bin/nats-server
ldd /usr/local/bin/etcd/etcd



Verify nats-server and etcd runtime dependencies inside the built image
The host sandbox can’t locate the binaries, so please run these commands in your Docker image to confirm no missing shared libraries:

docker run --rm -it <your-image> ldd /usr/bin/nats-server
docker run --rm -it <your-image> ldd /usr/local/bin/etcd/etcd

Ensure that no “not found” entries appear.

@nv-tusharma merged commit 93ca9df into main on May 29, 2025
10 checks passed
@nv-tusharma deleted the tusharma/slim-runtime-vllm-build branch on May 29, 2025 at 04:58