[ci] Add Ray compatibility check informational CI job#34672
jeffreywang-anyscale wants to merge 2 commits into vllm-project:main
Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>
Code Review
The pull request introduces a useful CI job to proactively detect dependency conflicts between vLLM and Ray. This will help prevent integration issues that are currently only discovered during Ray's upgrade process. The implementation uses a non-blocking approach with Buildkite annotations and Slack notifications, which is appropriate for an informational check. I have identified a few high-severity issues related to the robustness of the bash script, specifically regarding error handling and safe JSON construction for notifications.
#
# See: https://github.com/vllm-project/vllm/issues/33599

set -o pipefail
The script should use set -e to ensure it terminates immediately if any setup command fails (e.g., curl failing to download a lock file or sed failing to process it). Without set -e, the script might continue with incomplete data, potentially leading to false positives where the compatibility check appears to pass because the constraints file was empty or missing.
-set -o pipefail
+set -eo pipefail
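The difference between the two settings can be reproduced with a small sketch. It assumes `bash` is available on the PATH, and it uses `false` as a stand-in for a failed `curl` or `sed` setup step; the script text is hypothetical and is not taken from the PR.

```python
import subprocess

# Hypothetical two-step script: `false` stands in for a setup command that
# fails (for example a curl download), followed by a step that should only
# run when setup succeeded.
SCRIPT = "false\necho 'constraints ready'\n"


def run(flags: str) -> subprocess.CompletedProcess:
    """Run SCRIPT under bash with the given `set` flags prepended."""
    return subprocess.run(
        ["bash", "-c", f"set {flags}\n{SCRIPT}"],
        capture_output=True,
        text=True,
    )


without_e = run("-o pipefail")   # keeps going after the failure
with_e = run("-eo pipefail")     # stops at the first failing command

print(without_e.returncode, repr(without_e.stdout))
print(with_e.returncode, repr(with_e.stdout))
```

Without `-e` the script reaches the `echo` and exits 0 even though the setup step failed. With `-e` it stops at the failure and exits non-zero. Under `set -e`, a command whose non-zero exit is expected, such as the compatibility check itself, needs to run inside an `if` or be followed by `||` so that a detected conflict does not abort the script before the annotation step.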
curl -s -X POST "$RAY_COMPAT_SLACK_WEBHOOK_URL" \
  -H 'Content-type: application/json' \
  -d "{
    \"text\": \":warning: Ray Dependency Compatibility Check Failed\",
    \"blocks\": [
      {
        \"type\": \"section\",
        \"text\": {
          \"type\": \"mrkdwn\",
          \"text\": \"*:warning: Ray Dependency Compatibility Check Failed*\nPR #${BUILDKITE_PULL_REQUEST:-N/A} on branch \`${BUILDKITE_BRANCH:-unknown}\` introduces dependencies that conflict with Ray's lock file(s): ${FAILED_LOCKS[*]}\n<${BUILDKITE_BUILD_URL:-#}|View Build>\"
        }
      }
    ]
  }"
Constructing a JSON payload by manually expanding environment variables inside a double-quoted string is fragile and insecure. If variables like BUILDKITE_BRANCH contain double quotes or other special characters, the resulting JSON will be malformed, causing the Slack notification to fail. It is safer to use a tool like jq or a small Python snippet to generate the JSON payload correctly.
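A minimal sketch of that failure mode, using a hypothetical branch name that contains a double quote (the name is illustrative and does not come from the PR). The first payload mimics plain string interpolation; the second goes through `json.dumps`.

```python
import json

# Hypothetical branch name containing double quotes.
branch = 'fix/"quoted"-name'

# Naive interpolation, analogous to expanding a variable inside a
# double-quoted shell string: the embedded quotes end the JSON string early.
naive = '{"text": "branch ' + branch + '"}'
try:
    json.loads(naive)
    naive_ok = True
except json.JSONDecodeError:
    naive_ok = False

# json.dumps escapes the quotes, so the payload parses and round-trips.
safe = json.dumps({"text": f"branch {branch}"})
safe_ok = json.loads(safe)["text"] == f"branch {branch}"

print(naive_ok, safe_ok)
```

The interpolated payload fails to parse, while the serialized one preserves the branch name exactly.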
# Construct JSON payload safely using Python to avoid malformed JSON from special characters
PAYLOAD=$(python3 -c '
import json, sys, os
failed = sys.argv[1]
pr = os.getenv("BUILDKITE_PULL_REQUEST", "N/A")
branch = os.getenv("BUILDKITE_BRANCH", "unknown")
url = os.getenv("BUILDKITE_BUILD_URL", "#")
data = {
    "text": ":warning: Ray Dependency Compatibility Check Failed",
    "blocks": [{
        "type": "section",
        "text": {
            "type": "mrkdwn",
            "text": f"*:warning: Ray Dependency Compatibility Check Failed*\nPR #{pr} on branch `{branch}` introduces dependencies that conflict with Ray'\''s lock file(s): {failed}\n<{url}|View Build>"
        }
    }]
}
print(json.dumps(data))
' "${FAILED_LOCKS[*]}")
curl -s -X POST "$RAY_COMPAT_SLACK_WEBHOOK_URL" \
-H 'Content-type: application/json' \
  -d "$PAYLOAD"
Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>
Purpose
Ray installs vLLM via pip install 'vllm[audio]' constrained by its lock files. When a vLLM PR bumps or tightens a dependency (e.g. protobuf>=5.29.6), it can silently break Ray's ability to install vLLM in its environment. Today these conflicts are only discovered when the Ray team tries to upgrade, potentially blocking release timelines.
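To illustrate why a bump like protobuf>=5.29.6 conflicts with a lock file pinned to protobuf==5.29.5, here is a deliberately simplified sketch. It compares dotted numeric versions as integer tuples; it is not pip's resolver and does not implement full PEP 440 version semantics. The helper names `parse_version` and `satisfies_minimum` are hypothetical.

```python
def parse_version(text: str) -> tuple[int, ...]:
    """Turn a purely numeric dotted version such as '5.29.5' into (5, 29, 5)."""
    return tuple(int(part) for part in text.split("."))


def satisfies_minimum(pinned: str, minimum: str) -> bool:
    """Return True if the pinned version meets a `>=minimum` requirement."""
    return parse_version(pinned) >= parse_version(minimum)


# The lock file pins 5.29.5; the new requirement asks for at least 5.29.6.
print(satisfies_minimum("5.29.5", "5.29.6"))
```

Because the pin fixes the version exactly, no install can satisfy both constraints at once, which is the kind of conflict the dry run is meant to surface.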
This PR adds a non-blocking CI job that runs pip install --dry-run of the built vLLM wheel against two Ray lock files (ray_py311_cu128.lock and rayllm_test_py311_cu128.lock). On conflict it surfaces a Buildkite annotation and sends a notification to Anyscale's internal Slack channel. The job uses soft_fail: true so it never blocks the pipeline.
RFC: #33599
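The shape of the check can be sketched as follows. This is not the PR's script: the helper names, the wheel path, and the constraints file name are hypothetical, and the pin-stripping helper assumes one requirement per line with no hash continuation lines. Only `pip install`, `--dry-run`, and `-c` are real pip options; the command is built here but not executed.

```python
def strip_pin(lock_text: str, package: str = "vllm") -> str:
    """Drop `package==...` lines so the wheel under test is not pinned against itself.

    Assumes one requirement per line (no `--hash` continuation lines).
    """
    kept = [
        line
        for line in lock_text.splitlines()
        if not line.strip().lower().startswith(f"{package}==")
    ]
    return "\n".join(kept) + "\n"


def dry_run_command(wheel: str, constraints: str) -> list[str]:
    """Build a pip command that resolves the wheel against the constraints without installing."""
    return ["pip", "install", "--dry-run", wheel, "-c", constraints]


# Hypothetical lock file content and paths, for illustration only.
sample_lock = "protobuf==5.29.5\nvllm==0.0.0\nnumpy==1.26.4\n"
constraints_text = strip_pin(sample_lock)
command = dry_run_command("dist/vllm.whl", "constraints.txt")

print(constraints_text, end="")
print(" ".join(command))
```

A non-zero exit from the dry run would then be treated as a conflict for that lock file.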
Test Plan & Result
protobuf>=5.29.6 vs protobuf==5.29.5 conflict against both lock files.
vllm== pin in rayllm_test_py311_cu128.lock is stripped to avoid false positives
TODO
RAY_COMPAT_SLACK_WEBHOOK_URL as a pipeline secret after the PR merges.