[ci] Add Ray compatibility check informational CI job #34672

Draft
jeffreywang-anyscale wants to merge 2 commits into vllm-project:main from jeffreywang-anyscale:ray-informational-ci

Conversation

@jeffreywang-anyscale
Contributor

@jeffreywang-anyscale jeffreywang-anyscale commented Feb 17, 2026

Purpose

Ray installs vLLM via pip install 'vllm[audio]' constrained by its lock files. When a vLLM PR bumps or tightens a dependency (e.g. protobuf>=5.29.6), it can silently break Ray's ability to install vLLM in its environment. Today these conflicts are only discovered when the Ray team tries to upgrade, potentially blocking release timelines.

This PR adds a non-blocking CI job that runs pip install --dry-run on the built vLLM wheel against two Ray lock files (ray_py311_cu128.lock and rayllm_test_py311_cu128.lock). On conflict, it surfaces a Buildkite annotation and sends a notification to Anyscale's internal Slack channel. The job uses soft_fail: true, so it never blocks the pipeline.
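The flow described above can be sketched roughly as follows. This is a minimal illustration, not the script in this PR: the helper names (dry_run_command, find_conflicts) are hypothetical, and passing each lock file to pip with -c (as a constraints file) is an assumption about how the check is wired up. Only the lock file names and the use of pip install --dry-run come from the description.

```python
import subprocess
import sys

# Lock file names are taken from the PR description; everything else is illustrative.
RAY_LOCK_FILES = ["ray_py311_cu128.lock", "rayllm_test_py311_cu128.lock"]


def dry_run_command(wheel_path, constraints_path):
    """Build a pip command that resolves the wheel without installing anything."""
    return [
        sys.executable, "-m", "pip", "install",
        "--dry-run",             # resolve dependencies only, install nothing
        "-c", constraints_path,  # assumption: the lock file is used as a constraints file
        wheel_path,
    ]


def find_conflicts(wheel_path, lock_paths, run=subprocess.run):
    """Return the lock files for which the dry-run resolution fails."""
    failed = []
    for lock in lock_paths:
        result = run(dry_run_command(wheel_path, lock), capture_output=True, text=True)
        if result.returncode != 0:
            failed.append(lock)
    return failed
```

A non-empty return value from find_conflicts would correspond to the case where the job annotates the build and notifies Slack. The run parameter exists only so the sketch can be exercised without invoking pip.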

RFC: #33599

Test Plan & Result

  • Verified locally that the check correctly detects the current protobuf>=5.29.6 vs protobuf==5.29.5 conflict against both lock files.
  • Confirmed that the vllm== pin in rayllm_test_py311_cu128.lock is stripped to avoid false positives.
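The pin stripping mentioned in the second item could look something like the sketch below. It is an illustration under assumptions: strip_pin is a hypothetical helper, the sample lock text and version numbers are made up, and the PR itself may well do this differently (for example with sed). The idea is that a lock file pinning vllm to a released version would always conflict with the freshly built wheel, so that one requirement is dropped before the check.

```python
import re


def strip_pin(lock_text, package="vllm"):
    """Drop the `package==...` requirement, including backslash-continued lines."""
    pin = re.compile(rf"^{re.escape(package)}\s*==", re.IGNORECASE)
    kept = []
    skipping = False
    for line in lock_text.splitlines():
        if skipping:
            # Still inside the dropped requirement; it ends on the first
            # line that has no trailing backslash.
            skipping = line.rstrip().endswith("\\")
            continue
        if pin.match(line):
            skipping = line.rstrip().endswith("\\")
            continue
        kept.append(line)
    return "".join(f"{line}\n" for line in kept)


# Hypothetical lock-file excerpt; versions and hash are placeholders.
sample = (
    "protobuf==5.29.5\n"
    "vllm==0.0.0 \\\n"
    "    --hash=sha256:abc\n"
    "vllm-extras==1.0\n"
)
print(strip_pin(sample), end="")
```

Note that the pattern only matches the exact package name followed by ==, so a differently named package that merely starts with "vllm" is left in place.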

TODO

  • A vLLM Buildkite admin needs to add RAY_COMPAT_SLACK_WEBHOOK_URL as a pipeline secret after the PR merges.
  • Create a Slack incoming webhook. Action item: @jeffreywang-anyscale

Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>
@mergify mergify bot added the ci/build label Feb 17, 2026
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

The pull request introduces a useful CI job to proactively detect dependency conflicts between vLLM and Ray. This will help prevent integration issues that are currently only discovered during Ray's upgrade process. The implementation uses a non-blocking approach with Buildkite annotations and Slack notifications, which is appropriate for an informational check. I have identified a few high-severity issues related to the robustness of the bash script, specifically regarding error handling and safe JSON construction for notifications.

#
# See: https://github.com/vllm-project/vllm/issues/33599

set -o pipefail
Contributor

high

The script should use set -e to ensure it terminates immediately if any setup command fails (e.g., curl failing to download a lock file or sed failing to process it). Without set -e, the script might continue with incomplete data, potentially leading to false positives where the compatibility check appears to pass because the constraints file was empty or missing.

Suggested change
- set -o pipefail
+ set -eo pipefail
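The difference between the two settings can be observed with a small experiment. This is a sketch, assuming bash is available on PATH; run_bash is a hypothetical helper, and false stands in for a failed setup step such as a lock-file download.

```python
import subprocess


def run_bash(script):
    """Run a snippet with bash and return (exit_code, stdout)."""
    result = subprocess.run(["bash", "-c", script], capture_output=True, text=True)
    return result.returncode, result.stdout


# Without -e the script carries on after the failed command and exits 0.
without_e = run_bash("set -o pipefail; false; echo 'check passed'")

# With -e the script stops at the failed command and exits non-zero.
with_e = run_bash("set -eo pipefail; false; echo 'check passed'")

print(without_e)
print(with_e)
```

In the first case the later echo still runs, which is the kind of silent continuation the comment warns about.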

Comment on lines 115 to 128
curl -s -X POST "$RAY_COMPAT_SLACK_WEBHOOK_URL" \
    -H 'Content-type: application/json' \
    -d "{
        \"text\": \":warning: Ray Dependency Compatibility Check Failed\",
        \"blocks\": [
            {
                \"type\": \"section\",
                \"text\": {
                    \"type\": \"mrkdwn\",
                    \"text\": \"*:warning: Ray Dependency Compatibility Check Failed*\nPR #${BUILDKITE_PULL_REQUEST:-N/A} on branch \`${BUILDKITE_BRANCH:-unknown}\` introduces dependencies that conflict with Ray's lock file(s): ${FAILED_LOCKS[*]}\n<${BUILDKITE_BUILD_URL:-#}|View Build>\"
                }
            }
        ]
    }"
Contributor

high

Constructing a JSON payload by manually expanding environment variables inside a double-quoted string is fragile and insecure. If variables like BUILDKITE_BRANCH contain double quotes or other special characters, the resulting JSON will be malformed, causing the Slack notification to fail. It is safer to use a tool like jq or a small Python snippet to generate the JSON payload correctly.

    # Construct JSON payload safely using Python to avoid malformed JSON from special characters
    PAYLOAD=$(python3 -c '
import json, sys, os
failed = sys.argv[1]
pr = os.getenv("BUILDKITE_PULL_REQUEST", "N/A")
branch = os.getenv("BUILDKITE_BRANCH", "unknown")
url = os.getenv("BUILDKITE_BUILD_URL", "#")
data = {
    "text": ":warning: Ray Dependency Compatibility Check Failed",
    "blocks": [{
        "type": "section",
        "text": {
            "type": "mrkdwn",
            "text": f"*:warning: Ray Dependency Compatibility Check Failed*\nPR #{pr} on branch `{branch}` introduces dependencies that conflict with Ray'\''s lock file(s): {failed}\n<{url}|View Build>"
        }
    }]
}
print(json.dumps(data))
' "${FAILED_LOCKS[*]}")

    curl -s -X POST "$RAY_COMPAT_SLACK_WEBHOOK_URL" \
        -H 'Content-type: application/json' \
        -d "$PAYLOAD"
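
To make the malformed-JSON concern concrete, the sketch below compares manual string interpolation with json.dumps for a value containing double quotes. The function names and the branch name are hypothetical, and the payload is reduced to a single field for brevity.

```python
import json


def naive_payload(branch):
    # Mirrors manual interpolation: the value is pasted into the JSON text unescaped.
    return '{"text": "Check failed on branch ' + branch + '"}'


def safe_payload(branch):
    # json.dumps escapes quotes, backslashes, and control characters.
    return json.dumps({"text": f"Check failed on branch {branch}"})


def is_valid_json(text):
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


branch = 'fix/"quoted"-name'  # hypothetical branch name containing double quotes
print(is_valid_json(naive_payload(branch)))
print(is_valid_json(safe_payload(branch)))
```

Only the second payload parses, and it round-trips the branch name unchanged.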

Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>