[Docs] Add developer doc about CI failures #18782

Merged: 12 commits by russellb, adding `docs/contributing/ci-failures.md`.

# CI Failures

What should I do when a CI job fails on my PR, but I don't think my PR caused
the failure?

- Check the dashboard of current CI test failures:
    👉 [CI Failures Dashboard](https://github.com/orgs/vllm-project/projects/20)

- If your failure **is already listed**, it's likely unrelated to your PR.
    Help fixing it is always welcome!
    - Leave comments with links to additional instances of the failure.
    - React with a 👍 to signal how many are affected.

- If your failure **is not listed**, you should **file an issue**.

## Filing a CI Test Failure Issue

- **File a bug report:**
    👉 [New CI Failure Report](https://github.com/vllm-project/vllm/issues/new?template=450-ci-failure.yml)

- **Use this title format:**

    ```
    [CI Failure]: failing-test-job - regex/matching/failing:test
    ```

- **For the environment field:**

    ```
    Still failing on main as of commit abcdef123
    ```

- **In the description, include failing tests:**

    ```
    FAILED failing/test.py::failing_test1 - Failure description
    FAILED failing/test.py::failing_test2 - Failure description
    https://github.com/orgs/vllm-project/projects/20
    https://github.com/vllm-project/vllm/issues/new?template=400-bug-report.yml
    FAILED failing/test.py::failing_test3 - Failure description
    ```

- **Attach logs** (collapsible section example):
    <details>
    <summary>Logs:</summary>

    ```text
    ERROR 05-20 03:26:38 [dump_input.py:68] Dumping input data
    --- Logging error ---
    Traceback (most recent call last):
      File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 203, in execute_model
        return self.model_executor.execute_model(scheduler_output)
    ...
    FAILED failing/test.py::failing_test1 - Failure description
    FAILED failing/test.py::failing_test2 - Failure description
    FAILED failing/test.py::failing_test3 - Failure description
    ```

    </details>
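
To fill in the description, the pytest summary lines can be pulled out of a downloaded log. This is a minimal sketch, assuming the log has already been cleaned as described under Logs Wrangling so that each summary line starts with `FAILED `; the function name `failed_tests` is hypothetical:

```bash
# Hypothetical helper: list the unique pytest "FAILED ..." summary lines in a log.
# Anchoring on "^FAILED " skips verbose progress lines such as
# "tests/x.py::test_y FAILED [ 50%]", where FAILED appears mid-line.
failed_tests() {
    grep '^FAILED ' "$1" | sort -u
}

# Usage:
# failed_tests ci.log
```

De-duplicating with `sort -u` keeps the list short when the same test is reported more than once in a log.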

## Logs Wrangling

Download the full log file from Buildkite locally.

Strip timestamps and colorization:

```bash
# Strip timestamps
sed -i 's/^\[[0-9]\{4\}-[0-9]\{2\}-[0-9]\{2\}T[0-9]\{2\}:[0-9]\{2\}:[0-9]\{2\}Z\] //' ci.log

# Strip colorization
sed -i -r 's/\x1B\[[0-9;]*[mK]//g' ci.log
```
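
If you prefer to keep the original download untouched, the same two substitutions can be combined into one pass that writes a cleaned copy instead of editing in place. This is a sketch assuming GNU sed (BSD/macOS sed does not understand the `\x1B` escape) and the `[YYYY-MM-DDTHH:MM:SSZ] ` timestamp prefix matched above; `clean_ci_log` and `ci_clean.log` are hypothetical names:

```bash
# Hypothetical helper: print a cleaned copy of a Buildkite log to stdout.
# 1) drop the leading "[YYYY-MM-DDTHH:MM:SSZ] " timestamp
# 2) drop ANSI color (m) and erase-line (K) escape sequences
# Requires GNU sed for the \x1B escape.
clean_ci_log() {
    sed -E \
        -e 's/^\[[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}Z\] //' \
        -e 's/\x1B\[[0-9;]*[mK]//g' \
        "$1"
}

# Usage: keep ci.log as downloaded and work on the cleaned copy.
# clean_ci_log ci.log > ci_clean.log
```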

Use a clipboard tool such as `wl-copy` (Wayland) for quick copy-pasting:

```bash
tail -n 525 ci.log | wl-copy
```
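
`wl-copy` only works under Wayland. A small sketch that picks whichever clipboard tool is installed, assuming one of `wl-copy` (Wayland), `xclip` (X11), or `pbcopy` (macOS) is available; the function name `clipboard_cmd` is hypothetical:

```bash
# Hypothetical helper: print the clipboard command available here, or fail if none.
clipboard_cmd() {
    if command -v wl-copy >/dev/null 2>&1; then
        echo "wl-copy"
    elif command -v xclip >/dev/null 2>&1; then
        echo "xclip -selection clipboard"
    elif command -v pbcopy >/dev/null 2>&1; then
        echo "pbcopy"
    else
        return 1
    fi
}

# Usage: $cmd is left unquoted on purpose so "xclip -selection clipboard" splits into words.
# cmd=$(clipboard_cmd) && tail -n 525 ci.log | $cmd
```

Guarding the pipeline with `&&` avoids silently discarding the output when no clipboard tool is found.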

## Investigating a CI Test Failure

1. Go to 👉 [Buildkite main branch](https://buildkite.com/vllm/ci/builds?branch=main).
2. Bisect to find the first build that shows the issue.
3. Add your findings to the GitHub issue.
4. If you find a strong candidate PR, mention it in the issue and ping its contributors.
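
Step 2 is a binary search. When checking a build is expensive (for example, re-running a job), a sketch like the following halves the remaining range on each check. It assumes the failure, once introduced, appears in every later build. Since build numbers on `main` are usually not consecutive, the arguments here are positions in the list of `main` builds rather than build numbers. `bisect_builds` and the check command it calls are hypothetical:

```bash
# Hypothetical helper: find the first failing position in (good, bad].
# $1: position of a known-passing build
# $2: position of a known-failing build
# $3: command that exits 0 if the build at the given position shows the failure
bisect_builds() {
    local good=$1 bad=$2 check=$3 mid
    while (( bad - good > 1 )); do
        mid=$(( (good + bad) / 2 ))
        if "$check" "$mid"; then
            bad=$mid    # failure already present: first bad build is at or before mid
        else
            good=$mid   # still passing: first bad build is after mid
        fi
    done
    echo "$bad"
}

# Usage with a stand-in check where every position from 7 on fails:
# fails_from_7() { (( $1 >= 7 )); }
# bisect_builds 1 10 fails_from_7
```

The stand-in check finds position 7 after three checks instead of up to nine with a linear scan.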

## Reproducing a Failure

CI test failures may be flaky. Use a bash loop to run the test repeatedly until it fails:

```bash
COUNT=1; while pytest -sv "tests/v1/engine/test_engine_core_client.py::test_kv_cache_events[True-tcp]"; do
    COUNT=$((COUNT + 1)); echo "RUN NUMBER ${COUNT}";
done
```
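
The loop above runs forever if the test never fails. A bounded variant, sketched here with the hypothetical name `repeat_until_fail`, stops after a maximum number of runs and reports which run failed:

```bash
# Hypothetical helper: run a command up to MAX times, stopping at the first failure.
# Progress goes to stderr; the final verdict goes to stdout.
repeat_until_fail() {
    local max=$1 run
    shift
    for (( run = 1; run <= max; run++ )); do
        echo "RUN NUMBER ${run}" >&2
        if ! "$@"; then
            echo "failed on run ${run}"
            return 1
        fi
    done
    echo "passed ${max} runs"
}

# Usage:
# repeat_until_fail 50 pytest -sv "tests/v1/engine/test_engine_core_client.py::test_kv_cache_events[True-tcp]"
```

The non-zero return on failure lets the helper be chained with `&&` or used in scripts.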

## Submitting a PR

If you submit a PR to fix a CI failure:

- Link the PR to the issue:
    Add `Closes #12345` (with the issue's number) to the PR description.
- Add the `ci-failure` label:
    This helps track it in the [CI Failures GitHub Project](https://github.com/orgs/vllm-project/projects/20).

## Other Resources

- 🔍 [Test Reliability on `main`](https://buildkite.com/organizations/vllm/analytics/suites/ci-1/tests?branch=main&order=ASC&sort_by=reliability)
- 🧪 [Latest Buildkite CI Runs](https://buildkite.com/vllm/ci/builds?branch=main)

## Daily Triage

Use [Buildkite analytics (2-day view)](https://buildkite.com/organizations/vllm/analytics/suites/ci-1/tests?branch=main&period=2days) to:

- Identify recent test failures **on `main`**.
- Exclude legitimate test failures on PRs.
- (Optional) Ignore tests with 0% reliability.

Compare the results against the [CI Failures Dashboard](https://github.com/orgs/vllm-project/projects/20).
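
The comparison can be partly scripted. Assuming you have copied the failing test IDs from the Buildkite view and the test IDs already tracked on the dashboard into two text files, one ID per line (both files and the function name `untracked_failures` are hypothetical), `comm` lists failures that still need an issue:

```bash
# Hypothetical helper: print tests failing on main that the dashboard does not track yet.
# $1: file of failing test IDs on main
# $2: file of test IDs already on the dashboard
untracked_failures() {
    # comm needs sorted input; -23 keeps lines that appear only in the first file.
    comm -23 <(sort -u "$1") <(sort -u "$2")
}

# Usage:
# untracked_failures main_failures.txt dashboard.txt
```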