[CI Failure]: test_async_llm_dp causes hang and 3 hour timeout in CI #22902

@tdoublep

Name of failing test

v1/test_async_llm_dp.py

Basic information

  • Flaky test
  • Can reproduce locally
  • Caused by external libraries (e.g. bug in transformers)

🧪 Describe the failing test

The test doesn't appear to fail in CI; instead it seems to hang until the test times out after 3 hours.

Locally, I cannot reproduce the hang, but the test is flaky and sometimes fails (will update shortly with the error).

I have verified that the Distributed Test (2 GPUs) passes if we skip this test.
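
One possible way to make a hang like this surface quickly, rather than after the 3 hour CI timeout, is to put a deadline on the awaited work. The following is only a minimal sketch using the standard library's `asyncio.wait_for`. The names `run_with_deadline`, `hanging_work`, `quick_work`, and `is_hang` are hypothetical and are not taken from `v1/test_async_llm_dp.py`.

```python
import asyncio


async def run_with_deadline(coro, timeout_s: float):
    """Await `coro`; raise TimeoutError if it runs longer than `timeout_s` seconds."""
    try:
        return await asyncio.wait_for(coro, timeout=timeout_s)
    except asyncio.TimeoutError:
        # wait_for cancels the pending work before raising, so nothing is left running.
        raise TimeoutError(f"work did not finish within {timeout_s}s") from None


async def hanging_work():
    # Hypothetical stand-in for a request that never completes.
    await asyncio.Event().wait()


async def quick_work():
    # Hypothetical stand-in for a request that completes normally.
    await asyncio.sleep(0)
    return "ok"


def is_hang(coro_factory, timeout_s: float = 0.1) -> bool:
    """Return True if the coroutine built by `coro_factory` exceeds the deadline."""
    try:
        asyncio.run(run_with_deadline(coro_factory(), timeout_s))
        return False
    except TimeoutError:
        return True
```

With a guard of this kind, a hanging test would fail with an explicit timeout error in seconds instead of stalling the whole job. It would not explain why the hang happens.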

📝 History of failing test

TBD

CC List.

No response

Metadata

Labels: ci-failure (Issue about an unexpected test failure in CI)

Status: Done