Description
Currently, Pipeline.draw and Pipeline.show call the mermaid.ink server by default.
(Users can also configure a custom Mermaid server using Docker.)
Recent problems
Pipeline.draw has been experiencing frequent timeouts.
Over the past month, Mermaid servers have faced reliability issues, likely due to high traffic.
See the following issues: jihchi/mermaid.ink#491, jihchi/mermaid.ink#498.
We recently introduced changes to pipeline drawing (#8767, #8799), but these do not appear to be the cause of the timeouts.
These failures impact users and our CI pipeline, causing integration tests to fail and slowing down development.
Affected tests
- integration tests in haystack/test/core/pipeline/test_draw.py
- nightly e2e tests (these have not been failing in the last few days)
- tutorials tests
Action taken/in progress
- Configurable timeout in Pipeline.draw: "Pipeline drawing: expose timeout parameter and increase the default" #8967
- Retry mechanism in Pipeline.draw: "Add retries parameter to pipeline.draw() to help mitigate mermaid server issues" #9045 (it is uncertain whether this is effective for CI, since the calls are repeated in a short timeframe)
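To illustrate the retry idea discussed above, here is a minimal, library-agnostic sketch of retrying a rendering call with exponential backoff. It is not the implementation from #9045: the helper name `call_with_retries` and its parameters (`retries`, `base_delay`, `retry_on`, `sleep`) are hypothetical, and the exception types are assumptions. A real HTTP client usually raises its own timeout and connection exceptions, which would need to be passed in via `retry_on`.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retries(
    render: Callable[[], T],
    retries: int = 3,
    base_delay: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (TimeoutError, ConnectionError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `render` up to `retries` times, doubling the wait after each failure.

    Hypothetical helper for illustration only; not part of Haystack.
    """
    if retries < 1:
        raise ValueError("retries must be at least 1")
    for attempt in range(retries):
        try:
            return render()
        except retry_on:
            # Re-raise on the final attempt so the caller sees the real error.
            if attempt == retries - 1:
                raise
            # Waits base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2**attempt)
    raise AssertionError("unreachable")
```

The backoff matters for the CI concern raised above: retrying immediately sends more requests to a server that is already overloaded, whereas spacing the attempts out gives it time to recover. Even so, many test jobs retrying at once can still add up to a burst of traffic.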
Possible next steps
- Skip non-critical integration tests that frequently fail: done in "test: skip/remove some Pipeline.draw integration tests" #9108
- Remove Pipeline.draw from e2e tests if they start to fail again: done in "test: temporarily skip test_to_mermaid_image integration test" #9121
- Reflect on long-term solutions (hosting our own Mermaid server, finding a Python visualization library, ...)
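As an alternative to skipping tests unconditionally, a test could be skipped only when the server is actually unreachable. The sketch below is not what #9108 or #9121 do: the decorator name `skip_on_server_failure` is hypothetical, and the caught exception types are assumptions that would need to match whatever the HTTP client in use raises. It relies on `unittest.SkipTest` from the standard library, which pytest also reports as a skip.

```python
import functools
import unittest


def skip_on_server_failure(test_func):
    """Turn network failures inside a test into a skip instead of a failure.

    Hypothetical decorator for illustration only; not part of Haystack.
    """

    @functools.wraps(test_func)
    def wrapper(*args, **kwargs):
        try:
            return test_func(*args, **kwargs)
        except (TimeoutError, ConnectionError) as exc:
            # Report the outage as a skip so that CI stays green, while
            # assertion errors and other bugs still fail the test.
            raise unittest.SkipTest(f"Mermaid server unavailable: {exc}") from exc

    return wrapper
```

The trade-off is that a genuine regression that shows up as a timeout would be reported as a skip rather than a failure, so skip counts would need to be monitored.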