-
Notifications
You must be signed in to change notification settings - Fork 511
Improve error message for duplicate pipeline run names #3701
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Conversation
Important Review skippedAuto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the You can disable this status message by setting the Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
Documentation and Community
|
Documentation Link Check Results✅ Absolute links check passed |
b38ae4f
to
32b4a7a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR enhances the user experience by providing clearer guidance when a pipeline run name conflict occurs, along with tests and documentation to support the change.
- Enhanced error handling in
create_placeholder_run
to catch duplicate run-name errors and surface a helpful, actionable message. - Added unit tests to verify the new error message and ensure other
EntityExistsError
cases remain unchanged. - Updated docs to warn about run-name uniqueness and suggest best practices.
Reviewed Changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
File | Description |
---|---|
src/zenml/pipelines/run_utils.py | Improved duplicate run-name error detection and messaging |
tests/unit/pipelines/test_run_utils.py | Added tests for duplicate run-name error and non-duplicate behavior |
docs/book/how-to/steps-pipelines/yaml_configuration.md | Warn about unique run names and provide guidance in YAML examples |
run_alerter_tests.sh | New script to run alerter tests (scope seems unrelated) |
Comments suppressed due to low confidence (1)
run_alerter_tests.sh:1
- [nitpick] This script for running alerter tests appears unrelated to the pipeline run-name improvements. Consider moving it to a separate PR or isolating it under a more relevant feature grouping to keep this change focused.
#!/bin/bash
When users run a pipeline with a fixed `run_name` in their config.yaml and then rerun the same pipeline, they would get a confusing database error about entity existence. This change catches both EntityExistsError and RuntimeError (with IntegrityError) specifically for duplicate run names and provides a much more helpful error message. ## Changes - Add improved error handling in `create_placeholder_run()` to catch duplicate run name errors (both EntityExistsError and raw SQL IntegrityError) - Provide actionable guidance with 3 specific solutions: 1. Change the run_name to a unique value 2. Use dynamic placeholders like `run_name: "my_run_{date}_{time}"` 3. Remove the run_name to auto-generate unique names - Add comprehensive unit tests to verify the improved error message - Update documentation in yaml_configuration.md to warn about run name uniqueness ## User Experience Instead of seeing confusing database errors, users now get: ``` Pipeline run name 'my_run_name' already exists in this project. Each pipeline run must have a unique name. To fix this, you can: 1. Change the 'run_name' in your config file to a unique value 2. Use a dynamic run name with placeholders like: run_name: "my_run_name_{date}_{time}" 3. Remove the 'run_name' from your config to auto-generate unique names For more information on run naming, see: https://docs.zenml.io/concepts/steps_and_pipelines/yaml_configuration#run-name ``` 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>
32b4a7a
to
c28564c
Compare
As suggested in PR review, this commit adds clearer YAML comments to the run_name placeholder examples to make it more obvious what each example demonstrates.
- Use TYPE_CHECKING to handle the optional sqlalchemy import properly - Rename to SQLIntegrityError to avoid confusion with other exceptions - This ensures mypy doesn't complain about assigning None to a type
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR enhances error handling for duplicate pipeline run names by catching both ZenML and raw SQL errors, improving the user-facing message, adding unit tests, and updating the documentation with guidance on unique run names.
- Catch and re-raise
EntityExistsError
with a friendlier, actionable message when run names collide - Detect raw SQL integrity errors (duplicate entry) and convert them to the same improved error
- Add unit tests covering both error pathways and update docs with run name best practices
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
File | Description |
---|---|
tests/unit/pipelines/test_run_utils.py | New tests for duplicate-name errors and preservation of other errors |
src/zenml/pipelines/run_utils.py | Catch both EntityExistsError and raw SQL duplicates; build improved error message |
docs/book/how-to/steps-pipelines/yaml_configuration.md | Warning section added explaining unique run name guidelines |
- Change broad Exception catch to specific RuntimeError - Add parentheses for clarity in boolean logic - Align documentation wording with error message 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>
src/zenml/pipelines/run_utils.py
Outdated
run, _ = Client().zen_store.get_or_create_run(run_request) | ||
return run | ||
|
||
try: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- This should be handled in the SQLZenStore, not in this random place (which is only one occurence where a run is created).
- Specifying the run name in a config file is not the only one way to do it, the message can simply be generic and talk about configuration instead of files.
- Moved error handling from run_utils.py to sql_zen_store.py where it architecturally belongs - Database-specific error handling now stays in the database layer - Made error message more generic (removed specific mention of 'config file') - Simplified run_utils.py by removing 40+ lines of error handling code - Updated tests to reflect the new error handling location - All code paths that create runs now benefit from improved error messages 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>
Remove EntityExistsError from Raises section since this function no longer explicitly raises exceptions - they are now handled in SQLZenStore. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>
# We have to rollback the failed session first in order to | ||
# continue using it | ||
session.rollback() | ||
|
||
# Check if this is a duplicate run name error |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually this error should already be caught as part of the method call below your changes, which verifies the name uniqueness and raises an EntityExistsError
already.
I could reproduce the weird error message that you were getting though, but it is as part of the autoflush behaviour of SQLAlchemy, and does not happen as part of this commit()
I think.
I managed to solve it by wrapping the code inside the if pipeline_run.logs is not None
with a contextmanager as follows:
if pipeline_run.logs is not None:
with session.no_autoflush:
...
try:
...
If you think the error message is still to unclear after this change, I think the best way would be to add an option for a custom error message in the verify_name_uniqueness
method.
|
||
|
||
@patch("zenml.pipelines.run_utils.Client") | ||
def test_create_placeholder_run_duplicate_name_error(mock_client): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't seem to really test anything, other than that the mocking library works. If anything, I think an integration test that actually runs a pipeline twice with the same name is the best test for this.
Summary
run_name
Problem
When users run a pipeline with a duplicate run name (whether from a config file, programmatically, or any other method), they get a confusing database error. The error can come in two forms:
Solution
This PR catches IntegrityError in SQLZenStore's
_create_run
method and provides a much more helpful error message with actionable solutions.Before (Raw SQL Error)
After (User-Friendly Error)
Changes Made
_create_run()
method to catch IntegrityError and provide user-friendly messagesrun_utils.py
to SQLZenStore for better architecture (database errors handled at database layer)Test Plan
🤖 Generated with Claude Code