Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Active Stack as part of configuration YAML #2370

Open
wants to merge 38 commits into
base: develop
Choose a base branch
from

Conversation

kabinja
Copy link

@kabinja kabinja commented Jan 29, 2024

Describe changes

Issue #2261: Add Active Stack as part of configuration YAML

I added the field active_stack to PipelineRunConfiguration to be able to set up an active stack during the duration of the run. I also added a stack_context manager to allow resetting the stack at the end of the execution of the pipeline. This was done to reset the initial active stack at the end of the execution of the pipeline if one is defined by active_stack in the configuration YAML.

Pre-requisites

Please ensure you have done the following:

  • I have read the CONTRIBUTING.md document.
  • [see comments] If my change requires a change to docs, I have updated the documentation accordingly.
  • I have added tests to cover my changes.
  • I have based my new branch on develop and the open PR is targeting develop. If your branch wasn't based on develop read Contribution guide on rebasing branch to develop.
  • [N/A] If my changes require changes to the dashboard, these changes are communicated/requested.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Other (add details above)

Comments

I did not find the section where the configuration file usage is referenced in the documentation.

Summary by CodeRabbit

  • New Features
    • Introduced the ability to specify an active stack in the pipeline run configuration, enhancing flexibility in pipeline execution.
  • Improvements
    • Added a context handler for managing stack contexts, allowing for more dynamic stack management during pipeline operations.
  • Tests
    • Implemented a new integration test to verify the functionality of running pipelines with different stacks specified in the configuration file.

Copy link
Contributor

coderabbitai bot commented Jan 29, 2024

Important

Auto Review Skipped

Auto reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository.

To trigger a single review, invoke the @coderabbitai review command.

Walkthrough

The recent updates introduce the capability to specify an active stack for individual pipeline runs within ZenML. This is facilitated through a new attribute in the PipelineRunConfiguration class and enhanced pipeline functionality to utilize a specified stack context during build and run operations. A context manager ensures the original stack is restored after execution. Integration tests are expanded to verify this new feature, ensuring pipelines can run with different stacks as defined in their configuration files.

Changes

File Path Change Summary
.../config/pipeline_run_configuration.py Added active_stack attribute of type Optional[str] to PipelineRunConfiguration class.
.../new/pipelines/pipeline.py Imported stack_context, updated build and _run methods to use stack_context, and added _update_stack_from_config method.
.../stack/utils.py Added Type import and stack_context class for handling stack context.
tests/integration/functional/cli/test_pipeline.py Added test_pipeline_run_with_different_stack_in_config_file to verify running pipelines with a specified stack in the configuration.

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

Share

Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>.
    • Generate unit-tests for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit tests for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai generate interesting stats about this repository and render them as a table.
    • @coderabbitai show all the console.log statements in this repository.
    • @coderabbitai read src/utils.ts and generate unit tests.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (invoked as PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger a review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai help to get help.

Additionally, you can add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.

CodeRabbit Configration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • The JSON schema for the configuration file is available here.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/coderabbit-overrides.v2.json

CodeRabbit Discord Community

Join our Discord Community to get help, request features, and share feedback.

@htahir1
Copy link
Contributor

htahir1 commented Jan 29, 2024

@coderabbitai review

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Status

Actionable comments generated: 3

Configuration used: .coderabbit.yaml

Commits Files that changed from the base of the PR and between 9f31c6b and ac820e5.
Files selected for processing (4)
  • src/zenml/config/pipeline_run_configuration.py (1 hunks)
  • src/zenml/new/pipelines/pipeline.py (5 hunks)
  • src/zenml/stack/utils.py (2 hunks)
  • tests/integration/functional/cli/test_pipeline.py (1 hunks)
Additional comments: 5
src/zenml/config/pipeline_run_configuration.py (1)
  • 32-32: The addition of active_stack: Optional[str] = None to the PipelineRunConfiguration class is correctly implemented and aligns with the PR's objective to allow specifying an active stack for pipeline runs. This change introduces flexibility in pipeline execution environments without mandating an active stack for every run.
src/zenml/new/pipelines/pipeline.py (4)
  • 76-76: The import of stack_context is correctly added to support the new feature of specifying an active stack within a pipeline's configuration YAML.
  • 538-540: The use of stack_context in the build method is correctly implemented to ensure the temporary activation of the specified stack during the pipeline build process.
  • 626-628: The use of stack_context in the _run method is correctly implemented to ensure the temporary activation of the specified stack during the pipeline run process.
  • 1159-1160: The addition of the _update_stack_from_config method is correctly implemented to update the active stack from the pipeline run configuration. This method is crucial for the new feature allowing the specification of an active stack within a pipeline's configuration YAML.

src/zenml/new/pipelines/pipeline.py Outdated Show resolved Hide resolved
src/zenml/stack/utils.py Outdated Show resolved Hide resolved
Comment on lines 371 to 402
def test_pipeline_run_with_different_stack_in_config_file(
clean_client: "Client", tmp_path
):
"""Tests that the run command works with a run config file with an active stack defined."""
runner = CliRunner()
run_command = cli.commands["pipeline"].commands["run"]

pipeline_id = pipeline_instance.register().id

components = {
key: components[0].id
for key, components in Client().active_stack_model.components.items()
}
new_stack = Client().create_stack(name="new", components=components)

config_path = tmp_path / "config.yaml"
run_config = PipelineRunConfiguration(
run_name="custom_run_name", active_stack=str(new_stack.id)
)
config_path.write_text(run_config.yaml())

result = runner.invoke(
run_command, [pipeline_instance.name, "--config", str(config_path)]
)
assert result.exit_code == 0

runs = Client().list_pipeline_runs(pipeline_id=pipeline_id)
assert len(runs) == 1
assert runs[0].name == "custom_run_name"
assert runs[0].stack.id == new_stack.id
assert Client().active_stack.id != new_stack.id

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test test_pipeline_run_with_different_stack_in_config_file is well-implemented and effectively verifies that the pipeline run command correctly handles a different active stack specified in the run configuration file. This test ensures the new feature works as intended and is a valuable addition to the test suite. However, it would be beneficial to include assertions to verify that the active stack was correctly set during the pipeline run, in addition to checking the post-run conditions.

Consider enhancing the test to verify that the active stack is correctly set during the pipeline run. This could involve querying the active stack within the pipeline execution context and asserting that it matches the expected stack.

@strickvl strickvl added the enhancement New feature or request label Jan 31, 2024
@strickvl strickvl marked this pull request as draft January 31, 2024 15:21
@strickvl strickvl marked this pull request as ready for review January 31, 2024 15:21
src/zenml/new/pipelines/pipeline.py Outdated Show resolved Hide resolved
src/zenml/stack/utils.py Outdated Show resolved Hide resolved
src/zenml/stack/utils.py Outdated Show resolved Hide resolved
kabinja and others added 2 commits February 2, 2024 13:34
fix typo in docstring

Co-authored-by: Alex Strick van Linschoten <strickvl@users.noreply.github.com>
add missing return type

Co-authored-by: Alex Strick van Linschoten <strickvl@users.noreply.github.com>
Copy link
Contributor

@htahir1 htahir1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great so far but needs changes in the docs, specifically somewhere here: https://docs.zenml.io/user-guide/advanced-guide/pipelining-features/configure-steps-pipelines#breaking-configuration-down

@kabinja
Copy link
Author

kabinja commented Feb 4, 2024

This is great so far but needs changes in the docs, specifically somewhere here: https://docs.zenml.io/user-guide/advanced-guide/pipelining-features/configure-steps-pipelines#breaking-configuration-down

I added the missing documentation taking the build field as a reference.

kabinja and others added 2 commits February 5, 2024 14:32
…ure-steps-pipelines.md

Co-authored-by: Alex Strick van Linschoten <strickvl@users.noreply.github.com>
…ure-steps-pipelines.md

Co-authored-by: Alex Strick van Linschoten <strickvl@users.noreply.github.com>
@strickvl strickvl removed the request for review from avishniakov February 14, 2024 10:33
Copy link
Contributor

@bcdurak bcdurak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you so much for your contribution, it's much appreciated. :)

Aside from a small bug in the __exit__ method, your implementation should work as intended. I still left a few comments to improve it however, I hope it is helpful. Let me know if you have any questions!

src/zenml/config/pipeline_run_configuration.py Outdated Show resolved Hide resolved
src/zenml/stack/utils.py Outdated Show resolved Hide resolved
src/zenml/stack/utils.py Outdated Show resolved Hide resolved
Copy link
Contributor

@bcdurak bcdurak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @kabinja, I see that you already fixed the issue with the __exit__ method, however, could you please use the context manager that I mentioned in my comment above?

@kabinja
Copy link
Author

kabinja commented Feb 26, 2024

Hey @kabinja, I see that you already fixed the issue with the __exit__ method, however, could you please use the context manager that I mentioned in my comment above?

Sorry, I did not have time to go through all the comments this weekend. I changed to use the existing context manager. Would it make sense to move the context manager to the stack.utils module to avoid having issues with cyclic dependencies? I was also wondering why not using enter and exit instead of the yield in a try-catch block?

@bcdurak
Copy link
Contributor

bcdurak commented Mar 13, 2024

Hey @kabinja, thank you for applying the changes.

Yes, it would be great if you could move the context manager from the current file to a destination such as src/zenml/utils/stack_utils.py.

As for the __enter__ and __exit__ methods vs. the decorator, both approaches serve the same purpose. When it is a relatively simple process, IMO the decorator is a bit more readable, whereas for complicated cases I usually prefer to implement a context manager using the __enter__, __exit__ methods.

@strickvl strickvl requested a review from bcdurak May 3, 2024 05:37
@safoinme safoinme removed their request for review May 12, 2024 13:36
@htahir1
Copy link
Contributor

htahir1 commented May 23, 2024

@bcdurak Should we maybe move on this? @kabinja just wanted send a soft ping to make sure we don't lose this thread

@htahir1
Copy link
Contributor

htahir1 commented Jul 9, 2024

@kabinja im so sorry for the delay here. We talked internally about this PR and it still needs some work:

  • it should not be called active_stack but stack
  • It doesn't consider running from the server where we need to check this isn't set
  • You enter temporary_active_stack with None, which means it does nothing. So the code just changes the stack, not temporarily for the duration of the pipeline run

What this feature should be is probably this: when a stack is specified in the yaml, it uses that stack for the pipeline run, but doesny modify your "active stack" that you've set locally:

  • I have the default stack active locally (-> zenml stack get will output default)
  • I run a pipeline, and in the config yaml I specify stack: kubeflow. This pipeline should run on the kubeflow stack
  • After the pipeline run, zenml stack get should still be default IMO. This is not what his PR does

Can you take a look at this and see if you can work on it? If not feel free to close it and we an pick it up again soon. Sorry for the staleness on our side!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants