File-based flow storage #2840

Merged
merged 24 commits into from
Jun 24, 2020

@joshmeek joshmeek commented Jun 22, 2020

Thanks for contributing to Prefect!

Please describe your work and make sure your PR:

  • adds new tests (if appropriate)
  • adds a changelog entry in the changes/ directory (if appropriate)
  • updates docstrings for any new functions or function arguments, including docs/outline.toml for API reference docs (if appropriate)

Note that your PR will not be reviewed unless all three boxes are checked.

What does this PR change?

Closes #2785

This is a first cut at implementing file-based storage. This PR adds the following:

  • GitHub storage, which references flows stored as .py files in GitHub repositories
  • A storage utility for extracting flow objects from .py files
  • prefect register flow CLI command which takes a .py file and registers a flow with a backend
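
To illustrate the second bullet, here is a minimal sketch of how a flow object could be pulled out of the contents of a .py file. It is not the utility added in this PR: the `Flow` class is a stand-in so the snippet runs without Prefect installed, and `extract_flow` and its parameters are hypothetical names chosen for the example.

```python
from typing import Optional


class Flow:
    """Stand-in for prefect.Flow so the sketch runs without Prefect installed."""

    def __init__(self, name: str) -> None:
        self.name = name


def extract_flow(file_contents: str, flow_name: Optional[str] = None) -> Flow:
    """Execute the source of a flow file and return a Flow defined in it.

    Hypothetical helper: returns the first Flow found, or the one whose
    name matches `flow_name` when a name is given.
    """
    # Run the file's source in a fresh namespace. Flow is injected so the
    # example source below does not need an import line.
    namespace = {"Flow": Flow}
    exec(file_contents, namespace)

    # Look through everything the file defined at module level.
    for obj in namespace.values():
        if isinstance(obj, Flow) and (flow_name is None or obj.name == flow_name):
            return obj
    raise ValueError("No matching flow found in file.")


source = 'flow = Flow("my-flow")\n'
flow = extract_flow(source, flow_name="my-flow")
print(flow.name)
```

Because the file is executed to find the flow, any top-level side effects in the file run as well, which is one reason the TODO list below mentions gating calls such as register and run.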

As envisioned in this PR, the typical workflow for file-based storage might look like this:

Compose a flow .py file in which the flow uses GitHub storage:

from prefect import Flow
from prefect.environments.storage import GitHub
flow = Flow("my-flow")
flow.storage = GitHub(repo="my/repo", path="/flows/flow.py")

Push this flow.py file to the my/repo repository under /flows/flow.py.

Call prefect register flow -f flow.py to register this flow with GitHub storage.

Now, as long as the flow's structure stays the same, the flow does not need to be re-registered when its content changes. The flow file at the configured location only needs to be overwritten (e.g. pushed to the git repo). This means flows can now be updated easily in CI/CD processes, during development, mid-run, between runs, and so on.
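
The idea of updating a flow by overwriting its file can be sketched without Prefect at all. The snippet below is an illustration under assumptions, not code from this PR: `load_namespace` is a hypothetical helper, a temporary directory stands in for the git repo, and a plain function stands in for the flow's content.

```python
import tempfile
from pathlib import Path


def load_namespace(path: Path) -> dict:
    """Read the file at `path` and execute it, as a runner might before each run."""
    namespace: dict = {}
    exec(path.read_text(), namespace)
    return namespace


with tempfile.TemporaryDirectory() as tmp:
    flow_file = Path(tmp) / "flow.py"

    # First version of the file, as it stood at registration time.
    flow_file.write_text("def say_hello():\n    return 'hello'\n")
    first = load_namespace(flow_file)["say_hello"]()

    # Overwrite the file in place (standing in for a push to the repo).
    # The structure is the same; only the body changed.
    flow_file.write_text("def say_hello():\n    return 'hello again'\n")
    second = load_namespace(flow_file)["say_hello"]()

print(first, "->", second)
```

Since the file is read again on every load, the second call picks up the new body with no separate registration step in between.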

A couple extra TODOs:

  • Gate calls such as register and run behind a flag while the flow file is loaded, so that they are not executed again if they appear in the flow file
  • Evaluate labeling
  • Resolve GitHub storage one-to-many challenge

Why is this PR important?

Opens the door to new file-based storage types, flow cloning, and "hot reloading" of flows.

@joshmeek joshmeek requested review from jcrist and cicdw June 22, 2020 20:12

codecov bot commented Jun 23, 2020

Codecov Report

Merging #2840 into master will decrease coverage by 0.03%.
The diff coverage is 88.34%.

@joshmeek joshmeek changed the title [WIP] File-based flow storage File-based flow storage Jun 23, 2020

@cicdw cicdw left a comment

a few questions - exciting stuff!

src/prefect/cli/register.py (outdated)
src/prefect/environments/storage/github.py
src/prefect/cli/register.py (outdated)
src/prefect/environments/storage/github.py (outdated)
tests/utilities/test_storage.py (outdated)
docs/orchestration/execution/storage_options.md (outdated)
src/prefect/cli/register.py (outdated)

@cicdw cicdw left a comment

minor doc tweak, otherwise LGTM!

docs/core/idioms/file-based.md (outdated)
Successfully merging this pull request may close these issues.

Implement new file-based storage types
3 participants