
Pipeline clusters #662

Merged (6 commits) on Jun 18, 2024

@mwylde (Member) commented Jun 18, 2024

This PR adds a new way to run Arroyo, currently called "pipeline clusters" for lack of a better name.

In this mode, the user pre-configures Arroyo with a query, which the system runs on startup. Other queries can be scheduled via the API or UI, but the initial one is managed by the process, so it is automatically stopped (with a checkpoint) when the process is stopped.

From the user's perspective, it looks like this:

$ arroyo run --help
Run a query as a local pipeline cluster

Usage: arroyo run [OPTIONS] [QUERY]

Arguments:
  [QUERY]  The query to run [default: -]

Options:
  -n, --name <NAME>                Name for this pipeline
      --database <DATABASE>        Path to a database file to save to or restore from
  -p, --parallelism <PARALLELISM>  Number of parallel subtasks to run [default: 1]
  -h, --help                       Print help

$ arroyo run query.sql
2024-06-18T17:20:40.113741Z  INFO arroyo::run: Job transitioned to Scheduling
2024-06-18T17:20:40.321154Z  INFO arroyo::run: Job transitioned to Running
2024-06-18T17:20:40.423451Z  INFO arroyo::run: Pipeline running... dashboard at http://localhost:53093/pipelines/pl_PmdK8u3ydK
{"AVG(price)":64651.984675480766}
{"AVG(price)":64641.925167410714}
{"AVG(price)":64640.880196049526}
^C2024-06-18T17:20:56.872535Z  INFO arroyo::run: Stopping pipeline with a final checkpoint...
2024-06-18T17:20:57.295913Z  INFO arroyo::run: Job transitioned to Stopped
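The log lines above trace the managed job's lifecycle: it is scheduled, runs, and on Ctrl-C is stopped with a final checkpoint rather than killed outright. The Rust sketch below models that lifecycle as a small state machine purely for illustration; it is not Arroyo's implementation. Only the Scheduling, Running and Stopped states appear in the log; the `Created` and `Stopping` states, the `Event` variants, and the `next` function are hypothetical.

```rust
// Hypothetical model of the managed pipeline's lifecycle. Only Scheduling,
// Running and Stopped are taken from the log; everything else is assumed.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum JobState {
    Created,
    Scheduling,
    Running,
    Stopping, // taking the final checkpoint before stopping
    Stopped,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Event {
    Start,
    Scheduled,
    Interrupt, // e.g. Ctrl-C delivered to the process
    FinalCheckpointDone,
}

/// Returns the next state, or None if the event is not valid in this state.
fn next(state: JobState, event: Event) -> Option<JobState> {
    use Event::*;
    use JobState::*;
    match (state, event) {
        (Created, Start) => Some(Scheduling),
        (Scheduling, Scheduled) => Some(Running),
        // Stopping the process does not drop the job's state: a final
        // checkpoint is taken first so a later run can restore from it.
        (Running, Interrupt) => Some(Stopping),
        (Stopping, FinalCheckpointDone) => Some(Stopped),
        _ => None,
    }
}

fn main() {
    let events = [Event::Start, Event::Scheduled, Event::Interrupt, Event::FinalCheckpointDone];
    let mut state = JobState::Created;
    for e in events {
        state = next(state, e).expect("invalid transition");
        println!("Job transitioned to {:?}", state);
    }
}
```

Rejecting unexpected events (returning `None`) keeps the sketch simple; a real controller would also have to handle failures and interrupts that arrive while the job is still scheduling.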

Users can also provide the query via an environment variable (ARROYO__QUERY), which may be helpful when running as a Docker container, for example on ECS.
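A query can therefore come from a file argument, from the environment, or from the CLI's default argument `-`. The Rust sketch below shows one way to resolve these sources. It rests on two assumptions not stated in this PR: that `-` means "read from stdin", and that an explicit file argument takes precedence over ARROYO__QUERY. The `QuerySource` type and `resolve_query` function are hypothetical.

```rust
use std::path::PathBuf;

/// Where the pipeline's SQL comes from (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
enum QuerySource {
    Inline(String), // from the ARROYO__QUERY environment variable
    Stdin,          // ASSUMPTION: the default argument "-" means stdin
    File(PathBuf),  // e.g. `arroyo run query.sql`
}

/// ASSUMED precedence: a file argument, then a non-blank ARROYO__QUERY,
/// then stdin. The real CLI may order these differently.
fn resolve_query(arg: Option<&str>, env_query: Option<String>) -> QuerySource {
    match (arg, env_query) {
        (Some(path), _) if path != "-" => QuerySource::File(PathBuf::from(path)),
        (_, Some(q)) if !q.trim().is_empty() => QuerySource::Inline(q),
        _ => QuerySource::Stdin,
    }
}

fn main() {
    // In a container (e.g. an ECS task), the query can be injected through
    // the environment instead of mounting a file into the image.
    let arg = std::env::args().nth(1);
    let source = resolve_query(arg.as_deref(), std::env::var("ARROYO__QUERY").ok());
    println!("{:?}", source);
}
```

Treating a blank variable as unset avoids silently running an empty query when a deployment template defines ARROYO__QUERY but leaves it empty.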

This PR also adds a new sink, StdOut, which simply prints outputs to stdout. It is the default sink for pipeline clusters and provides an interactive experience in the console.
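Judging by the session output earlier in this PR, the StdOut sink emits one JSON object per record, one record per line (e.g. `{"AVG(price)":64651.984675480766}`). The Rust sketch below reproduces that output shape for records of float-valued columns. `StdOutSink` and `write_record` are hypothetical names, and a real sink would handle arbitrary column types (most likely via a JSON library) instead of hand-formatting.

```rust
use std::io::{self, Write};

/// Hypothetical stdout-style sink: newline-delimited JSON. Generic over
/// `Write` so it can target `io::stdout()` or an in-memory buffer.
struct StdOutSink<W: Write> {
    out: W,
}

impl<W: Write> StdOutSink<W> {
    fn new(out: W) -> Self {
        StdOutSink { out }
    }

    /// Writes one record of (column name, value) pairs as a single JSON line.
    fn write_record(&mut self, fields: &[(&str, f64)]) -> io::Result<()> {
        let mut line = String::from("{");
        for (i, (name, value)) in fields.iter().enumerate() {
            if i > 0 {
                line.push(',');
            }
            line.push('"');
            line.push_str(&escape_json(name));
            line.push_str("\":");
            if value.is_finite() {
                // Display for f64 prints the shortest string that round-trips.
                line.push_str(&value.to_string());
            } else {
                // JSON has no NaN or Infinity literals.
                line.push_str("null");
            }
        }
        line.push_str("}\n");
        self.out.write_all(line.as_bytes())?;
        // Flush per record so results show up immediately in the console.
        self.out.flush()
    }
}

/// Escapes a column name for use as a JSON string.
fn escape_json(s: &str) -> String {
    let mut escaped = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => escaped.push_str("\\\""),
            '\\' => escaped.push_str("\\\\"),
            '\n' => escaped.push_str("\\n"),
            c if (c as u32) < 0x20 => escaped.push_str(&format!("\\u{:04x}", c as u32)),
            c => escaped.push(c),
        }
    }
    escaped
}

fn main() -> io::Result<()> {
    let stdout = io::stdout();
    let mut sink = StdOutSink::new(stdout.lock());
    sink.write_record(&[("AVG(price)", 64651.984675480766)])?;
    Ok(())
}
```

Flushing after every record trades some throughput for responsiveness, which suits an interactive console sink; a high-volume sink would usually buffer instead.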

@mwylde merged commit 758c95a into master on Jun 18, 2024 (6 checks passed).