A GitHub Actions-style data pipeline execution engine.
- Flow: A single executable instance of a pipeline.
- Operator: A reusable processing task.
- Pipeline: The sequence of operators (tasks) to be performed.
- Step: A specific instance of an operator, configured and executed as part of a flow.
- Tenant: A permission boundary for access to resources.
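To make the relationships between these terms concrete, here is a minimal Python sketch of how they might fit together. All class and field names (`Operator`, `Step`, `Pipeline`, `Flow`, `completed`, `next_step`) are assumptions chosen for illustration, not the engine's actual data model.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass(frozen=True)
class Operator:
    """A reusable processing task, e.g. "internal/read" at version "1.0.0"."""
    name: str
    version: str


@dataclass(frozen=True)
class Step:
    """An operator configured for use inside one pipeline (hypothetical shape)."""
    name: str
    operator: Operator
    config: dict[str, Any] = field(default_factory=dict)


@dataclass(frozen=True)
class Pipeline:
    """An ordered sequence of steps, scoped to a tenant."""
    name: str
    tenant: str
    steps: tuple[Step, ...]


@dataclass
class Flow:
    """One executable instance of a pipeline; records finished step names."""
    pipeline: Pipeline
    completed: list[str] = field(default_factory=list)

    def next_step(self) -> Optional[Step]:
        """Return the first step that has not completed yet, or None."""
        for step in self.pipeline.steps:
            if step.name not in self.completed:
                return step
        return None
```

In this sketch a `Pipeline` is immutable and can back many `Flow` instances, which keeps per-run state (here, only the list of completed steps) separate from the definition.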
Below is an example of a pipeline definition in YAML format:
name: user_data_pipeline
tenant: acme_corp
schema:
  - name: name
    description: Full name of the user
    type: varchar
    expectations:
      not_null: true
      min_length: 2
  - name: age
    description: Age of the user
    type: integer
    expectations:
      min: 0
steps:
  - name: load_data
    uses: internal/read@1.0.0
    config:
      path: "gs://data.csv"
  - name: filter_data
    uses: internal/filter@latest
    config:
      conditions:
        - [["length", ">=", 4], ["status", "==", "approved"]]
        - [["is_published", "==", true]]
  - name: save_results
    uses: internal/save@1.0.0
    config:
      endpoint: "https://{{ environment.HOST }}/upload"
      username: "{{ secrets.API_USER }}"
      password: "{{ secrets.API_PASSWORD }}"
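Two conventions in the definition above are worth illustrating: the `uses` reference (`namespace/operator@version`) and the `{{ namespace.KEY }}` placeholders in `config`. The Python sketch below shows one way they could be handled. The function names `parse_uses` and `render`, the shape of the `context` mapping, and the error behaviour are all assumptions for illustration; the engine's real resolution rules may differ.

```python
import re
from typing import Any, Mapping

# Matches "{{ namespace.KEY }}" with optional whitespace inside the braces.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\.(\w+)\s*\}\}")


def parse_uses(ref: str) -> tuple[str, str, str]:
    """Split "internal/read@1.0.0" into (namespace, operator, version)."""
    path, at, version = ref.partition("@")
    namespace, slash, operator = path.partition("/")
    if not (at and slash and namespace and operator and version):
        raise ValueError(f"malformed operator reference: {ref!r}")
    return namespace, operator, version


def render(value: Any, context: Mapping[str, Mapping[str, str]]) -> Any:
    """Recursively substitute placeholders in strings, lists, and dicts."""
    if isinstance(value, str):
        def lookup(match: re.Match) -> str:
            namespace, key = match.group(1), match.group(2)
            try:
                return context[namespace][key]
            except KeyError:
                # Fail loudly rather than send a literal "{{ ... }}" downstream.
                raise KeyError(f"unresolved placeholder: {match.group(0)}") from None
        return PLACEHOLDER.sub(lookup, value)
    if isinstance(value, list):
        return [render(item, context) for item in value]
    if isinstance(value, dict):
        return {key: render(item, context) for key, item in value.items()}
    # Numbers, booleans, and None pass through unchanged.
    return value
```

For example, `parse_uses("internal/filter@latest")` returns `("internal", "filter", "latest")`, and rendering the `endpoint` value with a context of `{"environment": {"HOST": "example.test"}}` yields `https://example.test/upload`. Raising on an unknown placeholder is a design choice in this sketch, so that a missing secret is caught before a step runs.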