Description
Problem
There is no way to declare tags at the workflow level that automatically apply to all data artifacts produced during a workflow run. Today, if you want to tag data produced by a workflow, you must either:
- Use `--tag` on the CLI at runtime (error-prone, requires discipline)
- Add `dataOutputOverrides` to every step for every spec name (`result`, `log`, etc.)
For a workflow with 5 steps that each produce 2 data specs (resource + log), that's 10 dataOutputOverrides entries — all with the same tags.
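To make the repetition concrete, here is a rough sketch of what two of those five steps might look like today. The exact shape of `dataOutputOverrides` is an assumption made for illustration: the `specName` and `tags` keys, and the placement of the block at the step level, are guesses rather than the confirmed swamp schema. The second step (`run-migrations`) is hypothetical.

```yaml
# Illustrative sketch only. The keys under dataOutputOverrides (specName, tags)
# and its placement at the step level are assumed, not taken from the schema.
jobs:
  - name: deploy
    steps:
      - name: deploy-app
        task:
          type: model_method
          modelIdOrName: deploy-app
          methodName: execute
        dataOutputOverrides:
          - specName: result
            tags:
              environment: ${{ inputs.environment }}
          - specName: log
            tags:
              environment: ${{ inputs.environment }}
      - name: run-migrations            # hypothetical second step
        task:
          type: model_method
          modelIdOrName: run-migrations # hypothetical model name
          methodName: execute
        dataOutputOverrides:            # same tags repeated verbatim
          - specName: result
            tags:
              environment: ${{ inputs.environment }}
          - specName: log
            tags:
              environment: ${{ inputs.environment }}
```

Every additional step adds another copy of the same block, and adding a new tag means editing each copy.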
Proposal
A tags field at the workflow level that supports CEL expressions and applies to all data produced during the run:
```yaml
name: deploy-pipeline
inputs:
  properties:
    environment:
      type: string
      enum: [dev, staging, prod]
    image_tag:
      type: string
tags:
  environment: ${{ inputs.environment }}
  image_tag: ${{ inputs.image_tag }}
jobs:
  - name: deploy
    steps:
      - name: deploy-app
        task:
          type: model_method
          modelIdOrName: deploy-app
          methodName: execute
          inputs:
            environment: ${{ inputs.environment }}
```

Every data artifact produced by every step in this workflow would automatically carry `environment: prod` and `image_tag: v1.2.3` tags (or whatever values were passed as inputs), without any per-step configuration.
Why This Matters
Multi-environment workflows are a core use case for swamp. When the same workflow runs against dev, staging, and prod, the data artifacts all land under the same model paths with incrementing version numbers. Tags are the mechanism for distinguishing which environment produced which data.
Making this automatic at the workflow level means:
- Zero boilerplate: no `dataOutputOverrides` on every step for every spec
- Impossible to forget: the workflow definition guarantees tagging, not the user's memory
- Self-documenting: anyone reading the workflow YAML can see exactly what metadata will be attached to the data
- Consistent: every data artifact in the run gets the same tags, including resources and logs
Use Case
A platform team maintains a library of reusable workflows (deploy, rollback, database migration, etc.). Each workflow is parameterized by environment. They need all data produced by any workflow run to be tagged with the environment, the triggering user, and the release version — without modifying every step definition in every workflow.
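As a sketch of how this use case could look with the proposed workflow-level `tags` field, one of the library's workflows might be written roughly as follows. This assumes the field is implemented as proposed above. The workflow name, the `triggered_by` and `release_version` inputs, and the `rollback-app` model are all hypothetical, and passing the triggering user as an ordinary input is an assumption, since the issue does not say how that value would be obtained.

```yaml
# Sketch under the assumptions stated above: the workflow-level `tags` field
# is the proposed feature, and all names below are illustrative.
name: rollback
inputs:
  properties:
    environment:
      type: string
      enum: [dev, staging, prod]
    triggered_by:          # hypothetical input carrying the triggering user
      type: string
    release_version:       # hypothetical input carrying the release version
      type: string
tags:
  environment: ${{ inputs.environment }}
  triggered_by: ${{ inputs.triggered_by }}
  release_version: ${{ inputs.release_version }}
jobs:
  - name: rollback
    steps:
      - name: rollback-app
        task:
          type: model_method
          modelIdOrName: rollback-app   # hypothetical model name
          methodName: execute
```

The step definition carries no tagging configuration at all, so the same three tags could be added to every workflow in the library by editing only the top-level `tags` block of each one.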