Description
Problem
In multi-environment workflows, the same model runs for dev, staging, and prod, but all data versions are stored under a single data name with one shared latest symlink:
.swamp/data/command/shell/{model-id}/result/
├── 1/ (dev run)
├── 2/ (prod run)
├── 3/ (dev run)
└── latest -> 3 # points to dev, not prod
This causes three problems:
- Interleaved data: versions from different environments are mixed together with no structural separation
- latest is environment-blind: it always points to whichever environment ran most recently, regardless of which environment you care about
- Unsafe CEL references: model["deploy-app"].resource.result.result.attributes.exitCode silently returns data from the wrong environment if another environment ran more recently
Proposed Solution: Vary Dimensions
Rather than overriding the data name with a CEL expression (manual name construction), introduce a vary mechanism. You declare which dimensions the data should vary by, and the system computes the storage path automatically.
The composite key for data becomes: specName + dataName + vary1 + vary2 + ...
This is an extension, not an override — the vary dimensions are appended to the base specName+dataName to create isolated storage paths.
How it works
The workflow step declares which dimensions the data should vary by:
steps:
- name: deploy-app
task:
type: model_method
modelIdOrName: deploy-app
methodName: execute
inputs:
environment: ${{ inputs.environment }}
dataOutputOverrides:
- specName: result
vary:
- environment
When environment=dev, the data is stored under a composite name that includes the vary dimension. When environment=prod, it gets a separate path. Each gets its own latest symlink.
Result on disk
Each environment gets its own data path with its own latest symlink:
.swamp/data/command/shell/{model-id}/
├── result-dev/
│ ├── 1/
│ ├── 2/
│ └── latest -> 2 # latest dev, always correct
├── result-staging/
│ ├── 1/
│ └── latest -> 1 # latest staging, always correct
└── result-prod/
├── 1/
├── 2/
├── 3/
└── latest -> 3 # latest prod, always correct
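To make the isolation concrete, here is a minimal in-memory sketch of the layout above. `VersionedStore` is a hypothetical stand-in for the on-disk directories, not swamp's actual storage code. It only illustrates that version numbers and `latest` are tracked per data name.

```typescript
// Hypothetical in-memory stand-in for the on-disk layout; not swamp's storage code.
class VersionedStore {
  // dataName -> stored versions (index 0 holds version 1)
  private versions = new Map<string, unknown[]>();

  // Appends a new version under dataName and returns its version number.
  write(dataName: string, data: unknown): number {
    const list = this.versions.get(dataName) ?? [];
    list.push(data);
    this.versions.set(dataName, list);
    return list.length;
  }

  // Mirrors the per-dataName `latest` symlink: newest version for that name only.
  latest(dataName: string): unknown {
    const list = this.versions.get(dataName);
    if (!list || list.length === 0) {
      throw new Error(`no data stored under "${dataName}"`);
    }
    return list[list.length - 1];
  }
}

const store = new VersionedStore();
store.write("result-dev", { exitCode: 0 });
store.write("result-prod", { exitCode: 1 });
store.write("result-dev", { exitCode: 2 });
// store.latest("result-prod") is still the prod run, even though dev ran afterwards.
```

With a single shared name, the third write would have moved `latest` to a dev run. With varied names, each environment's pointer only moves when that environment runs.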
Multiple vary dimensions
Vary is composable. Multiple dimensions create further isolation:
dataOutputOverrides:
- specName: result
vary:
- environment
- region
With environment=dev and region=us-east-1, the composite key becomes specName + dataName + dev + us-east-1, stored at something like result-dev-us-east-1/.
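One possible way to compute the composite name is sketched below. The function name `compositeDataName` and the `-` separator are assumptions taken from the `result-dev-us-east-1` example. The proposal does not fix the exact format, and a real implementation would also need to decide how to handle values that contain the separator.

```typescript
// Hypothetical helper; the "-" separator is an assumption based on the examples above.
function compositeDataName(
  baseDataName: string,
  varyKeys: string[],
  inputs: Record<string, unknown>,
): string {
  const parts = varyKeys.map((key) => {
    const value = inputs[key];
    // Fail loudly rather than silently writing to the un-varied path.
    if (value === undefined || value === null || value === "") {
      throw new Error(`vary dimension "${key}" has no value`);
    }
    return String(value);
  });
  // Dimensions are appended in declaration order, so ordering is part of the key.
  return [baseDataName, ...parts].join("-");
}
```

Because the dimensions are appended in the order they are declared, `vary: [environment, region]` and `vary: [region, environment]` would produce different paths under this sketch.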
CEL access
Existing CEL patterns work naturally with the computed data names:
# Access latest prod result (bracket indexing, because CEL identifiers cannot contain hyphens)
${{ model["deploy-app"].resource.result["result-prod"].attributes.exitCode }}
# Access latest dev us-east-1 result
${{ model["deploy-app"].resource.result["result-dev-us-east-1"].attributes.exitCode }}
# data.latest() also works
${{ data.latest("deploy-app", "result-prod").attributes.exitCode }}
Why Vary Instead of Name Override
The original proposal was to add a name field to dataOutputOverrides with CEL expressions to construct data names manually. The vary approach is better because:
- No manual name construction: you don't write CEL to build names, you just declare dimensions
- The system knows the dimensions: it can list, query, and reason about them
- Composable: vary: [environment, region] naturally extends to multi-dimensional isolation
- It's additive (extension) not destructive (override): the base specName+dataName stays intact
What This Replaces
This approach subsumes several related issues by solving the core data isolation problem through naming:
- Tag-filtered data access in CEL (#386): unnecessary, data lives at distinct paths
- latest symlink is environment-blind (#389): solved, each vary combination has its own latest
- Workflow-level tags with CEL (#385): primary motivation was environment isolation, now handled by vary
- Tags in input files (#388): primary motivation was preventing a forgotten --tag for isolation
Current Codebase Context
The foundation for this already exists:
- specName vs dataName are already distinct concepts in the codebase. specName is the key in the model's resources map; dataName is the on-disk directory name (second arg to writeResource(specName, dataName, data)).
- latest symlink is already per-dataName, so varying the dataName automatically gives each variant its own latest.
- CEL context already indexes as model[name].resource[specName][dataName], so varied data names populate naturally.
- DataOutputOverride currently has: specName, lifetime, garbageCollection, tags.
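The indexing shape described above can be sketched as a nested map. `DataRecord`, `CelContext`, and `buildContext` are hypothetical names for illustration only. The actual context builder in the codebase may differ.

```typescript
// Hypothetical types; they only mirror the model[name].resource[specName][dataName] shape.
interface DataRecord {
  modelName: string;
  specName: string;
  dataName: string;
  attributes: Record<string, unknown>;
}

type DataEntry = { attributes: Record<string, unknown> };
type CelContext = {
  model: Record<string, { resource: Record<string, Record<string, DataEntry>> }>;
};

// Indexes the latest record of each data name into the nested context.
function buildContext(latestRecords: DataRecord[]): CelContext {
  const ctx: CelContext = { model: {} };
  for (const r of latestRecords) {
    const model = ctx.model[r.modelName] ?? (ctx.model[r.modelName] = { resource: {} });
    const spec = model.resource[r.specName] ?? (model.resource[r.specName] = {});
    // Varied names such as "result-prod" become additional keys under the same specName.
    spec[r.dataName] = { attributes: r.attributes };
  }
  return ctx;
}
```

Under this shape, `result-dev` and `result-prod` sit side by side under the `result` spec, so a reference to one can never resolve to the other.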
Implementation
- Add a vary field to DataOutputOverride: an array of input/context key names
- At runtime, resolve the vary values from the step's inputs/context
- Compute the composite dataName by appending the resolved vary values to the base dataName
- Pass the composite dataName through to writeResource() / createFileWriter()
- The CEL context, latest symlinks, and data queries all work automatically since they're already keyed by dataName
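The steps above could fit together roughly as follows. This is a sketch under assumptions: the `DataOutputOverride` shape is reduced to the fields relevant here, `writeVaried` is a hypothetical wrapper, and `writeResource` is passed in as a stub with the `(specName, dataName, data)` argument order mentioned earlier.

```typescript
// Assumed shape; existing fields (lifetime, garbageCollection, tags) are omitted.
interface DataOutputOverride {
  specName: string;
  vary?: string[]; // proposed: names of step inputs/context keys to vary by
}

type WriteResource = (specName: string, dataName: string, data: unknown) => void;

// Hypothetical wrapper: resolves vary values, builds the composite name, then writes.
function writeVaried(
  writeResource: WriteResource,
  overrides: DataOutputOverride[],
  inputs: Record<string, unknown>,
  specName: string,
  baseDataName: string,
  data: unknown,
): string {
  const override = overrides.find((o) => o.specName === specName);
  const suffix = (override?.vary ?? []).map((key) => {
    const value = inputs[key];
    if (value === undefined || value === null) {
      throw new Error(`vary key "${key}" is not present in step inputs`);
    }
    return String(value);
  });
  // The "-" separator is an assumption taken from the on-disk examples.
  const dataName = [baseDataName, ...suffix].join("-");
  writeResource(specName, dataName, data);
  return dataName;
}
```

Specs without a matching override, or with no `vary` entry, fall through to the base dataName, which keeps existing workflows unchanged.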
Key files
- src/domain/models/data_output_override.ts: add vary field to type + schema
- src/domain/models/data_writer.ts: compute composite dataName from vary values
- src/domain/models/method_execution_service.ts: pass vary context through
- src/domain/workflows/execution_service.ts: resolve vary values from step inputs
Use Case
A team has a single deploy-pipeline workflow reused across dev, staging, and prod:
swamp workflow run deploy-pipeline --input-file inputs/dev.yaml
swamp workflow run deploy-pipeline --input-file inputs/prod.yaml
Each run produces data under a vary-extended name. The latest symlink for each environment is always correct. Downstream models can reference a specific environment's data via CEL without risk of reading the wrong environment's data.