Description
Problem
CEL data access functions (data.latest(), model.X.resource.Y.Z) always return the most recent version of data, with no way to filter by tags. In a multi-environment setup where the same model runs for dev, staging, and prod, the data versions are interleaved:
.swamp/data/command/shell/{model-id}/result/
├── 1/ (dev run, tags: environment=dev)
├── 2/ (prod run, tags: environment=prod)
├── 3/ (staging run, tags: environment=staging)
├── 4/ (dev run, tags: environment=dev)
└── latest -> 4
Any CEL expression referencing this model's data gets version 4 (the most recent dev run), regardless of what environment the consuming workflow or model is targeting:
${{ model["deploy-app"].resource.result.result.attributes.exitCode }}
There is no way to say "give me the latest result where environment=prod".
Why This Matters
Cross-model data references are a fundamental feature of swamp: models can read data produced by other models via CEL expressions. In multi-environment setups, however, this is unsafe because nothing guarantees that the "latest" data belongs to the correct environment.
Consider this scenario:
1. infra-scanner model runs for prod, writes infrastructure state
2. infra-scanner model runs for dev, writes infrastructure state
3. deploy-app model references ${{ model["infra-scanner"].resource.result.latest.attributes.vpcId }}
4. deploy-app gets the dev VPC ID even though it's deploying to prod
This is a silent data correctness issue: no error is raised, and the wrong data is simply used.
Proposed Solution
Extend CEL data access functions to accept an optional tag filter:
${{ data.latest("deploy-app", "result", {"environment": "prod"}).attributes.exitCode }}
Or a dedicated function:
${{ data.latestByTag("deploy-app", "result", "environment", "prod").attributes.exitCode }}
This would allow CEL expressions to safely reference data from a specific environment, even when versions from multiple environments are interleaved under the same model.
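To make the intended semantics concrete, here is a minimal TypeScript sketch of the lookup that a tag-filtered data.latest() could perform. It is not swamp's implementation: the DataVersion shape and the latestByTags helper are hypothetical names invented for this illustration, and the sample data mirrors the four interleaved versions from the Problem section.

```typescript
// Hypothetical shape of one stored data version; field names are assumptions.
interface DataVersion {
  version: number;
  tags: Record<string, string>;
}

// Return the highest-numbered version whose tags contain every key/value
// pair in `filter`. An empty filter reproduces the current "latest" behavior.
function latestByTags(
  versions: DataVersion[],
  filter: Record<string, string> = {},
): DataVersion | undefined {
  let best: DataVersion | undefined;
  for (const v of versions) {
    const matches = Object.entries(filter).every(
      ([key, value]) => v.tags[key] === value,
    );
    if (matches && (best === undefined || v.version > best.version)) {
      best = v;
    }
  }
  return best;
}

// The interleaved versions from the directory listing in the Problem section.
const versions: DataVersion[] = [
  { version: 1, tags: { environment: "dev" } },
  { version: 2, tags: { environment: "prod" } },
  { version: 3, tags: { environment: "staging" } },
  { version: 4, tags: { environment: "dev" } },
];

console.log(latestByTags(versions)?.version); // 4
console.log(latestByTags(versions, { environment: "prod" })?.version); // 2
console.log(latestByTags(versions, { environment: "qa" })); // undefined
```

In this sketch a filter that matches nothing yields undefined rather than falling back to the unfiltered latest version. A real implementation would probably want to surface that case as an error, since a silent fallback would reintroduce the correctness problem described above.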
Use Case
A multi-environment deployment pipeline where:
- An infra-scanner model runs periodically for each environment, writing infrastructure state
- A deployer model references the scanner's output to get environment-specific configuration (VPC IDs, subnet lists, security groups)
- The deployer must read the correct environment's infrastructure state, not whichever environment happened to run most recently
Without tag-filtered data access, users must carefully coordinate run ordering or maintain separate model instances per environment, defeating the purpose of reusable parameterized models.