feat: add vary dimensions for environment-isolated data storage (#384)#435
feat: add vary dimensions for environment-isolated data storage (#384)#435
Conversation
## Summary
Multi-environment workflows (dev, staging, prod) previously stored all data
versions under a single data name with one shared `latest` symlink. This caused
interleaved data, environment-blind `latest` pointers, and unsafe CEL references
that could silently return wrong-environment data.
This change introduces a `vary` mechanism on `dataOutputOverrides` that
automatically computes composite data names (e.g., `result-prod`,
`result-dev-us-east-1`) from declared dimension keys, keeping each environment's
data isolated with its own versioning and `latest` symlink.
## What changed
### Write-side: `vary` field on `dataOutputOverrides`
- Added `vary?: string[]` to `DataOutputOverride` interface and Zod schema
- Created `composeDataName()` utility that joins base name + vary values with
hyphens (e.g., `composeDataName("result", ["prod"])` → `"result-prod"`)
- Vary dimensions are resolved at step execution time from evaluated step inputs
in `execution_service.ts`, producing a `resolvedVarySuffix` on the method
context
- Both `createResourceWriter` and `createFileWriterFactory` in `data_writer.ts`
apply the suffix to instance names before writing
- Step serialization (`fromData`/`toData`) round-trips the `vary` field
### Read-side: CEL `data.*()` with vary dimensions
- `data.latest(model, spec, [varyValues])` — 3-arg form
- `data.version(model, spec, [varyValues], version)` — 4-arg form
- `data.listVersions(model, spec, [varyValues])` — 3-arg form
- Composite name is resolved in the CEL layer before delegating to the data
cache — no changes needed to `DataNamespace`, `DataCache`, or model code
- Fixed a latent bug where CEL `int` values (bigint) didn't match `number` keys
in the DataCache Map, causing `data.version()` to silently return null
### Documentation
- `design/workflow.md` — vary syntax, on-disk layout, access patterns
- `design/expressions.md` — 3-arg `data.latest/version/listVersions` with vary
- Skill references updated with forEach + vary examples
## Impact on existing workflows
**Zero breaking changes.** Existing workflows that don't use `vary` are
completely unaffected:
- The `vary` field is optional — omitting it preserves current behavior
- All 2-argument `data.latest()`, `data.version()`, and `data.listVersions()`
calls continue to work unchanged
- Models still call `writeResource(specName, instanceName, data)` with no
changes — the vary mechanism intercepts and modifies the instance name only
when `vary` is declared
- The `data.version()` bigint fix improves reliability for all users, not just
vary users
## What this enables
Workflows can now safely run the same model across multiple environments with
isolated data:
```yaml
steps:
- name: scan-${{ self.env }}
forEach:
item: env
in: ${{ inputs.environments }}
task:
type: model_method
modelIdOrName: scanner
methodName: execute
inputs:
environment: ${{ self.env }}
dataOutputOverrides:
- specName: result
vary:
- environment
```
This produces isolated on-disk storage per environment:
```
.swamp/data/scanner/{id}/
result-dev/ → own versioning, own latest symlink
result-staging/ → own versioning, own latest symlink
result-prod/ → own versioning, own latest symlink
```
Downstream steps access the correct environment's data dynamically:
```yaml
scanResult: ${{ data.latest('scanner', 'result', [self.env]).attributes.count }}
```
## Plan vs implementation
All planned items were implemented with three deviations:
1. **`Number()` conversion in CEL** — Not in plan. Discovered during integration
testing that CEL `int` values don't match `number` keys in JavaScript Maps.
Fixed in both 3-arg and 4-arg `data.version()` paths.
2. **Documentation examples** — Initial implementation used hardcoded values
like `['prod']` which defeats vary's purpose. Fixed to show dynamic access
via `self.env` and `inputs.environment`.
3. **Test consolidation** — Plan called for separate integration test files.
Combined into `integration/vary_test.ts` for cohesion.
## Test coverage
- 8 unit tests for `composeDataName()` utility
- 7 unit tests for CEL evaluator vary overloads
- 4 unit tests for data writer with `resolvedVarySuffix`
- 7 integration tests including full CLI workflow run with forEach + vary
- All 1862 tests pass
## UAT tracking
UAT scenarios filed at systeminit/swamp-uat#25 covering multi-environment
end-to-end workflows, cross-workflow data access, re-run versioning, and error
cases.
Closes #384
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
0df2b73 to
600126a
Compare
There was a problem hiding this comment.
Review: feat: add vary dimensions for environment-isolated data storage
Verdict: APPROVED ✓
This PR introduces a well-designed vary mechanism for isolating data per environment in forEach workflows. The implementation is clean, follows project guidelines, and maintains full backward compatibility.
Code Quality ✓
- TypeScript strict mode: No
anytypes; proper use ofunknownfor CEL dynamic values - Named exports: All exports are named (e.g.,
composeDataNamefrommod.ts) - Copyright headers: Present on all new files
- Error handling: Proper
UserErrorfor missing vary dimensions with clear messages
Domain-Driven Design ✓
composeDataName: Pure utility function appropriately placed insrc/domain/data/DataOutputOverride: Value Object with immutable semantics (documented in source)- Vary resolution: Happens in
execution_service.ts(Domain Service) - correct layer for orchestrating step execution - CEL changes: Infrastructure layer (
src/infrastructure/cel/) - appropriate separation
Test Coverage ✓
- 8 unit tests for
composeDataName()including edge cases (empty strings, whitespace) - 7 unit tests for CEL evaluator vary overloads
- 4 unit tests for data writer with
resolvedVarySuffix - 7 integration tests including full CLI workflow run with forEach + vary
- Tests co-located with source files per project conventions
Security ✓
- Input validation for empty base names and vary values
- Error messages don't expose sensitive information
- No injection vulnerabilities identified
Backward Compatibility ✓
- All 2-argument
data.latest(),data.version(),data.listVersions()calls unchanged varyfield is optional - existing workflows unaffected- The bigint→number fix in
data.version()improves reliability for all users
Suggestions (non-blocking)
-
Special character handling in vary values:
composeDataNamedoesn't sanitize vary values for characters that could be problematic in file paths (e.g.,/,\,..). Current usage appears safe since values come from controlled inputs, but consider adding validation if vary values could come from untrusted sources in the future. -
Explicit bigint test: The PR mentions fixing a latent bigint comparison bug. Consider adding a test that explicitly verifies CEL
int(bigint) values work correctly withdata.version()to prevent regression.
The documentation updates in design/*.md and skill references are thorough and provide clear examples. The PR description is excellent with clear before/after explanations.
LGTM!
…gression test Address review feedback from #435: 1. Path safety validation in composeDataName(): Vary values that contain path separators (/ or \) or relative path components (. or ..) are now rejected with clear error messages. This prevents vary values from creating data outside the expected directory structure, hardening against future use cases where vary values may come from untrusted sources. 2. Explicit bigint regression test: Added a test that simulates the DataCache's strict number-keyed Map.get() behavior to verify that CEL int values (which may be bigint) are correctly converted to number before cache lookup. This covers both the 3-arg and 4-arg (with vary) forms of data.version(), preventing regression of the fix from #435. Test coverage: - 4 new unit tests for path safety (forward slash, backslash, .., .) - 1 new regression test for bigint-to-number conversion - All 1867 tests pass Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Summary
Multi-environment workflows (dev, staging, prod) previously stored all data
versions under a single data name with one shared
latestsymlink. This causedinterleaved data, environment-blind
latestpointers, and unsafe CEL referencesthat could silently return wrong-environment data.
This change introduces a
varymechanism ondataOutputOverridesthatautomatically computes composite data names (e.g.,
result-prod,result-dev-us-east-1) from declared dimension keys, keeping each environment'sdata isolated with its own versioning and
latestsymlink.What changed
Write-side:
varyfield ondataOutputOverridesvary?: string[]toDataOutputOverrideinterface and Zod schemacomposeDataName()utility that joins base name + vary values withhyphens (e.g.,
composeDataName("result", ["prod"])→"result-prod")in
execution_service.ts, producing aresolvedVarySuffixon the methodcontext
createResourceWriterandcreateFileWriterFactoryindata_writer.tsapply the suffix to instance names before writing
fromData/toData) round-trips thevaryfieldRead-side: CEL
data.*()with vary dimensionsdata.latest(model, spec, [varyValues])— 3-arg formdata.version(model, spec, [varyValues], version)— 4-arg formdata.listVersions(model, spec, [varyValues])— 3-arg formcache — no changes needed to
DataNamespace,DataCache, or model codeintvalues (bigint) didn't matchnumberkeysin the DataCache Map, causing
data.version()to silently return nullDocumentation
design/workflow.md— vary syntax, on-disk layout, access patternsdesign/expressions.md— 3-argdata.latest/version/listVersionswith varyImpact on existing workflows
Zero breaking changes. Existing workflows that don't use
varyarecompletely unaffected:
varyfield is optional — omitting it preserves current behaviordata.latest(),data.version(), anddata.listVersions()calls continue to work unchanged
writeResource(specName, instanceName, data)with nochanges — the vary mechanism intercepts and modifies the instance name only
when
varyis declareddata.version()bigint fix improves reliability for all users, not justvary users
What this enables
Workflows can now safely run the same model across multiple environments with
isolated data:
This produces isolated on-disk storage per environment:
Downstream steps access the correct environment's data dynamically:
Plan vs implementation
All planned items were implemented with three deviations:
Number()conversion in CEL — Not in plan. Discovered during integrationtesting that CEL
intvalues don't matchnumberkeys in JavaScript Maps.Fixed in both 3-arg and 4-arg
data.version()paths.like
['prod']which defeats vary's purpose. Fixed to show dynamic accessvia
self.envandinputs.environment.Combined into
integration/vary_test.tsfor cohesion.Test coverage
composeDataName()utilityresolvedVarySuffixUAT tracking
UAT scenarios filed at systeminit/swamp-uat#25 covering multi-environment
end-to-end workflows, cross-workflow data access, re-run versioning, and error
cases.
Closes #384
🤖 Generated with Claude Code
Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com