feat: add vary dimensions for environment-isolated data storage (#384) by stack72 · Pull Request #435 · systeminit/swamp

stack72 · 2026-02-23T22:45:27Z

Summary

Multi-environment workflows (dev, staging, prod) previously stored all data
versions under a single data name with one shared latest symlink. This caused
interleaved data, environment-blind latest pointers, and unsafe CEL references
that could silently return wrong-environment data.

This change introduces a vary mechanism on dataOutputOverrides that
automatically computes composite data names (e.g., result-prod,
result-dev-us-east-1) from declared dimension keys, keeping each environment's
data isolated with its own versioning and latest symlink.

What changed

Write-side: `vary` field on `dataOutputOverrides`

Added vary?: string[] to DataOutputOverride interface and Zod schema
Created composeDataName() utility that joins base name + vary values with
hyphens (e.g., composeDataName("result", ["prod"]) → "result-prod")
Vary dimensions are resolved at step execution time from evaluated step inputs
in execution_service.ts, producing a resolvedVarySuffix on the method
context
Both createResourceWriter and createFileWriterFactory in data_writer.ts
apply the suffix to instance names before writing
Step serialization (fromData/toData) round-trips the vary field

Read-side: CEL `data.*()` with vary dimensions

data.latest(model, spec, [varyValues]) — 3-arg form
data.version(model, spec, [varyValues], version) — 4-arg form
data.listVersions(model, spec, [varyValues]) — 3-arg form
Composite name is resolved in the CEL layer before delegating to the data
cache — no changes needed to DataNamespace, DataCache, or model code
Fixed a latent bug where CEL int values (bigint) didn't match number keys
in the DataCache Map, causing data.version() to silently return null

Documentation

design/workflow.md — vary syntax, on-disk layout, access patterns
design/expressions.md — 3-arg data.latest/version/listVersions with vary
Skill references updated with forEach + vary examples

Impact on existing workflows

Zero breaking changes. Existing workflows that don't use vary are
completely unaffected:

The vary field is optional — omitting it preserves current behavior
All 2-argument data.latest(), data.version(), and data.listVersions()
calls continue to work unchanged
Models still call writeResource(specName, instanceName, data) with no
changes — the vary mechanism intercepts and modifies the instance name only
when vary is declared
The data.version() bigint fix improves reliability for all users, not just
vary users

What this enables

Workflows can now safely run the same model across multiple environments with
isolated data:

steps:
  - name: scan-${{ self.env }}
    forEach:
      item: env
      in: ${{ inputs.environments }}
    task:
      type: model_method
      modelIdOrName: scanner
      methodName: execute
      inputs:
        environment: ${{ self.env }}
    dataOutputOverrides:
      - specName: result
        vary:
          - environment

This produces isolated on-disk storage per environment:

.swamp/data/scanner/{id}/
  result-dev/      → own versioning, own latest symlink
  result-staging/  → own versioning, own latest symlink
  result-prod/     → own versioning, own latest symlink

Downstream steps access the correct environment's data dynamically:

scanResult: ${{ data.latest('scanner', 'result', [self.env]).attributes.count }}

Plan vs implementation

All planned items were implemented with three deviations:

Number() conversion in CEL — Not in plan. Discovered during integration
testing that CEL int values don't match number keys in JavaScript Maps.
Fixed in both 3-arg and 4-arg data.version() paths.
Documentation examples — Initial implementation used hardcoded values
like ['prod'] which defeats vary's purpose. Fixed to show dynamic access
via self.env and inputs.environment.
Test consolidation — Plan called for separate integration test files.
Combined into integration/vary_test.ts for cohesion.

Test coverage

8 unit tests for composeDataName() utility
7 unit tests for CEL evaluator vary overloads
4 unit tests for data writer with resolvedVarySuffix
7 integration tests including full CLI workflow run with forEach + vary
All 1862 tests pass

UAT tracking

UAT scenarios filed at systeminit/swamp-uat#25 covering multi-environment
end-to-end workflows, cross-workflow data access, re-run versioning, and error
cases.

Closes #384

🤖 Generated with Claude Code

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

## Summary Multi-environment workflows (dev, staging, prod) previously stored all data versions under a single data name with one shared `latest` symlink. This caused interleaved data, environment-blind `latest` pointers, and unsafe CEL references that could silently return wrong-environment data. This change introduces a `vary` mechanism on `dataOutputOverrides` that automatically computes composite data names (e.g., `result-prod`, `result-dev-us-east-1`) from declared dimension keys, keeping each environment's data isolated with its own versioning and `latest` symlink. ## What changed ### Write-side: `vary` field on `dataOutputOverrides` - Added `vary?: string[]` to `DataOutputOverride` interface and Zod schema - Created `composeDataName()` utility that joins base name + vary values with hyphens (e.g., `composeDataName("result", ["prod"])` → `"result-prod"`) - Vary dimensions are resolved at step execution time from evaluated step inputs in `execution_service.ts`, producing a `resolvedVarySuffix` on the method context - Both `createResourceWriter` and `createFileWriterFactory` in `data_writer.ts` apply the suffix to instance names before writing - Step serialization (`fromData`/`toData`) round-trips the `vary` field ### Read-side: CEL `data.*()` with vary dimensions - `data.latest(model, spec, [varyValues])` — 3-arg form - `data.version(model, spec, [varyValues], version)` — 4-arg form - `data.listVersions(model, spec, [varyValues])` — 3-arg form - Composite name is resolved in the CEL layer before delegating to the data cache — no changes needed to `DataNamespace`, `DataCache`, or model code - Fixed a latent bug where CEL `int` values (bigint) didn't match `number` keys in the DataCache Map, causing `data.version()` to silently return null ### Documentation - `design/workflow.md` — vary syntax, on-disk layout, access patterns - `design/expressions.md` — 3-arg `data.latest/version/listVersions` with vary - Skill references updated with forEach + vary examples ## Impact on existing workflows **Zero breaking changes.** Existing workflows that don't use `vary` are completely unaffected: - The `vary` field is optional — omitting it preserves current behavior - All 2-argument `data.latest()`, `data.version()`, and `data.listVersions()` calls continue to work unchanged - Models still call `writeResource(specName, instanceName, data)` with no changes — the vary mechanism intercepts and modifies the instance name only when `vary` is declared - The `data.version()` bigint fix improves reliability for all users, not just vary users ## What this enables Workflows can now safely run the same model across multiple environments with isolated data: ```yaml steps: - name: scan-${{ self.env }} forEach: item: env in: ${{ inputs.environments }} task: type: model_method modelIdOrName: scanner methodName: execute inputs: environment: ${{ self.env }} dataOutputOverrides: - specName: result vary: - environment ``` This produces isolated on-disk storage per environment: ``` .swamp/data/scanner/{id}/ result-dev/ → own versioning, own latest symlink result-staging/ → own versioning, own latest symlink result-prod/ → own versioning, own latest symlink ``` Downstream steps access the correct environment's data dynamically: ```yaml scanResult: ${{ data.latest('scanner', 'result', [self.env]).attributes.count }} ``` ## Plan vs implementation All planned items were implemented with three deviations: 1. **`Number()` conversion in CEL** — Not in plan. Discovered during integration testing that CEL `int` values don't match `number` keys in JavaScript Maps. Fixed in both 3-arg and 4-arg `data.version()` paths. 2. **Documentation examples** — Initial implementation used hardcoded values like `['prod']` which defeats vary's purpose. Fixed to show dynamic access via `self.env` and `inputs.environment`. 3. **Test consolidation** — Plan called for separate integration test files. Combined into `integration/vary_test.ts` for cohesion. ## Test coverage - 8 unit tests for `composeDataName()` utility - 7 unit tests for CEL evaluator vary overloads - 4 unit tests for data writer with `resolvedVarySuffix` - 7 integration tests including full CLI workflow run with forEach + vary - All 1862 tests pass ## UAT tracking UAT scenarios filed at systeminit/swamp-uat#25 covering multi-environment end-to-end workflows, cross-workflow data access, re-run versioning, and error cases. Closes #384 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

github-actions

Review: feat: add vary dimensions for environment-isolated data storage

Verdict: APPROVED ✓

This PR introduces a well-designed vary mechanism for isolating data per environment in forEach workflows. The implementation is clean, follows project guidelines, and maintains full backward compatibility.

Code Quality ✓

TypeScript strict mode: No any types; proper use of unknown for CEL dynamic values
Named exports: All exports are named (e.g., composeDataName from mod.ts)
Copyright headers: Present on all new files
Error handling: Proper UserError for missing vary dimensions with clear messages

Domain-Driven Design ✓

composeDataName: Pure utility function appropriately placed in src/domain/data/
DataOutputOverride: Value Object with immutable semantics (documented in source)
Vary resolution: Happens in execution_service.ts (Domain Service) - correct layer for orchestrating step execution
CEL changes: Infrastructure layer (src/infrastructure/cel/) - appropriate separation

Test Coverage ✓

8 unit tests for composeDataName() including edge cases (empty strings, whitespace)
7 unit tests for CEL evaluator vary overloads
4 unit tests for data writer with resolvedVarySuffix
7 integration tests including full CLI workflow run with forEach + vary
Tests co-located with source files per project conventions

Security ✓

Input validation for empty base names and vary values
Error messages don't expose sensitive information
No injection vulnerabilities identified

Backward Compatibility ✓

All 2-argument data.latest(), data.version(), data.listVersions() calls unchanged
vary field is optional - existing workflows unaffected
The bigint→number fix in data.version() improves reliability for all users

Suggestions (non-blocking)

Special character handling in vary values: composeDataName doesn't sanitize vary values for characters that could be problematic in file paths (e.g., /, \, ..). Current usage appears safe since values come from controlled inputs, but consider adding validation if vary values could come from untrusted sources in the future.
Explicit bigint test: The PR mentions fixing a latent bigint comparison bug. Consider adding a test that explicitly verifies CEL int (bigint) values work correctly with data.version() to prevent regression.

The documentation updates in design/*.md and skill references are thorough and provide clear examples. The PR description is excellent with clear before/after explanations.

LGTM!

…gression test Address review feedback from #435: 1. Path safety validation in composeDataName(): Vary values that contain path separators (/ or \) or relative path components (. or ..) are now rejected with clear error messages. This prevents vary values from creating data outside the expected directory structure, hardening against future use cases where vary values may come from untrusted sources. 2. Explicit bigint regression test: Added a test that simulates the DataCache's strict number-keyed Map.get() behavior to verify that CEL int values (which may be bigint) are correctly converted to number before cache lookup. This covers both the 3-arg and 4-arg (with vary) forms of data.version(), preventing regression of the fix from #435. Test coverage: - 4 new unit tests for path safety (forward slash, backslash, .., .) - 1 new regression test for bigint-to-number conversion - All 1867 tests pass Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

stack72 force-pushed the feat/vary-dimensions branch from 0df2b73 to 600126a Compare February 23, 2026 22:46

github-actions bot approved these changes Feb 23, 2026

View reviewed changes

stack72 merged commit d5dacad into main Feb 23, 2026
3 checks passed

stack72 deleted the feat/vary-dimensions branch February 23, 2026 22:51

stack72 mentioned this pull request Feb 23, 2026

fix: harden vary dimensions with path safety and bigint regression test #436

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Comments

feat: add vary dimensions for environment-isolated data storage (#384)#435

feat: add vary dimensions for environment-isolated data storage (#384)#435
stack72 merged 1 commit intomainfrom
feat/vary-dimensions

stack72 commented Feb 23, 2026

Uh oh!

github-actions bot left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Comments

Conversation

stack72 commented Feb 23, 2026

Summary

What changed

Write-side: vary field on dataOutputOverrides

Read-side: CEL data.*() with vary dimensions

Documentation

Impact on existing workflows

What this enables

Plan vs implementation

Test coverage

UAT tracking

Uh oh!

github-actions bot left a comment

Choose a reason for hiding this comment

Review: feat: add vary dimensions for environment-isolated data storage

Code Quality ✓

Domain-Driven Design ✓

Test Coverage ✓

Security ✓

Backward Compatibility ✓

Suggestions (non-blocking)

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Write-side: `vary` field on `dataOutputOverrides`

Read-side: CEL `data.*()` with vary dimensions