Discussion: Unify data access patterns for vary dimensions #434

@stack72

Context

While designing the vary dimensions feature (#384), we identified a tension between the two CEL data access patterns that makes vary harder to implement cleanly.

The Two Access Patterns Today

| Pattern | Sees current-run data? | Sees prior-run data? | Syntax |
| --- | --- | --- | --- |
| model["name"].resource.spec.data | Yes (updated in-memory after each step) | Yes | Dictionary traversal |
| data.latest("name", "data") | No (snapshot from workflow start) | Yes | Function call |

Both exist intentionally:

  • model.* is mutable/live — enables step-to-step chaining within a workflow run
  • data.latest() is an immutable snapshot — provides consistency and isolation for cross-run access
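
To make the difference concrete, here is a minimal sketch of the two behaviors. The class names, the flat "model/dataName" key, and the shared Map are hypothetical stand-ins for illustration, not the actual swamp types:

```typescript
// Hypothetical record shape; only `attributes` is taken from the expressions in this issue.
type DataRecord = { attributes: Record<string, unknown> };

// Stand-in for data.latest(): copies the store once, at workflow start.
class SnapshotData {
  private readonly frozen: Map<string, DataRecord>;

  constructor(store: Map<string, DataRecord>) {
    // Later writes to `store` are invisible to this copy.
    this.frozen = new Map(store);
  }

  latest(modelName: string, dataName: string): DataRecord | undefined {
    return this.frozen.get(`${modelName}/${dataName}`);
  }
}

// Stand-in for model.*: reads through to the shared store on every access.
class LiveModel {
  private readonly store: Map<string, DataRecord>;

  constructor(store: Map<string, DataRecord>) {
    this.store = store;
  }

  get(modelName: string, dataName: string): DataRecord | undefined {
    return this.store.get(`${modelName}/${dataName}`);
  }
}

// Prior-run data that exists before the workflow starts.
const store = new Map<string, DataRecord>([
  ["deploy-app/result", { attributes: { exitCode: 1 } }],
]);
const snapshot = new SnapshotData(store);
const live = new LiveModel(store);

// A step in the current run writes new data.
store.set("deploy-app/result", { attributes: { exitCode: 0 } });

const liveExit = live.get("deploy-app", "result")?.attributes.exitCode; // current-run value
const snapExit = snapshot.latest("deploy-app", "result")?.attributes.exitCode; // prior-run value
```

In this sketch the live accessor returns the exit code written during the run, while the snapshot accessor keeps returning the value from before the workflow started.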

The Problem with Vary

The vary feature (#384) needs a way to dynamically resolve the composite dataName from vary dimensions. A function call can do this naturally:

${{ data.latest("deploy-app", "result", [inputs.environment]).attributes.exitCode }}

But model.* is dictionary traversal — there's no function to extend. Within a workflow, you'd be stuck with manual string concatenation:

${{ model["deploy-app"].resource["result"]["result-" + inputs.environment].attributes.exitCode }}

This is exactly the manual name construction that vary is designed to eliminate.

The fundamental tension: the access pattern that benefits from vary (data.latest()) doesn't see current-run data, and the one that sees current-run data (model.*) can't benefit from vary.
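
A rough sketch of why a function call can absorb vary dimensions while dictionary traversal cannot. The helper names and the flat store are hypothetical, and the "-" separator is only inferred from the "result-" + inputs.environment example above. How multiple dimensions are ordered and joined is an assumption:

```typescript
// Hypothetical record shape for illustration.
type StoredData = { attributes: Record<string, unknown> };

// Hypothetical helper: builds the composite data name from vary dimensions.
// With no dimensions, the base name is returned unchanged.
function compositeDataName(dataName: string, vary: readonly string[] = []): string {
  return [dataName, ...vary].join("-");
}

// Hypothetical function-style accessor over a store keyed by "model/dataName".
// The caller never concatenates names; the function does it in one place.
function latest(
  dataStore: ReadonlyMap<string, StoredData>,
  modelName: string,
  dataName: string,
  vary: readonly string[] = [],
): StoredData | undefined {
  return dataStore.get(`${modelName}/${compositeDataName(dataName, vary)}`);
}

const dataStore = new Map<string, StoredData>([
  ["deploy-app/result-staging", { attributes: { exitCode: 0 } }],
  ["deploy-app/result-production", { attributes: { exitCode: 2 } }],
]);

const environment = "production"; // stands in for inputs.environment
const exitCode = latest(dataStore, "deploy-app", "result", [environment])?.attributes.exitCode;
```

Because the name construction lives inside the function, the naming scheme can change without touching any expression that calls it.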

Proposal: Make data.latest() see current-run data

If the data behind data.latest() were updated after each workflow step (as the model context is today), data.latest() could serve as the single accessor that supports both vary and current-run data:

# Within a workflow — sees current-run data, with vary support
${{ data.latest("deploy-app", "result", [inputs.environment]).attributes.exitCode }}

# Cross-workflow — sees historical data, with vary support
${{ data.latest("deploy-app", "result", [inputs.environment]).attributes.exitCode }}

# Same syntax works in both cases

The model.* dictionary traversal would still exist for simple cases where vary isn't needed:

# Simple access without vary — still works
${{ model["deploy-app"].resource.result.result.attributes.exitCode }}

What this means for implementation

  • After each workflow step completes, update the data cache (same as the model context update that already happens in execution_service.ts)
  • data.latest() shifts from a pure snapshot to a live view within workflow execution
  • Outside of workflows (e.g., swamp model evaluate), behavior stays the same: data.latest() still reads from disk
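
The points above could look roughly like the following. This is a sketch under assumptions: DataCache, recordStepOutput, Step, and runWorkflow are hypothetical names, and the real data cache and execution_service.ts may be structured quite differently:

```typescript
// Hypothetical record shape for illustration.
type CachedData = { attributes: Record<string, unknown> };

// Hypothetical data cache that supports mid-run updates.
class DataCache {
  private readonly entries = new Map<string, CachedData>();

  // Seed with what is on disk at workflow start (prior-run data).
  constructor(fromDisk: Iterable<[string, CachedData]>) {
    for (const [key, value] of fromDisk) this.entries.set(key, value);
  }

  // Called after each step completes, mirroring the model context update.
  recordStepOutput(modelName: string, dataName: string, data: CachedData): void {
    this.entries.set(`${modelName}/${dataName}`, data);
  }

  // Returns the most recent data, whether from this run or a prior one.
  latest(modelName: string, dataName: string): CachedData | undefined {
    return this.entries.get(`${modelName}/${dataName}`);
  }
}

// Hypothetical step shape: each step may read the cache and returns its output.
type Step = { model: string; dataName: string; run: (cache: DataCache) => CachedData };

function runWorkflow(steps: Step[], cache: DataCache): void {
  for (const step of steps) {
    const output = step.run(cache);
    // Updating here is what turns the snapshot into a live view.
    cache.recordStepOutput(step.model, step.dataName, output);
  }
}

// The prior run of "build" failed with exit code 1.
const cache = new DataCache([["build/result", { attributes: { exitCode: 1 } }]]);
const seen: unknown[] = [];

runWorkflow(
  [
    { model: "build", dataName: "result", run: () => ({ attributes: { exitCode: 0 } }) },
    {
      model: "deploy-app",
      dataName: "result",
      run: (c) => {
        // The second step observes the first step's output from this run.
        seen.push(c.latest("build", "result")?.attributes.exitCode);
        return { attributes: { exitCode: 0 } };
      },
    },
  ],
  cache,
);
```

In this sketch the second step reads the exit code produced by the first step in the same run rather than the stale value loaded at start.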

Trade-offs to discuss

For:

  • Single accessor that supports both vary and current-run data
  • No manual string concatenation for vary dimensions
  • data.latest() becomes more useful within workflows
  • Consistent behavior — data.latest() always returns the most recent data, whether from this run or a prior run

Against:

  • Changes the snapshot isolation guarantee of data.latest() within workflows
  • Could cause surprises if users expect data.latest() to be stable throughout a workflow run
  • Adds complexity to the data cache (needs to support mid-run updates)
  • Two concurrent workflow runs could see each other's intermediate state if they share the same data cache (though this is already true for model.*)

Questions for Discussion

  1. Is the snapshot isolation of data.latest() something users depend on, or is it an implementation detail?
  2. Should the live behavior be opt-in (e.g., a separate function like data.current()) rather than changing data.latest()?
  3. Are there other approaches we haven't considered?
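
For question 2, one possible shape of the opt-in variant is sketched below. Everything here is hypothetical: RunData, record, and the run-local overlay are illustration only, and data.current() is just the name floated above. The idea is that latest keeps its snapshot semantics while current checks this run's output first:

```typescript
// Hypothetical record shape for illustration.
type Entry = { attributes: Record<string, unknown> };

// Hypothetical accessor pair scoped to a single workflow run.
class RunData {
  private readonly snapshot: ReadonlyMap<string, Entry>;
  private readonly overlay = new Map<string, Entry>();

  constructor(atWorkflowStart: ReadonlyMap<string, Entry>) {
    // Copy so that later changes to the source map do not leak in.
    this.snapshot = new Map(atWorkflowStart);
  }

  // Step outputs from this run land in the overlay only.
  record(modelName: string, dataName: string, entry: Entry): void {
    this.overlay.set(`${modelName}/${dataName}`, entry);
  }

  // Unchanged semantics: stable for the whole run.
  latest(modelName: string, dataName: string): Entry | undefined {
    return this.snapshot.get(`${modelName}/${dataName}`);
  }

  // Opt-in live view: this run's output first, then the snapshot.
  current(modelName: string, dataName: string): Entry | undefined {
    const key = `${modelName}/${dataName}`;
    return this.overlay.get(key) ?? this.snapshot.get(key);
  }
}

const run = new RunData(
  new Map<string, Entry>([["deploy-app/result", { attributes: { exitCode: 1 } }]]),
);
run.record("deploy-app", "result", { attributes: { exitCode: 0 } });

const stable = run.latest("deploy-app", "result")?.attributes.exitCode; // snapshot value
const fresh = run.current("deploy-app", "result")?.attributes.exitCode; // this run's value
```

A per-run overlay like this would also keep one run's intermediate state out of another run's view, at the cost of a second accessor for users to learn.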

Related: #384, #385, #386, #388, #389

Labels

  • beta: Issues required to close out before public beta
  • enhancement: New feature or request
  • ready for work: Plan has been posted and agreed on
