docs: warn that environment variables accessed via CEL env context may be persisted in model output data #414

@stack72

Summary

buildEnvContext() in model_resolver.ts passes the entire process environment to the CEL evaluation context as the env namespace:

// src/domain/expressions/model_resolver.ts:33-35
export function buildEnvContext(): Record<string, string> {
  return { ...Deno.env.toObject() };
}

This includes any sensitive values present at runtime: AWS_SECRET_ACCESS_KEY, GITHUB_TOKEN, database passwords, etc.

The Risk

This is intentional behaviour — accessing environment variables via env.VAR_NAME in CEL expressions is a documented and tested feature (see shell-env-cel.yaml UAT fixture). However, users may not realise that if they use a sensitive env var as a model attribute value, that value will be:

  1. Stored in .swamp/data/ as part of model output (JSON resource files)
  2. Shown in swamp data get output
  3. Indexed in the symlink-based repo index under models/

Example

A user writes:

attributes:
  token: "env.GITHUB_TOKEN"

After CEL evaluation, token will be the literal value of GITHUB_TOKEN. This value is then persisted in the model's output data on disk and readable by anyone with access to the .swamp/ directory.
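The path from expression to disk can be illustrated with a minimal TypeScript sketch. This is not swamp's resolver: `resolveAttributes`, `fakeEnv`, and the regex-based substitution are hypothetical stand-ins for CEL evaluation, and the token is a placeholder. The point is only that once the value is resolved, serializing the attributes writes the secret out in plain text.

```typescript
// Hypothetical stand-ins: none of these names come from the swamp codebase.
type Attributes = Record<string, string>;

// Toy resolver: a value of the exact form "env.NAME" is replaced by the
// variable's value; every other value is passed through unchanged.
function resolveAttributes(
  attributes: Attributes,
  env: Record<string, string>,
): Attributes {
  const resolved: Attributes = {};
  for (const [key, value] of Object.entries(attributes)) {
    const name = /^env\.([A-Za-z_][A-Za-z0-9_]*)$/.exec(value)?.[1];
    resolved[key] = name !== undefined ? env[name] ?? "" : value;
  }
  return resolved;
}

// Placeholder environment; a real run would see the full process environment.
const fakeEnv: Record<string, string> = {
  GITHUB_TOKEN: "ghp_example_not_a_real_token",
};

const resolved = resolveAttributes({ token: "env.GITHUB_TOKEN" }, fakeEnv);

// Stand-in for writing model output data to disk as JSON.
const persisted = JSON.stringify(resolved, null, 2);
console.log(persisted);
```

Anything that later reads the serialized JSON sees the token as an ordinary string, with no indication that it came from the environment.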

Root Cause

design/expressions.md documents model data, inputs, data versioning, and vault (sensitive data), but has no documentation about the env namespace at all. There is no guidance distinguishing env.* (exposed, persisted in output) from vault.get() (values fetched at runtime, never stored in output data).

Plan

Documentation-only fix (no runtime changes).

1. design/expressions.md — Add ## Environment Variables section

Append after the existing ## Sensitive Data section (after line 166):

  • What env is: all process environment variables available as env.VAR_NAME
  • Basic usage example
  • Security warning: values from env are NOT redacted; if used as model attributes they are stored in .swamp/data/ on disk and visible in swamp data get output
  • Guidance: use vault.get() for sensitive values (API keys, passwords, tokens) — vault values are fetched at runtime and never persisted in model output data
  • Comparison example: wrong approach (env.API_KEY) vs right approach (vault.get('my-vault', 'API_KEY'))

2. src/domain/expressions/model_resolver.ts — Expand JSDoc

buildEnvContext() docblock (lines 33–35): Expand from a 1-line comment to warn that:

  • Returns the entire process environment, unfiltered
  • Sensitive env vars accessible via env.VAR_NAME will be persisted in model output data if used as model attributes
  • Prefer vault.get() for sensitive values

ExpressionContext.env field comment (lines 191–192): Expand from /** Environment variables */ to note full process environment exposure and point to vault as the alternative.
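One possible wording for the two comments is sketched below. The text is a suggestion only. The `source` parameter is an assumption added so the sketch runs outside Deno (the function in the repository takes no arguments and reads `Deno.env.toObject()`), and the interface shows only the `env` field, with its type assumed to match the function's return type.

```typescript
/**
 * Builds the `env` namespace for CEL evaluation.
 *
 * Returns the entire process environment, unfiltered. Nothing is redacted:
 * a sensitive variable referenced as `env.VAR_NAME` in a model attribute is
 * persisted in model output data. Prefer `vault.get()` for API keys,
 * passwords, and tokens.
 *
 * @param source Assumption for this sketch only; the real function reads
 *   `Deno.env.toObject()` directly.
 */
export function buildEnvContext(
  source: Record<string, string>,
): Record<string, string> {
  // Shallow copy, so callers cannot mutate the original object.
  return { ...source };
}

// Partial sketch: all other fields of the real interface are omitted.
export interface ExpressionContext {
  /**
   * Environment variables. Exposes the full process environment; values are
   * not redacted and are persisted if used as model attributes. Use vault
   * for sensitive values.
   */
  env: Record<string, string>;
}

const context: ExpressionContext = {
  env: buildEnvContext({ HOME: "/home/example" }),
};
console.log(Object.keys(context.env));
```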

Out of Scope

  • Runtime heuristic warnings in swamp data get output (deferred — pattern matching on key names is brittle)
  • Fixing the model_output_get_output.ts identical JSON/log mode bug (separate issue)

Verification

  1. deno check — no type changes, passes unchanged
  2. deno lint — no lint-relevant changes
  3. deno fmt — reformat any touched .ts files
  4. deno run test — no behaviour changes, all tests pass

Metadata

Labels

  • beta: Issues required to close out before public beta
  • documentation: Improvements or additions to documentation
