Skip to content

Comments

fix: normalize zero-duration lifetime to workflow instead of immediate expiry#454

Merged
stack72 merged 1 commit intomainfrom
fix-zero-duration-lifetime
Feb 24, 2026
Merged

fix: normalize zero-duration lifetime to workflow instead of immediate expiry#454
stack72 merged 1 commit intomainfrom
fix-zero-duration-lifetime

Conversation

@stack72
Copy link
Contributor

@stack72 stack72 commented Feb 24, 2026

Summary

  • Zero-duration strings like "0h", "0d", "00w" passed LifetimeSchema validation but produced 0ms when parsed by parseDuration(), causing data to expire immediately on creation. GC would then delete it on the next scan.
  • Instead of rejecting zero-duration values (which could break existing on-disk data loaded via fromData()), we normalize them to "workflow" lifetime at the Data entity boundary. This means the data lives for the duration of the workflow run — a sensible default that prevents silent data loss.
  • GarbageCollectionSchema now rejects zero-duration strings via .refine() (a GC policy of "0d" is nonsensical). The refine approach preserves acceptance of leading-zero non-zero values like "01d", staying consistent with LifetimeSchema.

Rationale: why normalize to "workflow" instead of rejecting?

Rejecting zero-duration values in both create() and fromData() would be simpler, but it would break any repo that already has data on disk with lifetime: "0h"fromData() would throw a ZodError when loading it. Normalizing handles this gracefully: existing data loads correctly, and the "workflow" semantic is the safest fallback since it ties data lifecycle to something concrete (the workflow run) rather than deleting it immediately.

Lifetime + GC interaction (answering @adamhjk's question)

if I have a workflow lifetime and a gc of 1, what do we expect to happen?

Lifetime and garbage collection are orthogonal mechanisms:

  • Lifetime controls when data expires. "workflow" lifetime means the data expires when the associated workflow run is deleted (checked via isExpired()workflowRunRepo.findById()). It does not use time-based expiration — calculateExpiration("workflow", ...) returns null.
  • GC controls how many versions are retained on disk. gc: 1 means only the 1 most recent version is kept; older versions are hard-deleted.

With lifetime: "workflow" and gc: 1:

  1. While the workflow runs: Data is accessible. Multiple writes create new versions, but GC prunes to only the latest 1 version.
  2. After the workflow run is deleted: isExpired() returns true → the "latest" symlink is removed (soft delete, data becomes inaccessible). Then GC hard-deletes old versions, leaving at most 1 version on disk.

No conflict — they compose naturally.

Changes

File Change
src/domain/data/data_metadata.ts Add normalizeLifetime() helper; add .refine() to GarbageCollectionSchema rejecting zero-duration strings
src/domain/data/data.ts Call normalizeLifetime() in Data.create() and Data.fromData() before Zod validation
src/domain/data/data_metadata_test.ts New file — 42 tests for normalizeLifetime() and GarbageCollectionSchema
src/domain/data/data_test.ts 11 new tests for zero-duration normalization through Data.create(), Data.fromData(), and roundtrip
src/domain/data/data_lifecycle_service_test.ts 6 new tests verifying zero-duration data gets null expiration (same as "workflow")

Test plan

  • All 95 tests in the 3 modified/new test files pass
  • Full test suite passes (1997 tests, 0 failures)
  • deno check passes
  • deno lint passes
  • deno fmt passes

Closes #338

🤖 Generated with Claude Code

…ate expiry

Zero-duration strings like "0h", "0d", "00w" passed LifetimeSchema validation
but produced 0ms when parsed by parseDuration(), causing data to expire
immediately on creation. GC would then delete it on the next scan.

Instead of rejecting zero-duration values (which could break existing on-disk
data), we normalize them to "workflow" lifetime at the Data entity boundary.
This means the data lives for the duration of the workflow run — a sensible
default that prevents silent data loss.

The fix has three parts:

1. normalizeLifetime() helper in data_metadata.ts — converts zero-duration
   strings (including multi-digit zeros like "00d", "000h") to "workflow"
   while passing through all other values unchanged.

2. Data.create() and Data.fromData() both call normalizeLifetime() before
   Zod validation. This handles both new data creation and loading legacy
   on-disk data that may have been saved with "0h".

3. GarbageCollectionSchema uses .refine() to reject zero-duration strings.
   Unlike lifetime (where normalization preserves backward compat), a GC
   policy of "0d" is nonsensical ("keep versions from the last 0 days")
   and was already implicitly broken. The refine approach (vs regex change)
   preserves acceptance of leading-zero non-zero values like "01d".

Regarding the interaction between workflow lifetime and GC: these are
orthogonal mechanisms. Lifetime controls *when* data expires (workflow
lifetime = when the workflow run is deleted). GC controls *how many
versions* are retained on disk. With lifetime="workflow" and gc=1, data
lives for the workflow run duration and keeps only the 1 most recent
version — no conflict.

Closes #338

Co-authored-by: Magistr <umag@users.noreply.github.com>
Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Summary

This PR correctly fixes issue #338 where zero-duration lifetime strings like "0h", "0d", "00w" would cause data to expire immediately on creation. The approach of normalizing zero-duration values to "workflow" lifetime at the entity boundary is well-reasoned and handles backward compatibility gracefully.

Blocking Issues

None - This PR is ready to merge.

Code Quality Assessment

TypeScript & Code Style ✅

  • No any types used
  • All exports are named exports
  • AGPLv3 header present in new test file
  • Code passes lint, type check, and formatting (CI green)

Domain-Driven Design ✅

  • Correct boundary placement: Normalization happens at the Data entity boundary (create() and fromData()), which is the right place for domain validation/transformation
  • Pure function: normalizeLifetime() is a stateless helper that can be reasoned about in isolation
  • Domain semantic choice: Converting zero-duration to "workflow" is a sensible domain decision that ties data lifecycle to something concrete rather than causing silent data loss

Test Coverage ✅

Comprehensive testing across 3 files (~59 new tests):

  • data_metadata_test.ts (new): 42 tests for normalizeLifetime() and GarbageCollectionSchema
  • data_test.ts: 11 tests for normalization through Data.create(), Data.fromData(), and roundtrip
  • data_lifecycle_service_test.ts: 6 tests verifying zero-duration data gets null expiration

Test coverage includes edge cases:

  • Zero with single digit (0h, 0d)
  • Leading zeros that equal zero (00d, 000h)
  • Leading zeros on non-zero values (01d, 007h) - correctly pass through
  • Special lifetime values (ephemeral, workflow, etc.) - unchanged

Security ✅

  • No injection vulnerabilities
  • Regex is bounded (no ReDoS risk)
  • No direct user input execution

Implementation Details ✅

  • Regex in normalizeLifetime() matches LifetimeSchema pattern
  • GarbageCollectionSchema correctly rejects zero-duration via .refine() (a GC policy of "0d" is nonsensical)
  • Different treatment for LifetimeSchema (normalize) vs GarbageCollectionSchema (reject) is appropriate given the different semantics

Suggestions (Non-blocking)

None - the implementation is clean and focused. The PR description is also excellent with clear rationale for design decisions.


LGTM! 🎉

@stack72 stack72 merged commit c862814 into main Feb 24, 2026
4 checks passed
@stack72 stack72 deleted the fix-zero-duration-lifetime branch February 24, 2026 22:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Duration parsing accepts zero values, data expires immediately

1 participant