fix: normalize zero-duration lifetime to workflow instead of immediate expiry#454
Merged
Zero-duration strings like "0h", "0d", "00w" passed LifetimeSchema validation
but produced 0ms when parsed by parseDuration(), causing data to expire
immediately on creation. GC would then delete it on the next scan.
Instead of rejecting zero-duration values (which could break existing on-disk
data), we normalize them to "workflow" lifetime at the Data entity boundary.
This means the data lives for the duration of the workflow run — a sensible
default that prevents silent data loss.
The fix has three parts:
1. normalizeLifetime() helper in data_metadata.ts — converts zero-duration
strings (including multi-digit zeros like "00d", "000h") to "workflow"
while passing through all other values unchanged.
2. Data.create() and Data.fromData() both call normalizeLifetime() before
Zod validation. This handles both new data creation and loading legacy
on-disk data that may have been saved with "0h".
3. GarbageCollectionSchema uses .refine() to reject zero-duration strings.
Unlike lifetime (where normalization preserves backward compat), a GC
policy of "0d" is nonsensical ("keep versions from the last 0 days")
and was already implicitly broken. The refine approach (vs regex change)
preserves acceptance of leading-zero non-zero values like "01d".
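As a rough sketch of parts 1 and 3 (the regex and its unit set are assumptions here; the real helper lives in `data_metadata.ts` and must match `LifetimeSchema`'s actual pattern):

```typescript
// Hypothetical zero-duration matcher: one or more zeros followed by a
// duration unit. The unit set (s/m/h/d/w) is an assumption, not the
// project's actual LifetimeSchema pattern.
const ZERO_DURATION = /^0+[smhdw]$/;

// Part 1: zero-duration lifetimes become "workflow"; everything else
// (including leading-zero non-zero values like "01d") passes through.
function normalizeLifetime(lifetime: string): string {
  return ZERO_DURATION.test(lifetime) ? "workflow" : lifetime;
}

// Part 3: the GC-side check rejects zero-duration strings outright
// instead of normalizing (the real code attaches this check via
// .refine() on GarbageCollectionSchema).
function isValidGcDuration(policy: string): boolean {
  return !ZERO_DURATION.test(policy);
}
```

Normalizing on the lifetime side keeps legacy on-disk `"0h"` data loadable, while the GC side can simply reject, since a zero-duration retention policy was never meaningful.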
Regarding the interaction between workflow lifetime and GC: these are
orthogonal mechanisms. Lifetime controls *when* data expires (workflow
lifetime = when the workflow run is deleted). GC controls *how many
versions* are retained on disk. With lifetime="workflow" and gc=1, data
lives for the workflow run duration and keeps only the 1 most recent
version — no conflict.
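The count-based side of that interaction can be sketched as a pure retention function (name and shape are hypothetical, for illustration only; the real GC implementation is not shown in this PR):

```typescript
// Hypothetical sketch of count-based GC retention: given version ids
// sorted oldest-first, return the ids a policy of gc = keep would
// hard-delete, retaining only the `keep` most recent versions.
function versionsToDelete(versionIds: string[], keep: number): string[] {
  if (keep >= versionIds.length) return []; // nothing exceeds the policy
  return versionIds.slice(0, versionIds.length - keep);
}
```

Note this operates only on version counts; it never consults the lifetime, which is what makes the two mechanisms compose without conflict.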
Closes #338
Co-authored-by: Magistr <umag@users.noreply.github.com>
Review Summary
This PR correctly fixes issue #338 where zero-duration lifetime strings like "0h", "0d", "00w" would cause data to expire immediately on creation. The approach of normalizing zero-duration values to "workflow" lifetime at the entity boundary is well-reasoned and handles backward compatibility gracefully.
Blocking Issues
None - This PR is ready to merge.
Code Quality Assessment
TypeScript & Code Style ✅
- No `any` types used
- All exports are named exports
- AGPLv3 header present in new test file
- Code passes lint, type check, and formatting (CI green)
Domain-Driven Design ✅
- Correct boundary placement: Normalization happens at the `Data` entity boundary (`create()` and `fromData()`), which is the right place for domain validation/transformation
- Pure function: `normalizeLifetime()` is a stateless helper that can be reasoned about in isolation
- Domain semantic choice: Converting zero-duration to `"workflow"` is a sensible domain decision that ties data lifecycle to something concrete rather than causing silent data loss
Test Coverage ✅
Comprehensive testing across 3 files (~59 new tests):
- `data_metadata_test.ts` (new): 42 tests for `normalizeLifetime()` and `GarbageCollectionSchema`
- `data_test.ts`: 11 tests for normalization through `Data.create()`, `Data.fromData()`, and roundtrip
- `data_lifecycle_service_test.ts`: 6 tests verifying zero-duration data gets `null` expiration

Test coverage includes edge cases:
- Zero with a single digit (`0h`, `0d`)
- Leading zeros that equal zero (`00d`, `000h`)
- Leading zeros on non-zero values (`01d`, `007h`) - correctly pass through
- Special lifetime values (`ephemeral`, `workflow`, etc.) - unchanged
Security ✅
- No injection vulnerabilities
- Regex is bounded (no ReDoS risk)
- No direct user input execution
Implementation Details ✅
- Regex in `normalizeLifetime()` matches the `LifetimeSchema` pattern
- `GarbageCollectionSchema` correctly rejects zero-duration via `.refine()` (a GC policy of `"0d"` is nonsensical)
- Different treatment for `LifetimeSchema` (normalize) vs `GarbageCollectionSchema` (reject) is appropriate given the different semantics
Suggestions (Non-blocking)
None - the implementation is clean and focused. The PR description is also excellent with clear rationale for design decisions.
LGTM! 🎉
Summary
"0h","0d","00w"passedLifetimeSchemavalidation but produced 0ms when parsed byparseDuration(), causing data to expire immediately on creation. GC would then delete it on the next scan.fromData()), we normalize them to"workflow"lifetime at the Data entity boundary. This means the data lives for the duration of the workflow run — a sensible default that prevents silent data loss.GarbageCollectionSchemanow rejects zero-duration strings via.refine()(a GC policy of"0d"is nonsensical). The refine approach preserves acceptance of leading-zero non-zero values like"01d", staying consistent withLifetimeSchema.Rationale: why normalize to "workflow" instead of rejecting?
Rejecting zero-duration values in both
create()andfromData()would be simpler, but it would break any repo that already has data on disk withlifetime: "0h"—fromData()would throw aZodErrorwhen loading it. Normalizing handles this gracefully: existing data loads correctly, and the"workflow"semantic is the safest fallback since it ties data lifecycle to something concrete (the workflow run) rather than deleting it immediately.Lifetime + GC interaction (answering @adamhjk's question)
Lifetime and garbage collection are orthogonal mechanisms:
- `"workflow"` lifetime means the data expires when the associated workflow run is deleted (checked via `isExpired()` → `workflowRunRepo.findById()`). It does not use time-based expiration: `calculateExpiration("workflow", ...)` returns `null`.
- `gc: 1` means only the 1 most recent version is kept; older versions are hard-deleted.

With `lifetime: "workflow"` and `gc: 1`: once the workflow run is deleted, `isExpired()` returns `true` → the "latest" symlink is removed (soft delete, data becomes inaccessible). Then GC hard-deletes old versions, leaving at most 1 version on disk. No conflict — they compose naturally.
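A minimal sketch consistent with the expiration behavior described above (the unit table and function shapes are assumptions for illustration; the real `parseDuration()` and `calculateExpiration()` may differ):

```typescript
// Assumed unit-to-milliseconds table, for illustration only.
const UNIT_MS: Record<string, number> = {
  s: 1_000,
  m: 60_000,
  h: 3_600_000,
  d: 86_400_000,
  w: 604_800_000,
};

// Parse a duration like "2h" into milliseconds. A zero-duration
// string such as "0h" would yield 0ms here — exactly the bug that
// normalization to "workflow" avoids.
function parseDuration(spec: string): number {
  const match = /^(\d+)([smhdw])$/.exec(spec);
  if (match === null) throw new Error(`invalid duration: ${spec}`);
  return Number(match[1]) * UNIT_MS[match[2]];
}

// "workflow" has no time-based expiry (null); duration lifetimes
// expire at createdAt plus the parsed duration.
function calculateExpiration(lifetime: string, createdAt: Date): Date | null {
  if (lifetime === "workflow") return null;
  return new Date(createdAt.getTime() + parseDuration(lifetime));
}
```

This makes the `null`-expiration claim above concrete: with the normalization in place, a legacy `"0h"` lifetime never reaches `parseDuration()` at all.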
Changes
- `src/domain/data/data_metadata.ts`: add `normalizeLifetime()` helper; add a `.refine()` to `GarbageCollectionSchema` rejecting zero-duration strings
- `src/domain/data/data.ts`: call `normalizeLifetime()` in `Data.create()` and `Data.fromData()` before Zod validation
- `src/domain/data/data_metadata_test.ts`: tests for `normalizeLifetime()` and `GarbageCollectionSchema`
- `src/domain/data/data_test.ts`: tests for normalization through `Data.create()`, `Data.fromData()`, and roundtrip
- `src/domain/data/data_lifecycle_service_test.ts`: tests verifying zero-duration data gets `null` expiration (same as `"workflow"`)

Test plan
- `deno check` passes
- `deno lint` passes
- `deno fmt` passes

Closes #338
🤖 Generated with Claude Code