refactor: unify data access — make data.latest() the canonical accessor#453
Merged
refactor: unify data access — make data.latest() the canonical accessor#453
Conversation
Remove the DataCache class and replace it with direct synchronous disk reads via Deno's sync filesystem APIs. This makes `data.latest()` always return fresh on-disk state instead of reading from a snapshot taken at buildContext() time. The `model.*.resource` and `model.*.file` CEL patterns are deprecated with warnings, but still work for backward compatibility. ## Why this change The DataCache was an implementation accident, not a design choice. It existed solely because CEL evaluation is synchronous but all UnifiedDataRepository methods were async. Deno provides sync filesystem APIs (readTextFileSync, readLinkSync, readDirSync, readFileSync), making the cache unnecessary. Before this change, users had to reason about "which accessor sees what data when" — `data.latest()` read from a cache snapshot, and `model.*.resource` was eagerly populated. After this change, `data.latest()` always means "read the latest from disk" with no hidden staleness or snapshot timing to reason about. Alpha is the right time for this change. The user base is small, and this kind of simplification only gets harder later. The alternative — keeping the cache but updating it after each step — would leave unnecessary abstraction in place and make the code harder to reason about. ## What changed ### Core implementation - **Removed `DataCache` class** from `model_resolver.ts` — the unnecessary abstraction that bridged sync CEL with async repo methods - **Added `ModelCoordinates` interface and `ModelCoordinatesMap` type** — maps model names to disk coordinates (modelType + modelId), supporting orphan recovery when models are deleted and recreated with new UUIDs - **Added 5 sync methods** to `UnifiedDataRepository` interface and `FileSystemUnifiedDataRepository`: - `getLatestVersionSync` — reads `latest` symlink, falls back to dir scan - `findByNameSync` — resolves version, reads metadata.yaml - `listVersionsSync` — scans version directories - `getContentSync` — reads raw content file - `findAllForModelSync` — scans all data names for a model - **Rewrote `data.*` CEL namespace functions** to use sync disk reads instead of cache lookups - **Fixed `findByTag` deduplication** — prevents duplicate records when data exists under both current and orphan coordinates - **Fixed `findByTag`/`findBySpec` to scan all versions** — not just latest ### Deprecation - **Added deprecation warnings** in `CelEvaluator` for `model.*.resource` and `model.*.file` patterns via LogTape logger with deduplication - `model.*.resource` and `model.*.file` are still eagerly populated in `buildContext()` for backward compatibility — they work but emit warnings - `model.*.input`, `model.*.definition`, `model.*.execution` are NOT deprecated — these are model metadata, not versioned data ### Documentation - Updated `design/expressions.md` and `design/vaults.md` — examples use `data.latest()` instead of deprecated patterns - Updated `repo_service.ts` CLAUDE.md guidance — now recommends `data.latest()` over `model.*.resource` - Updated skill references (swamp-model, swamp-workflow) — marked `data.latest()` as preferred, `model.*.resource` as deprecated ### Tests - Rewrote `model_resolver_test.ts` — removed DataCache tests, added 8+ sync behavior tests including deduplication and post-buildContext freshness - Added 8 sync method tests to `unified_data_repository_test.ts` - Added 3 deprecation warning tests to `cel_evaluator_test.ts` - Added integration test verifying `data.latest()` sees data written after `buildContext()` - Fixed 6 mock files to implement new sync interface methods ## Plan vs implementation All 5 planned steps were fully implemented. Three unplanned items were discovered and fixed during testing: 1. 6 mock `UnifiedDataRepository` implementations needed sync method stubs 2. `findByTag`/`findBySpec` initially only scanned latest version — fixed to scan all versions 3. `findByTag` needed deduplication logic for orphan coordinate overlap ## Known trade-off `findByTag()` and `findBySpec()` scan all metadata files synchronously on every call — O(total_data_items). This is acceptable for alpha: these functions are rarely used, the data is local filesystem, and if it becomes a problem the fix (lightweight tag index during `findAllGlobal()`) is well-understood. ## User impact - **`data.latest()` now always returns fresh data** — no more stale snapshots from buildContext() time - **`model.*.resource`/`model.*.file` still work** but emit deprecation warnings; users should migrate to `data.latest()` - **No breaking changes** — all existing workflows and models continue to work - **Vary dimensions fully supported** via `data.latest()` 3-argument form ## Verification - `deno check` — passes - `deno lint` — passes - `deno fmt` — passes - `deno run test` — 1926 tests, 0 failures - `deno run compile` — binary compiles successfully Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
✅ Approved
This is a well-structured refactoring that simplifies data access by removing the DataCache abstraction and replacing it with direct synchronous disk reads.
Code Quality & CLAUDE.md Compliance ✓
- TypeScript strict mode: No
anytypes found - Named exports: All exports use named exports
- AGPLv3 headers: Present on all modified files
- Test coverage: Comprehensive unit tests (8+ sync method tests in
unified_data_repository_test.ts, rewrittenmodel_resolver_test.ts, 3 deprecation tests incel_evaluator_test.ts) and integration tests verifyingdata.latest()freshness
Domain-Driven Design ✓
ModelCoordinatesandModelCoordinatesMapare proper value objects (no identity, equality by value)UnifiedDataRepositorycorrectly abstracts persistence as a Repository for the Data aggregateModelResolveracts as a domain service for resolving model references- Sync methods are correctly added to the repository interface, not mixed into domain logic
Architecture ✓
- Clean removal of unnecessary
DataCacheabstraction data.latest()now always returns fresh on-disk state (no staleness issues)- Backward compatibility maintained via deprecation warnings for
model.*.resourceandmodel.*.file - Error handling in sync methods is consistent with existing async patterns (catches
NotFound, rethrows others)
Security ✓
- No security vulnerabilities identified
- All file operations are read-only
- Proper YAML parsing via existing
parseYamlutility
Suggestions (non-blocking)
-
Inline O(n) documentation: Consider adding a code comment in
findByTag/findBySpecnoting the O(total_data_items) scan characteristic (currently only in PR description) -
Minor encapsulation change:
getDataNameDirandgetMetadataPathchanged from private to public — acceptable for the implementation, but worth noting for future maintainers
The known trade-off (O(n) scan in findByTag/findBySpec) is appropriate for alpha as documented in the PR.
🤖 Generated with Claude Code
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Closes #434
Summary
DataCacheclass and replace with direct synchronous disk reads via Deno's sync filesystem APIsdata.latest()now always returns fresh on-disk state instead of reading from a snapshot taken atbuildContext()timemodel.*.resourceandmodel.*.fileCEL patterns are deprecated with warnings but still work for backward compatibilityWhy this is the right move
The DataCache was an implementation accident, not a design choice. It existed solely because CEL evaluation is synchronous but all
UnifiedDataRepositorymethods were async. Deno provides sync filesystem APIs (readTextFileSync,readLinkSync,readDirSync,readFileSync), making the cache unnecessary complexity.Simplicity of reasoning. Before this change, users and contributors had to think about "which accessor sees what data when" —
data.latest()read from a cache snapshot taken atbuildContext()time, andmodel.*.resourcewas eagerly populated from afindAllGlobal()scan. After this change,data.latest()always means "read the latest from disk" — no hidden staleness, no snapshot timing to reason about. The behavior is obvious.Alpha is the right time. The user base is small,
model.*.resourceis embedded in docs/skills/examples, and this kind of simplification only gets harder later. We keep backward compatibility with deprecation warnings so nothing breaks immediately.The alternative is worse. Keeping the cache but updating it after each workflow step would leave unnecessary abstraction in place, add update logic, and make the code harder to reason about ("is the cache up to date? when was it last refreshed?"). Removing it entirely is cleaner.
What changed
Core implementation
DataCacheclass frommodel_resolver.ts— the unnecessary abstraction that bridged sync CEL with async repo methodsModelCoordinatesinterface andModelCoordinatesMaptype — maps model names to disk coordinates (modelType + modelId), supporting orphan recovery when models are deleted and recreated with new UUIDsUnifiedDataRepositoryinterface andFileSystemUnifiedDataRepository:getLatestVersionSync— readslatestsymlink, falls back to dir scanfindByNameSync— resolves version, reads metadata.yamllistVersionsSync— scans version directoriesgetContentSync— reads raw content filefindAllForModelSync— scans all data names for a modeldata.*CEL namespace functions to use sync disk reads instead of cache lookupsfindByTagdeduplication — prevents duplicate records when data exists under both current and orphan coordinatesfindByTag/findBySpecto scan all versions — not just latestDeprecation
CelEvaluatorformodel.*.resourceandmodel.*.filepatterns via LogTape logger with deduplicationmodel.*.resourceandmodel.*.fileare still eagerly populated inbuildContext()for backward compatibility — they work but emit warningsmodel.*.input,model.*.definition,model.*.executionare NOT deprecated — these are model metadata, not versioned dataDocumentation
design/expressions.mdanddesign/vaults.md— examples usedata.latest()instead of deprecated patternsrepo_service.tsCLAUDE.md guidance — now recommendsdata.latest()overmodel.*.resourcedata.latest()as preferred,model.*.resourceas deprecatedTests
model_resolver_test.ts— removed DataCache tests, added 8+ sync behavior tests including deduplication and post-buildContext freshnessunified_data_repository_test.tscel_evaluator_test.tsdata.latest()sees data written afterbuildContext()Plan vs implementation
All 5 planned steps were fully implemented as designed:
Three unplanned items were discovered and fixed during testing:
UnifiedDataRepositoryimplementations needed sync method stubs to pass type checkingfindByTag/findBySpecinitially only scanned latest version — fixed to scan all versionsfindByTagneeded deduplication logic to prevent duplicate records when data exists under both current and orphan coordinatesKnown trade-off
findByTag()andfindBySpec()scan all metadata files synchronously on every call — O(total_data_items). This is acceptable for alpha: these functions are rarely used compared todata.latest(), the data is local filesystem (not network I/O), and if performance becomes a problem the fix (lightweight tag index duringfindAllGlobal()) is well-understood.User impact
data.latest()now always returns fresh data — no more stale snapshots frombuildContext()time. If step A writes data, step B'sdata.latest()sees it immediately.model.*.resource/model.*.filestill work but emit deprecation warnings; users should migrate todata.latest()data.latest()3-argument formmodel.<name>.resource.<spec>.<instance>.attributes.<field>withdata.latest("<name>", "<dataName>").attributes.<field>Test plan
deno check— passesdeno lint— passesdeno fmt— passesdeno run test— 1926 tests, 0 failuresdeno run compile— binary compiles successfullycel_data_access— 20/20 passdata_tagging— 13/13 pass🤖 Generated with Claude Code