refactor: extract SyncStorage interface for async backends#75
Conversation
- SyncStorage interface abstracts DB I/O (getFileHash, setFileHash, getDirHash, etc.) - SqliteSyncStorage wraps existing better-sqlite3 calls (backward compatible) - computeChanges and persistMerkleState now accept optional SyncStorage parameter - Pure Merkle logic (buildMerkleTree, hashFiles) extracted and exported - All sync functions are now async (returns Promise) - All 10 sync tests pass with no modifications to test logic Closes #70
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
There was a problem hiding this comment.
Your free trial has ended. If you'd like to continue receiving code reviews, you can add a payment method here.
📝 WalkthroughWalkthroughThe changes introduce a Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (4)
packages/core/src/lib/sync.ts (3)
86-100: Transaction safety concern with async operations inside synchronous SQLite transaction.The comment acknowledges that better-sqlite3 transactions are synchronous, but the current implementation may not provide true atomicity guarantees. If
fn()contains multipleawaitcalls (as it does inpersistMerkleState), other code could potentially execute database operations between the microtasks, since JavaScript's event loop may interleave other tasks.For the current single-threaded CLI use case this is likely acceptable, but for robustness, consider using better-sqlite3's native
db.transaction()wrapper which ensures atomicity:♻️ Alternative using better-sqlite3 native transaction
async transaction<T>(fn: () => Promise<T>): Promise<T> { - // better-sqlite3 transactions are synchronous. Since SqliteSyncStorage - // methods wrap sync calls in async, all awaits resolve in the same - // microtask. We use manual BEGIN/COMMIT to support the async fn. - const db = getDb(); - db.exec('BEGIN'); - try { - const result = await fn(); - db.exec('COMMIT'); - return result; - } catch (err) { - db.exec('ROLLBACK'); - throw err; - } + // Since SqliteSyncStorage methods are thin async wrappers over sync calls, + // all awaits resolve immediately. The result is effectively synchronous. + // For true atomicity with better-sqlite3, we could batch operations, + // but for CLI single-threaded use, manual BEGIN/COMMIT suffices. + const db = getDb(); + db.exec('BEGIN IMMEDIATE'); + try { + const result = await fn(); + db.exec('COMMIT'); + return result; + } catch (err) { + db.exec('ROLLBACK'); + throw err; + } }Using
BEGIN IMMEDIATEinstead ofBEGINacquires a write lock immediately, preventing other connections from writing during the transaction.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/core/src/lib/sync.ts` around lines 86 - 100, The transaction implementation in transaction() uses manual 'BEGIN'/'COMMIT' which can be interleaved by async microtasks and doesn't guarantee atomicity; replace it with better-sqlite3's native transaction wrapper or at minimum use 'BEGIN IMMEDIATE' to acquire the write lock immediately: get the DB via getDb(), create a synchronous transaction function via db.transaction(fnSync) and invoke it (or change db.exec('BEGIN') to db.exec('BEGIN IMMEDIATE')) so that calls like persistMerkleState are executed atomically without other DB operations interleaving.
305-311: Redundant re-read of file hashes within the same transaction.The code writes file hashes (lines 296-301), then immediately re-reads them from storage (lines 306-311) to build
persistedHashMap. Since both operations occur in the same transaction, you already know which files were successfully persisted—they're the ones insuccessfulFileswith a hash infileHashMap.♻️ Avoid redundant storage reads
- // Rebuild tree using only files with persisted hashes to avoid - // caching dir hashes that include failed files - const persistedHashMap = new Map<string, string>(); - for (const file of files) { - const stored = await storage.getFileHash(file); - if (stored) { - persistedHashMap.set(file, stored); - } - } + // Rebuild tree using only files with persisted hashes to avoid + // caching dir hashes that include failed files. + // Use the hashes we just wrote (successfulFiles) plus any previously + // stored hashes for files not in this batch. + const persistedHashMap = new Map<string, string>(); + for (const file of files) { + if (successfulFiles.has(file)) { + const hash = fileHashMap.get(file); + if (hash) persistedHashMap.set(file, hash); + } else { + const stored = await storage.getFileHash(file); + if (stored) persistedHashMap.set(file, stored); + } + }This reduces storage reads from
files.lengthtofiles.length - successfulFiles.size, which could be significant for large file sets.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/core/src/lib/sync.ts` around lines 305 - 311, The code redundantly re-reads hashes from storage into persistedHashMap by calling storage.getFileHash for every file; instead, for files written in the same transaction use the already-known hashes from fileHashMap for those in successfulFiles and only call storage.getFileHash for files not in successfulFiles. Update the loop that builds persistedHashMap to: iterate files, if file is in successfulFiles and has an entry in fileHashMap use that hash, otherwise await storage.getFileHash(file) and set if present; reference persistedHashMap, files, successfulFiles, fileHashMap and storage.getFileHash when making the change.
253-264: Sequential awaits in hot path may impact performance with async backends.Each file hash lookup is awaited sequentially inside the nested loop. For
SqliteSyncStoragethis resolves immediately, but a true async backend (like the plannedDrizzleSyncStorage) could incur network latency per call, making this O(n) in round-trips.Consider batching lookups or prefetching hashes for changed directories:
💡 Optimization idea for async backends
// For async backends, consider prefetching all file hashes for a directory // or adding a batch method to SyncStorage: // getFileHashes(filePaths: string[]): Promise<Map<string, string | null>>This is acceptable for Phase 2 with SQLite, but worth noting for the DrizzleSyncStorage implementation.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/core/src/lib/sync.ts` around lines 253 - 264, The loop in sync.ts iterates dirFiles and awaits storage.getFileHash(file) sequentially, causing many round-trips for async backends; update the SyncStorage interface to add a batched method (e.g., getFileHashes(filePaths: string[]): Promise<Map<string,string|null>>) and change the logic around dirFiles / fileHashMap to call that batch method once per dir, then compare returned hashes to currentHash from fileHashMap and populate added and modified accordingly; ensure SqliteSyncStorage implements the new batch method (delegating to the fast path) so behavior is unchanged for SQLite while DrizzleSyncStorage can implement a single efficient query.packages/core/src/lib/sync.test.ts (1)
152-169: Consider adding a test for custom SyncStorage implementations.The test suite validates
SqliteSyncStoragebehavior (via default parameter), but doesn't test the interface contract itself. Adding a test with a mockSyncStoragewould help validate that custom implementations (like the plannedDrizzleSyncStorage) will work correctly.💡 Example mock storage test
it('works with custom SyncStorage implementation', async () => { const mockStorage: SyncStorage = { getFileHash: vi.fn().mockResolvedValue(null), setFileHash: vi.fn().mockResolvedValue(undefined), getDirHash: vi.fn().mockResolvedValue(null), setDirHash: vi.fn().mockResolvedValue(undefined), clearDirHashes: vi.fn().mockResolvedValue(undefined), getAllFileHashes: vi.fn().mockResolvedValue(new Map()), transaction: vi.fn().mockImplementation((fn) => fn()), }; await fs.writeFile(path.join(tmpDir, 'test.ts'), 'const x = 1;'); const files = [path.join(tmpDir, 'test.ts')]; const result = await computeChanges(files, tmpDir, mockStorage); expect(result.added).toHaveLength(1); expect(mockStorage.getDirHash).toHaveBeenCalled(); });This can be deferred to when
DrizzleSyncStorageis implemented (Step 3).🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/core/src/lib/sync.test.ts` around lines 152 - 169, Add a unit test that exercises the SyncStorage interface by passing a mock implementation into computeChanges (instead of relying on the default SqliteSyncStorage) and asserting behavior and calls; specifically, create a mock SyncStorage object implementing getFileHash, setFileHash, getDirHash, setDirHash, clearDirHashes, getAllFileHashes, and transaction (transaction should invoke the provided callback), write a test file in tmpDir, call computeChanges(files, tmpDir, mockStorage) and assert expected results (e.g., added length) and that mockStorage.getDirHash / getFileHash / setFileHash (or other pertinent methods) were called to validate custom storage integrations (refer to computeChanges and persistMerkleState to locate where the storage is used).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@packages/core/src/lib/sync.test.ts`:
- Around line 152-169: Add a unit test that exercises the SyncStorage interface
by passing a mock implementation into computeChanges (instead of relying on the
default SqliteSyncStorage) and asserting behavior and calls; specifically,
create a mock SyncStorage object implementing getFileHash, setFileHash,
getDirHash, setDirHash, clearDirHashes, getAllFileHashes, and transaction
(transaction should invoke the provided callback), write a test file in tmpDir,
call computeChanges(files, tmpDir, mockStorage) and assert expected results
(e.g., added length) and that mockStorage.getDirHash / getFileHash / setFileHash
(or other pertinent methods) were called to validate custom storage integrations
(refer to computeChanges and persistMerkleState to locate where the storage is
used).
In `@packages/core/src/lib/sync.ts`:
- Around line 86-100: The transaction implementation in transaction() uses
manual 'BEGIN'/'COMMIT' which can be interleaved by async microtasks and doesn't
guarantee atomicity; replace it with better-sqlite3's native transaction wrapper
or at minimum use 'BEGIN IMMEDIATE' to acquire the write lock immediately: get
the DB via getDb(), create a synchronous transaction function via
db.transaction(fnSync) and invoke it (or change db.exec('BEGIN') to
db.exec('BEGIN IMMEDIATE')) so that calls like persistMerkleState are executed
atomically without other DB operations interleaving.
- Around line 305-311: The code redundantly re-reads hashes from storage into
persistedHashMap by calling storage.getFileHash for every file; instead, for
files written in the same transaction use the already-known hashes from
fileHashMap for those in successfulFiles and only call storage.getFileHash for
files not in successfulFiles. Update the loop that builds persistedHashMap to:
iterate files, if file is in successfulFiles and has an entry in fileHashMap use
that hash, otherwise await storage.getFileHash(file) and set if present;
reference persistedHashMap, files, successfulFiles, fileHashMap and
storage.getFileHash when making the change.
- Around line 253-264: The loop in sync.ts iterates dirFiles and awaits
storage.getFileHash(file) sequentially, causing many round-trips for async
backends; update the SyncStorage interface to add a batched method (e.g.,
getFileHashes(filePaths: string[]): Promise<Map<string,string|null>>) and change
the logic around dirFiles / fileHashMap to call that batch method once per dir,
then compare returned hashes to currentHash from fileHashMap and populate added
and modified accordingly; ensure SqliteSyncStorage implements the new batch
method (delegating to the fast path) so behavior is unchanged for SQLite while
DrizzleSyncStorage can implement a single efficient query.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 03f5496b-3326-46db-afc5-591ef6c39786
📒 Files selected for processing (3)
packages/core/src/index.tspackages/core/src/lib/sync.test.tspackages/core/src/lib/sync.ts
Summary
Extracts a
SyncStorageinterface fromsync.tsso the Merkle sync logic works with both SQLite (CLI) and async Drizzle/Neon (cloud tasks).Changes
SyncStorageinterface —getFileHash,setFileHash,getDirHash,setDirHash,clearDirHashes,getAllFileHashes,transactionSqliteSyncStorageclass — wraps existing better-sqlite3 calls, backward compatiblecomputeChanges+persistMerkleState— now accept optionalSyncStorageparameter (defaults toSqliteSyncStorage)buildMerkleTree,hashFilesfor reusePromisefor compatibility with async backendsTest plan
pnpm typecheckpassesCloses #70
Summary by CodeRabbit
New Features
Breaking Changes
persistMerkleStateis now asynchronous and requiresawaitwhen called.