refactor: extract SyncStorage interface for async backends by HrushiBorhade · Pull Request #75 · HrushiBorhade/code-indexer

HrushiBorhade · 2026-03-24T11:06:50Z

Summary

Extracts a SyncStorage interface from sync.ts so the Merkle sync logic works with both SQLite (CLI) and async Drizzle/Neon (cloud tasks).

Changes

SyncStorage interface — getFileHash, setFileHash, getDirHash, setDirHash, clearDirHashes, getAllFileHashes, transaction
SqliteSyncStorage class — wraps existing better-sqlite3 calls, backward compatible
computeChanges + persistMerkleState — now accept optional SyncStorage parameter (defaults to SqliteSyncStorage)
Pure Merkle logic exported — buildMerkleTree, hashFiles for reuse
All functions now async — returns Promise for compatibility with async backends

Test plan

All 10 sync tests pass
pnpm typecheck passes
Backward compatible — existing CLI works unchanged (defaults to SqliteSyncStorage)
DrizzleSyncStorage implementation (Step 3, in apps/trigger)

Closes #70

Summary by CodeRabbit

New Features
- Extended the public API with new utilities for building Merkle trees and hashing files.
- Added configurable storage abstraction enabling custom storage implementations.
Breaking Changes
- persistMerkleState is now asynchronous and requires await when called.

- SyncStorage interface abstracts DB I/O (getFileHash, setFileHash, getDirHash, etc.) - SqliteSyncStorage wraps existing better-sqlite3 calls (backward compatible) - computeChanges and persistMerkleState now accept optional SyncStorage parameter - Pure Merkle logic (buildMerkleTree, hashFiles) extracted and exported - All sync functions are now async (returns Promise) - All 10 sync tests pass with no modifications to test logic Closes #70

vercel · 2026-03-24T11:06:55Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
code-indexer-web	Ready	Preview, Comment	Mar 24, 2026 11:06am

greptile-apps

Your free trial has ended. If you'd like to continue receiving code reviews, you can add a payment method here.

coderabbitai · 2026-03-24T11:07:30Z

📝 Walkthrough

Walkthrough

The changes introduce a SyncStorage interface abstracting database persistence operations for file and directory hashes, replacing direct SQLite calls with dependency injection. SqliteSyncStorage implements this interface for CLI usage. Core functions computeChanges() and persistMerkleState() now accept optional storage parameters and operate asynchronously via promises.

Changes

Cohort / File(s)	Summary
Storage Abstraction Layer `packages/core/src/lib/sync.ts`	Introduced `SyncStorage` interface defining async methods for hash persistence (`getFileHash`, `setFileHash`, `getDirHash`, `setDirHash`, `getAllFileHashes`, `transaction`). Added `SqliteSyncStorage` implementation wrapping existing SQLite operations. Refactored `computeChanges()` and `persistMerkleState()` to accept optional `storage` parameter and converted to async/await patterns.
Test Updates `packages/core/src/lib/sync.test.ts`	Added `await` to all `persistMerkleState()` calls to properly handle async control flow in test assertions.
Public API Expansion `packages/core/src/index.ts`	Extended named exports with `buildMerkleTree`, `hashFiles`, `SqliteSyncStorage`. Updated type exports to include `SyncStorage` and `MerkleTree` alongside `SyncResult`.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 Hop! The storage interface now sets us free,
No sync-locked bounds, async you see!
SqliteStorage hops with the CLI,
While Drizzle awaits in the cloud up high—
One pattern, many backends, pure glee! ✨

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately summarizes the main refactoring objective: extracting a SyncStorage interface to support async backends, which aligns with the primary changeset across sync.ts, tests, and exports.
Linked Issues check	✅ Passed	All coding requirements from issue `#70` are met: SyncStorage interface defined, pure Merkle logic (computeChanges, persistMerkleState) extracted, SqliteSyncStorage implemented, sync.ts refactored to accept SyncStorage parameter, exports updated, and all tests pass.
Out of Scope Changes check	✅ Passed	All changes are directly scoped to requirements: new interface definition, function refactoring, export updates, and test adjustments for async behavior. No unrelated modifications detected.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch phase-2/sync-storage

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

🧹 Nitpick comments (4)

packages/core/src/lib/sync.ts (3)

86-100: Transaction safety concern with async operations inside synchronous SQLite transaction.

The comment acknowledges that better-sqlite3 transactions are synchronous, but the current implementation may not provide true atomicity guarantees. If fn() contains multiple await calls (as it does in persistMerkleState), other code could potentially execute database operations between the microtasks, since JavaScript's event loop may interleave other tasks.

For the current single-threaded CLI use case this is likely acceptable, but for robustness, consider using better-sqlite3's native db.transaction() wrapper which ensures atomicity:

♻️ Alternative using better-sqlite3 native transaction

  async transaction<T>(fn: () => Promise<T>): Promise<T> {
-   // better-sqlite3 transactions are synchronous. Since SqliteSyncStorage
-   // methods wrap sync calls in async, all awaits resolve in the same
-   // microtask. We use manual BEGIN/COMMIT to support the async fn.
-   const db = getDb();
-   db.exec('BEGIN');
-   try {
-     const result = await fn();
-     db.exec('COMMIT');
-     return result;
-   } catch (err) {
-     db.exec('ROLLBACK');
-     throw err;
-   }
+   // Since SqliteSyncStorage methods are thin async wrappers over sync calls,
+   // all awaits resolve immediately. The result is effectively synchronous.
+   // For true atomicity with better-sqlite3, we could batch operations,
+   // but for CLI single-threaded use, manual BEGIN/COMMIT suffices.
+   const db = getDb();
+   db.exec('BEGIN IMMEDIATE');
+   try {
+     const result = await fn();
+     db.exec('COMMIT');
+     return result;
+   } catch (err) {
+     db.exec('ROLLBACK');
+     throw err;
+   }
  }

Using BEGIN IMMEDIATE instead of BEGIN acquires a write lock immediately, preventing other connections from writing during the transaction.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@packages/core/src/lib/sync.ts` around lines 86 - 100, The transaction
implementation in transaction() uses manual 'BEGIN'/'COMMIT' which can be
interleaved by async microtasks and doesn't guarantee atomicity; replace it with
better-sqlite3's native transaction wrapper or at minimum use 'BEGIN IMMEDIATE'
to acquire the write lock immediately: get the DB via getDb(), create a
synchronous transaction function via db.transaction(fnSync) and invoke it (or
change db.exec('BEGIN') to db.exec('BEGIN IMMEDIATE')) so that calls like
persistMerkleState are executed atomically without other DB operations
interleaving.

305-311: Redundant re-read of file hashes within the same transaction.

The code writes file hashes (lines 296-301), then immediately re-reads them from storage (lines 306-311) to build persistedHashMap. Since both operations occur in the same transaction, you already know which files were successfully persisted—they're the ones in successfulFiles with a hash in fileHashMap.

♻️ Avoid redundant storage reads

-   // Rebuild tree using only files with persisted hashes to avoid
-   // caching dir hashes that include failed files
-   const persistedHashMap = new Map<string, string>();
-   for (const file of files) {
-     const stored = await storage.getFileHash(file);
-     if (stored) {
-       persistedHashMap.set(file, stored);
-     }
-   }
+   // Rebuild tree using only files with persisted hashes to avoid
+   // caching dir hashes that include failed files.
+   // Use the hashes we just wrote (successfulFiles) plus any previously
+   // stored hashes for files not in this batch.
+   const persistedHashMap = new Map<string, string>();
+   for (const file of files) {
+     if (successfulFiles.has(file)) {
+       const hash = fileHashMap.get(file);
+       if (hash) persistedHashMap.set(file, hash);
+     } else {
+       const stored = await storage.getFileHash(file);
+       if (stored) persistedHashMap.set(file, stored);
+     }
+   }

This reduces storage reads from files.length to files.length - successfulFiles.size, which could be significant for large file sets.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@packages/core/src/lib/sync.ts` around lines 305 - 311, The code redundantly
re-reads hashes from storage into persistedHashMap by calling
storage.getFileHash for every file; instead, for files written in the same
transaction use the already-known hashes from fileHashMap for those in
successfulFiles and only call storage.getFileHash for files not in
successfulFiles. Update the loop that builds persistedHashMap to: iterate files,
if file is in successfulFiles and has an entry in fileHashMap use that hash,
otherwise await storage.getFileHash(file) and set if present; reference
persistedHashMap, files, successfulFiles, fileHashMap and storage.getFileHash
when making the change.

253-264: Sequential awaits in hot path may impact performance with async backends.

Each file hash lookup is awaited sequentially inside the nested loop. For SqliteSyncStorage this resolves immediately, but a true async backend (like the planned DrizzleSyncStorage) could incur network latency per call, making this O(n) in round-trips.

Consider batching lookups or prefetching hashes for changed directories:

💡 Optimization idea for async backends

// For async backends, consider prefetching all file hashes for a directory
// or adding a batch method to SyncStorage:
// getFileHashes(filePaths: string[]): Promise<Map<string, string | null>>

This is acceptable for Phase 2 with SQLite, but worth noting for the DrizzleSyncStorage implementation.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@packages/core/src/lib/sync.ts` around lines 253 - 264, The loop in sync.ts
iterates dirFiles and awaits storage.getFileHash(file) sequentially, causing
many round-trips for async backends; update the SyncStorage interface to add a
batched method (e.g., getFileHashes(filePaths: string[]):
Promise<Map<string,string|null>>) and change the logic around dirFiles /
fileHashMap to call that batch method once per dir, then compare returned hashes
to currentHash from fileHashMap and populate added and modified accordingly;
ensure SqliteSyncStorage implements the new batch method (delegating to the fast
path) so behavior is unchanged for SQLite while DrizzleSyncStorage can implement
a single efficient query.

packages/core/src/lib/sync.test.ts (1)

152-169: Consider adding a test for custom SyncStorage implementations.

The test suite validates SqliteSyncStorage behavior (via default parameter), but doesn't test the interface contract itself. Adding a test with a mock SyncStorage would help validate that custom implementations (like the planned DrizzleSyncStorage) will work correctly.

💡 Example mock storage test

it('works with custom SyncStorage implementation', async () => {
  const mockStorage: SyncStorage = {
    getFileHash: vi.fn().mockResolvedValue(null),
    setFileHash: vi.fn().mockResolvedValue(undefined),
    getDirHash: vi.fn().mockResolvedValue(null),
    setDirHash: vi.fn().mockResolvedValue(undefined),
    clearDirHashes: vi.fn().mockResolvedValue(undefined),
    getAllFileHashes: vi.fn().mockResolvedValue(new Map()),
    transaction: vi.fn().mockImplementation((fn) => fn()),
  };

  await fs.writeFile(path.join(tmpDir, 'test.ts'), 'const x = 1;');
  const files = [path.join(tmpDir, 'test.ts')];
  
  const result = await computeChanges(files, tmpDir, mockStorage);
  expect(result.added).toHaveLength(1);
  expect(mockStorage.getDirHash).toHaveBeenCalled();
});

This can be deferred to when DrizzleSyncStorage is implemented (Step 3).

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@packages/core/src/lib/sync.test.ts` around lines 152 - 169, Add a unit test
that exercises the SyncStorage interface by passing a mock implementation into
computeChanges (instead of relying on the default SqliteSyncStorage) and
asserting behavior and calls; specifically, create a mock SyncStorage object
implementing getFileHash, setFileHash, getDirHash, setDirHash, clearDirHashes,
getAllFileHashes, and transaction (transaction should invoke the provided
callback), write a test file in tmpDir, call computeChanges(files, tmpDir,
mockStorage) and assert expected results (e.g., added length) and that
mockStorage.getDirHash / getFileHash / setFileHash (or other pertinent methods)
were called to validate custom storage integrations (refer to computeChanges and
persistMerkleState to locate where the storage is used).

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@packages/core/src/lib/sync.test.ts`:
- Around line 152-169: Add a unit test that exercises the SyncStorage interface
by passing a mock implementation into computeChanges (instead of relying on the
default SqliteSyncStorage) and asserting behavior and calls; specifically,
create a mock SyncStorage object implementing getFileHash, setFileHash,
getDirHash, setDirHash, clearDirHashes, getAllFileHashes, and transaction
(transaction should invoke the provided callback), write a test file in tmpDir,
call computeChanges(files, tmpDir, mockStorage) and assert expected results
(e.g., added length) and that mockStorage.getDirHash / getFileHash / setFileHash
(or other pertinent methods) were called to validate custom storage integrations
(refer to computeChanges and persistMerkleState to locate where the storage is
used).

In `@packages/core/src/lib/sync.ts`:
- Around line 86-100: The transaction implementation in transaction() uses
manual 'BEGIN'/'COMMIT' which can be interleaved by async microtasks and doesn't
guarantee atomicity; replace it with better-sqlite3's native transaction wrapper
or at minimum use 'BEGIN IMMEDIATE' to acquire the write lock immediately: get
the DB via getDb(), create a synchronous transaction function via
db.transaction(fnSync) and invoke it (or change db.exec('BEGIN') to
db.exec('BEGIN IMMEDIATE')) so that calls like persistMerkleState are executed
atomically without other DB operations interleaving.
- Around line 305-311: The code redundantly re-reads hashes from storage into
persistedHashMap by calling storage.getFileHash for every file; instead, for
files written in the same transaction use the already-known hashes from
fileHashMap for those in successfulFiles and only call storage.getFileHash for
files not in successfulFiles. Update the loop that builds persistedHashMap to:
iterate files, if file is in successfulFiles and has an entry in fileHashMap use
that hash, otherwise await storage.getFileHash(file) and set if present;
reference persistedHashMap, files, successfulFiles, fileHashMap and
storage.getFileHash when making the change.
- Around line 253-264: The loop in sync.ts iterates dirFiles and awaits
storage.getFileHash(file) sequentially, causing many round-trips for async
backends; update the SyncStorage interface to add a batched method (e.g.,
getFileHashes(filePaths: string[]): Promise<Map<string,string|null>>) and change
the logic around dirFiles / fileHashMap to call that batch method once per dir,
then compare returned hashes to currentHash from fileHashMap and populate added
and modified accordingly; ensure SqliteSyncStorage implements the new batch
method (delegating to the fast path) so behavior is unchanged for SQLite while
DrizzleSyncStorage can implement a single efficient query.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 03f5496b-3326-46db-afc5-591ef6c39786

📥 Commits

Reviewing files that changed from the base of the PR and between d172d99 and 6b5941d.

📒 Files selected for processing (3)

packages/core/src/index.ts
packages/core/src/lib/sync.test.ts
packages/core/src/lib/sync.ts

HrushiBorhade added phase-2 Phase 2: Indexing Pipeline indexing Indexing pipeline tasks labels Mar 24, 2026

greptile-apps Bot reviewed Mar 24, 2026

View reviewed changes

coderabbitai Bot reviewed Mar 24, 2026

View reviewed changes

HrushiBorhade merged commit 6bcd506 into main Mar 24, 2026
5 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

refactor: extract SyncStorage interface for async backends#75

refactor: extract SyncStorage interface for async backends#75
HrushiBorhade merged 1 commit into
mainfrom
phase-2/sync-storage

HrushiBorhade commented Mar 24, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

vercel Bot commented Mar 24, 2026

Uh oh!

greptile-apps Bot left a comment

Uh oh!

coderabbitai Bot commented Mar 24, 2026 •

edited

Loading

Walkthrough

Changes

Estimated code review effort

Poem

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

HrushiBorhade commented Mar 24, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Changes

Test plan

Summary by CodeRabbit

Uh oh!

vercel Bot commented Mar 24, 2026

Uh oh!

greptile-apps Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot commented Mar 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Poem

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

HrushiBorhade commented Mar 24, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Mar 24, 2026 •

edited

Loading