
Conversation

@tylerhawkes (Contributor) commented Nov 21, 2025

Add database-backed group locks and GroupGuard to manage exclusive access with a 5-minute expiry cap in group_lock.rs

Introduce group_locks table via migrations, implement QueryGroupLock for ConnectionExt and &mut SqliteConnection, and add GroupGuard that acquires and deletes locks on drop; include tests and schema definitions.

📍Where to Start

Start with the GroupGuard::acquire and trait methods in group_lock.rs.
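For orientation, here is a minimal usage sketch of the RAII pattern described above. The acquire arguments, the group-id type, and the error handling are assumptions; only the acquire-then-delete-on-drop behavior is taken from this PR.

// Hypothetical call shape; see group_lock.rs for the real signature.
let guard = GroupGuard::acquire(conn, &group_id, Duration::from_secs(30))?;
// ... do the work that needs exclusive access to this group ...
drop(guard); // deletes the group_locks row (this also happens automatically at end of scope)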


Macroscope summarized c1a59b8.

@claude (bot) commented Nov 21, 2025

Claude finished @tylerhawkes's task.


Code Review Complete

Review Date: 2025-11-21 08:36:00 UTC

Critical Issues

1. Lock ordering issue in release_inner (xmtp_db/src/encrypted_store/group_lock.rs:179-186)

The release_inner method clears self.lock after the deletion without checking how many rows were affected. A deletion error propagates via ? before the clear, but a deletion that matches 0 rows (for example because the stored locked_at_ns no longer matches) still clears the in-memory lock while a row may remain in the database. This creates a critical issue where:

  • The Drop implementation won't retry cleanup (because self.lock is None)
  • The lock stays in the database indefinitely
  • The group becomes permanently locked
// Current (problematic):
self.conn.delete_group_lock(&lock.group_id, lock.locked_at_ns)?;
self.lock = None;  // Lock cleared even if deletion affects 0 rows

// Should be:
let deleted = self.conn.delete_group_lock(&lock.group_id, lock.locked_at_ns)?;
if deleted > 0 {
    self.lock = None;
} else {
    return Err(StorageError::...); // or keep lock for Drop to retry
}

2. Potential integer overflow in time calculation (xmtp_db/src/encrypted_store/group_lock.rs:62, 111)

The .min() cap keeps the intermediate value well within i64 range (five minutes in nanoseconds is far below i64::MAX), so the cast cannot wrap as written:

expires_in.as_nanos().min(NS_IN_MIN as u128 * 5) as i64

However, the silent as i64 cast and the unchecked addition of the capped value to locked_at_ns are fragile: if the cap constant is enlarged or removed, the cast would wrap without warning. Prefer explicit conversion and saturating arithmetic:

let capped_ns = expires_in.as_nanos().min(NS_IN_MIN as u128 * 5);
let expires_at_ns = locked_at_ns.saturating_add(
    i64::try_from(capped_ns).unwrap_or(NS_IN_MIN * 5)
);

Security Concerns

3. No lock expiration enforcement at acquisition time

The code sets expires_at_ns but never validates it during lock operations. Stale locks can block access indefinitely. Consider:

  • Adding cleanup of expired locks before acquisition attempts (sketched after this list)
  • Or rejecting get_group_lock results where expires_at_ns < now_ns()
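
A minimal sketch of the first option, written as a raw query against the group_locks table (the helper name and raw-SQL approach are assumptions, not part of this PR; the column names come from the diff):

use diesel::prelude::*;
use diesel::sql_types::BigInt;
use diesel::sqlite::SqliteConnection;

// Hedged sketch: purge stale rows before each acquisition attempt so an expired
// lock can never block a group indefinitely. Helper name and raw SQL are assumptions.
fn delete_expired_group_locks(conn: &mut SqliteConnection, now_ns: i64) -> QueryResult<usize> {
    diesel::sql_query("DELETE FROM group_locks WHERE expires_at_ns < ?")
        .bind::<BigInt, _>(now_ns)
        .execute(conn)
}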

Performance Issues

4. Lack of index on expires_at_ns (migrations/2025-11-21-032209_group_locks/up.sql)

If expired lock cleanup is added later, queries filtering by expires_at_ns will be slow without an index:

CREATE INDEX idx_group_locks_expires_at ON group_locks(expires_at_ns);

Test Coverage Gaps

5. Missing edge case tests:

  • Lock expiration scenarios (expired locks should be ignored/cleaned up)
  • Concurrent acquisition attempts from multiple connections
  • release() failure handling, including deletions that match 0 rows (see the sketch after this list)
  • GroupGuard with different QueryGroupLock implementations (e.g., wrapped connections)
  • Extremely large expires_in values testing the 5-minute cap
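
One concrete shape for the release()/0-row case, as a hedged sketch rather than the PR's actual test code (the new_test_db_connection helper and the xmtp_common::time::now_ns path are assumptions; the delete_group_lock signature follows the Result<usize, StorageError> shape noted in item 7 below):

#[xmtp_common::test]
async fn delete_missing_lock_affects_zero_rows() {
    // Stand-in for whatever test harness group_lock.rs already uses.
    let mut conn = new_test_db_connection();
    let deleted = conn
        .delete_group_lock(&[0u8; 32], xmtp_common::time::now_ns())
        .expect("deleting a lock that was never acquired should not error");
    // Item 7 below: a 0-row deletion should be observable to the caller.
    assert_eq!(deleted, 0);
}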

6. WASM test coverage incomplete

The transaction test uses xmtp_common::if_native!, excluding WASM. Given that tests should use #[xmtp_common::test] per CLAUDE.md, verify that the basic acquisition test (lines 212-229) properly runs in both environments.

Code Quality

7. Inconsistent error handling patterns

delete_group_lock returns Result<usize, StorageError> but callers never check if deletion affected 0 rows, which could indicate:

  • Lock already deleted by another process
  • Lock expired and was cleaned up
  • Wrong locked_at_ns timestamp

Consider returning a more specific error when 0 rows are deleted.

8. Missing mutable access to connection

GroupGuard::conn() only provides immutable access, but many DB operations require &mut Q. Consider adding:

pub fn conn_mut(&mut self) -> &mut Q {
    &mut self.conn
}

Summary

The implementation is well-structured with good RAII semantics, but has a critical bug in lock cleanup ordering and fragile time arithmetic. Address the release_inner ordering issue (already noted by macroscopeapp) and harden the time calculation before merging.

@tylerhawkes (Contributor, Author) commented Nov 21, 2025


How to use the Graphite Merge Queue

Add the label mergequeue to this PR to add it to the merge queue.

You must have a Graphite account in order to use the merge queue. Sign up using this link.

An organization admin has enabled the Graphite Merge Queue in this repository.

Please do not merge from GitHub as this will restart CI on PRs being processed by the merge queue.

This stack of pull requests is managed by Graphite. Learn more about stacking.

@codecov (bot) commented Nov 21, 2025

Codecov Report

❌ Patch coverage is 83.01887% with 36 lines in your changes missing coverage. Please review.
✅ Project coverage is 74.62%. Comparing base (533cc84) to head (c1a59b8).

Files with missing lines                     Patch %    Lines
xmtp_db/src/encrypted_store/group_lock.rs    82.93%     36 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2823      +/-   ##
==========================================
+ Coverage   74.57%   74.62%   +0.04%     
==========================================
  Files         386      387       +1     
  Lines       49874    50086     +212     
==========================================
+ Hits        37193    37375     +182     
- Misses      12681    12711      +30     

☔ View full report in Codecov by Sentry.

@tylerhawkes force-pushed the group-locks branch 2 times, most recently from 2f2c0e2 to 5683492 on November 21, 2025 at 08:31