Conversation

@erskingardner erskingardner commented Sep 12, 2025

I keep seeing older relay lists and metadata showing up in the UI. While the nostr-sdk does try to order events by created_at, it seems that limit(1) bails out of the stream of fetched events as soon as it sees a single event. So we no longer limit, and instead do a quick sort to be absolutely sure we're getting the most recent events.

I was also seeing very odd behavior when trying to add relays, because we weren't normalizing relay URLs (trailing slashes weren't being stripped).
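Roughly, the selection now works like this (a sketch rather than the exact code; see src/nostr_manager/query.rs for the real version, and the function name and timeout here are illustrative):

```rust
use nostr_sdk::prelude::*;
use std::time::Duration;

/// Tolerance for events timestamped in the future (one hour, per the review below).
const MAX_FUTURE_SKEW: u64 = 60 * 60;

/// Sketch: fetch without limit(1), drop far-future events, then pick the
/// newest event locally instead of trusting relay-side ordering.
async fn latest_metadata_event(
    client: &Client,
    pubkey: PublicKey,
) -> Result<Option<Event>, nostr_sdk::client::Error> {
    let filter = Filter::new().author(pubkey).kind(Kind::Metadata);
    let events = client.fetch_events(filter, Duration::from_secs(10)).await?;
    let cutoff = Timestamp::from(Timestamp::now().as_u64() + MAX_FUTURE_SKEW);
    Ok(events
        .into_iter()
        .filter(|e| e.created_at <= cutoff)
        .max_by_key(|e| (e.created_at, e.id)))
}
```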

Summary by CodeRabbit

  • Bug Fixes

    • Uses the latest profile metadata, relay lists, and key packages (selects most recent events and ignores events timestamped far in the future).
    • Eliminates duplicate relays caused by trailing slashes, preventing duplicate entries and lookup issues.
  • Chores

    • Adds a migration to normalize relay URLs and consolidate existing duplicates.
    • Standardizes relay URL handling during save/find to ensure consistent storage and lookups.
  • Tests

    • Adds tests verifying relay URL normalization, deduplication, and retrieval across URL variants.

coderabbitai bot commented Sep 12, 2025

Walkthrough

Adds an idempotent SQL migration to normalize and deduplicate relay URLs and update user_relays; adds URL normalization in Relay DB operations with tests; changes Nostr query logic to fetch all matching events and select the latest by timestamp (with future-skew filtering).

Changes

| Cohort / File(s) | Summary of changes |
| --- | --- |
| **DB migration: normalize and deduplicate relays**<br>`db_migrations/0013_normalize_relay_urls.sql` | New migration that normalizes relay URLs (trims trailing slashes), builds a temporary `relay_normalization_plan` table mapping original→normalized URLs and a keeper (min id), updates `user_relays` to reference keepers, deletes duplicate `relays` rows, updates keeper rows to normalized URLs, and drops the temp table; includes idempotency comments and an ordering that avoids FK violations. |
| **Nostr query: select latest event with future-skew guard**<br>`src/nostr_manager/query.rs` | Removed `limit(1)` to fetch all matching events, added `MAX_FUTURE_SKEW` filtering (ignore events > now + 1h), and replaced `first()`/`first_owned()` with `max_by_key((created_at, id))` to pick the latest event for metadata, user_relays, and key-package queries; adjusted debug logging. |
| **Relay URL normalization in DB access + tests**<br>`src/whitenoise/database/relays.rs` | Added private `normalize_relay_url` (trims trailing slashes); `Relay::find_by_url` and `Relay::save` use the normalized URL for lookups and upserts to deduplicate canonical URLs; added tests covering normalization, deduplication, and timestamp/update behavior. |

Sequence Diagram(s)

```mermaid
sequenceDiagram
  autonumber
  participant App as App
  participant RelayLayer as Relay DB Layer
  participant DB as Database

  rect rgb(238,246,255)
  note over App,RelayLayer: Input URL may have trailing slash
  App->>RelayLayer: save(url) / find_by_url(url)
  RelayLayer->>RelayLayer: normalize_relay_url(url)  -- trim trailing '/'
  RelayLayer->>DB: INSERT ... ON CONFLICT(normalized_url) / SELECT by normalized_url
  DB-->>RelayLayer: canonical relay row
  RelayLayer-->>App: Relay { id, url, ... }
  end
```
```mermaid
sequenceDiagram
  autonumber
  participant Caller as Caller
  participant Query as NostrManager::query
  participant Relays as Nostr Relays

  rect rgb(245,240,255)
  note over Caller,Relays: Fetch events (metadata/user_relays/key_package)
  Caller->>Query: fetch_*(pubkey)
  Query->>Relays: request events (no SQL limit)
  Relays-->>Query: Vec<Event>
  Query->>Query: filter out events > now + MAX_FUTURE_SKEW
  Query->>Query: select max_by_key((created_at, id))
  Query-->>Caller: latest event-derived result
  end
```

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~30 minutes

Pre-merge checks (3 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped: CodeRabbit's high-level summary is enabled. |
| Title Check | ✅ Passed | The title succinctly and accurately describes the PR's two primary changes (normalizing relay URLs and removing `limit(1)` from the event filter) and matches the file and migration summaries, so it is clear, specific, and relevant to the changeset. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00%, which meets the required threshold of 80.00%. |

Poem

I trim each slash with gentle paw,
I pick one keeper, fix the law.
I chase new events by newest time,
Merge duplicates, make rows rhyme.
A hopping rabbit patches code — hooray, neat DB, neat code! 🐇✨


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (6)
src/nostr_manager/query.rs (4)

26-33: Use max_by_key to avoid full sort and extra allocation

You only need the latest event. Replace collect+sort with a single pass.

```diff
-// Convert to Vec and sort by created_at timestamp descending to get the latest one
-let mut events_vec: Vec<Event> = events.into_iter().collect();
-events_vec.sort_by(|a, b| b.created_at.cmp(&a.created_at));
-
-match events_vec.first() {
+let latest = events.into_iter().max_by_key(|e| e.created_at);
+match latest {
     Some(event) => Ok(Some(Metadata::try_from(event)?)),
     None => Ok(None),
 }
```

49-57: Same optimization here: compute latest via max_by_key

```diff
-// Convert to Vec and sort by created_at timestamp descending to get the latest one
-let mut events_vec: Vec<Event> = relay_events.into_iter().collect();
-events_vec.sort_by(|a, b| b.created_at.cmp(&a.created_at));
-
-tracing::debug!(
-    "Fetched {} relay events, using latest: {:?}",
-    events_vec.len(),
-    events_vec.first().map(|e| e.created_at)
-);
-
-match events_vec.first() {
+let latest = relay_events.into_iter().max_by_key(|e| e.created_at);
+tracing::debug!(
+    "Fetched relay events; using latest: {:?}",
+    latest.as_ref().map(|e| e.created_at)
+);
+match latest {
     None => Ok(HashSet::new()),
     Some(event) => Ok(Self::relay_urls_from_event(event.clone())),
 }
```

Also applies to: 59-62


76-81: Key package: avoid sort+Vec; pick latest directly

```diff
-// Convert to Vec and sort by created_at timestamp descending to get the latest one
-let mut events_vec: Vec<Event> = events.into_iter().collect();
-events_vec.sort_by(|a, b| b.created_at.cmp(&a.created_at));
-
-Ok(events_vec.into_iter().next())
+Ok(events.into_iter().max_by_key(|e| e.created_at))
```

19-24: Optional: bound data volume without reintroducing staleness

If event volumes are large, consider adding a moderate since(...) window or a higher limit(...) (e.g., 200–1000) while still selecting the max locally, to cap memory and latency. Validate with your busiest users before enabling.

Also applies to: 42-47, 70-74
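If you go that route, a sketch of the bounded filter (window and cap values are placeholders, not recommendations):

```rust
use nostr_sdk::prelude::*;

/// Placeholder bounds: a 30-day window plus a generous cap. The latest event
/// is still chosen client-side, so this only limits transfer volume.
fn bounded_relay_list_filter(pubkey: PublicKey) -> Filter {
    let thirty_days: u64 = 30 * 24 * 60 * 60;
    let since = Timestamp::from(Timestamp::now().as_u64() - thirty_days);
    Filter::new()
        .author(pubkey)
        .kind(Kind::RelayList)
        .since(since)
        .limit(500)
}
```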

src/whitenoise/database/relays.rs (2)

31-40: Normalization may be too aggressive for non-root paths

Trimming any trailing slash changes .../nostr/ to .../nostr. Some relays could route strictly by path. Consider restricting normalization to the root path only, or feature-gate this behavior.

Example tweak (keeps trailing slash when path isn’t just /):

```diff
-fn normalize_relay_url(url: &RelayUrl) -> String {
-    let url_str = url.to_string();
-    // Remove trailing slash if present
-    if url_str.ends_with('/') {
-        url_str.trim_end_matches('/').to_string()
-    } else {
-        url_str
-    }
-}
+fn normalize_relay_url(url: &RelayUrl) -> String {
+    let s = url.to_string();
+    if !s.ends_with('/') {
+        return s;
+    }
+    // Only trim when path is exactly "/"
+    if let Ok(u) = url::Url::parse(&s) {
+        if u.path() == "/" {
+            return s.trim_end_matches('/').to_string();
+        }
+    }
+    s
+}
```

Note: add the `url` crate import if it isn't already available, or fully qualify as shown.


466-539: Tests validate normalization thoroughly—nice coverage

Root, ports, and path variants are exercised. If you scope normalization to root-only, update the path-related assertions accordingly.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f8b26fa and 444bf2e.

📒 Files selected for processing (3)
  • db_migrations/0013_normalize_relay_urls.sql (1 hunks)
  • src/nostr_manager/query.rs (3 hunks)
  • src/whitenoise/database/relays.rs (5 hunks)
🧰 Additional context used
🧠 Learnings (4)
📓 Common learnings
Learnt from: jgmontoya
PR: parres-hq/whitenoise#318
File: src/whitenoise/accounts/messages.rs:83-91
Timestamp: 2025-08-18T07:22:47.108Z
Learning: In the Whitenoise codebase, `nostr_mls.get_relays()` returns a `std::collections::BTreeSet<RelayUrl>`, which means relay URLs are already deduplicated and don't need additional deduplication logic.
📚 Learning: 2025-08-18T07:22:47.108Z
Learnt from: jgmontoya
PR: parres-hq/whitenoise#318
File: src/whitenoise/accounts/messages.rs:83-91
Timestamp: 2025-08-18T07:22:47.108Z
Learning: In the Whitenoise codebase, `nostr_mls.get_relays()` returns a `std::collections::BTreeSet<RelayUrl>`, which means relay URLs are already deduplicated and don't need additional deduplication logic.

Applied to files:

  • src/whitenoise/database/relays.rs
📚 Learning: 2025-08-23T16:23:47.066Z
Learnt from: jgmontoya
PR: parres-hq/whitenoise#324
File: src/integration_tests/test_cases/subscription_processing/publish_metadata_update.rs:35-39
Timestamp: 2025-08-23T16:23:47.066Z
Learning: In the Whitenoise integration test framework, using `unwrap()` on `RelayUrl::parse()` for the hardcoded dev_relays URLs (ws://localhost:8080 and ws://localhost:7777) is acceptable since these are known, controlled addresses in the test environment where failing fast is preferred.

Applied to files:

  • src/whitenoise/database/relays.rs
📚 Learning: 2025-08-23T16:27:41.511Z
Learnt from: jgmontoya
PR: parres-hq/whitenoise#324
File: src/integration_tests/test_cases/subscription_processing/publish_metadata_update.rs:42-45
Timestamp: 2025-08-23T16:27:41.511Z
Learning: In Whitenoise integration tests, using `unwrap()` on `test_client.add_relay()` for dev relay connections is acceptable since these are controlled test environments where failing fast is preferred over complex error handling.

Applied to files:

  • src/whitenoise/database/relays.rs
🧬 Code graph analysis (2)
src/nostr_manager/query.rs (4)
src/nostr_manager/mod.rs (1)
  • new (75-187)
src/whitenoise/accounts.rs (4)
  • new (51-70)
  • nip65_relays (96-100)
  • relays (85-93)
  • relays (488-491)
src/whitenoise/relays.rs (1)
  • new (69-76)
src/whitenoise/database/users.rs (1)
  • relays (166-198)
src/whitenoise/database/relays.rs (2)
src/whitenoise/relays.rs (1)
  • new (69-76)
src/whitenoise/database/mod.rs (1)
  • new (52-96)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: check (ubuntu-latest, native)
  • GitHub Check: check (macos-latest, native)
  • GitHub Check: check (macos-14, native)
🔇 Additional comments (4)
db_migrations/0013_normalize_relay_urls.sql (2)

1-6: Assumption check: stripping trailing slashes from paths may not be universally safe

Treating wss://relay.com/nostr and wss://relay.com/nostr/ as equivalent might break relays that route strictly by path. Since the code normalizes paths too, please confirm this matches real relay behavior you depend on.

Would you like me to gate normalization to root-only paths in both SQL and Rust, or keep current behavior after validating against a known relay set?


29-37: No additional FK remapping required — only user_relays stores relay_id

Repo scan (migrations + src) found no FOREIGN KEYs or other tables containing relay_id; 0006_data_migration joins relays by url, not id. Checked: db_migrations/0005_new_tables.sql, db_migrations/0006_data_migration.sql, db_migrations/0013_normalize_relay_urls.sql, src/whitenoise/database/user_relays.rs, src/whitenoise/database/users.rs.

Likely an incorrect or invalid review comment.

src/nostr_manager/query.rs (1)

19-24: Removal of limit(1) aligns with the PR goal

Fetching all matching events and deferring selection to the client avoids premature termination and stale data.

Also applies to: 42-47, 70-74

src/whitenoise/database/relays.rs (1)

113-121: Good: lookups and writes consistently use the normalized URL

Binding the normalized string in both SELECT and INSERT/ON CONFLICT ensures a single canonical row per relay URL.

Also applies to: 159-175
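A minimal sketch of that pattern, with schema and column names assumed from the summaries above rather than taken from the repo:

```rust
use nostr_sdk::prelude::*;

// Assumed helper mirroring the PR's normalize_relay_url: trim trailing '/'.
fn normalize_relay_url(url: &RelayUrl) -> String {
    url.to_string().trim_end_matches('/').to_string()
}

/// Sketch: bind the same normalized string in the upsert and in lookups so
/// ws://relay.example and ws://relay.example/ resolve to one canonical row.
async fn save_relay(pool: &sqlx::SqlitePool, url: &RelayUrl) -> Result<(), sqlx::Error> {
    let normalized = normalize_relay_url(url);
    sqlx::query(
        "INSERT INTO relays (url, created_at, updated_at)
         VALUES (?1, strftime('%s','now'), strftime('%s','now'))
         ON CONFLICT(url) DO UPDATE SET updated_at = excluded.updated_at",
    )
    .bind(&normalized)
    .execute(pool)
    .await?;
    Ok(())
}
```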

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

🧹 Nitpick comments (3)
src/nostr_manager/query.rs (1)

20-21: Minor API clean-up: accept &[RelayUrl] directly to avoid clones

Finish the TODO to pass relay URLs directly and drop per-call cloning.

Would you like me to generate a follow-up patch updating call sites accordingly?

Also applies to: 37-41

db_migrations/0013_normalize_relay_urls.sql (2)

67-79: Idempotent keeper update is good; minor nit for clarity

This WHERE limits to changed rows; consider documenting in a comment that it’s a no-op if already normalized.


81-82: Make cleanup resilient on re-runs

Use IF EXISTS to harden reruns or partial failures.

```diff
-DROP TABLE relay_normalization_plan;
+DROP TABLE IF EXISTS relay_normalization_plan;
```
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 444bf2e and cc34b46.

📒 Files selected for processing (2)
  • db_migrations/0013_normalize_relay_urls.sql (1 hunks)
  • src/nostr_manager/query.rs (3 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-08-18T07:22:47.108Z
Learnt from: jgmontoya
PR: parres-hq/whitenoise#318
File: src/whitenoise/accounts/messages.rs:83-91
Timestamp: 2025-08-18T07:22:47.108Z
Learning: In the Whitenoise codebase, `nostr_mls.get_relays()` returns a `std::collections::BTreeSet<RelayUrl>`, which means relay URLs are already deduplicated and don't need additional deduplication logic.

Applied to files:

  • src/nostr_manager/query.rs
🧬 Code graph analysis (1)
src/nostr_manager/query.rs (1)
src/nostr_manager/utils.rs (1)
  • relay_urls_from_event (21-31)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: check (ubuntu-latest, native)
  • GitHub Check: check (macos-latest, native)
  • GitHub Check: check (macos-14, native)
🔇 Additional comments (5)
db_migrations/0013_normalize_relay_urls.sql (5)

15-41: Solid, conflict-free normalization plan

Using a temp plan table with keeper_id fully avoids UNIQUE(url) violations and correctly handles 3+ duplicates. LGTM.


58-66: Delete step looks correct and complete

All non-keeper ids are removed; no “middle” dupes left behind. Nice.


1-14: Confirm transactional execution

Ensure the migration runner wraps this whole script in a single transaction; otherwise, add BEGIN/COMMIT to avoid partial state on error.

If not wrapped, I can provide a transaction-wrapped variant.


43-57: No UNIQUE constraint on user_relays — collision risk not applicable

CREATE TABLE user_relays in db_migrations/0005_new_tables.sql and the runtime SQL in src/whitenoise/database/user_relays.rs define id PRIMARY KEY and (user_id, relay_id, relay_type) with no UNIQUE(user_id, relay_id[, relay_type]); the UPDATE will not violate a UNIQUE constraint.
db_migrations/0005_new_tables.sql:39-42, src/whitenoise/database/user_relays.rs:66-69

Likely an incorrect or invalid review comment.


43-57: No additional DB foreign keys to relays — user_relays only

Searched for FOREIGN KEY / REFERENCES relays and for relay_id columns; only user_relays contains relay_id (db_migrations/0005_new_tables.sql, src/whitenoise/database/user_relays.rs). The UPDATE in db_migrations/0013_normalize_relay_urls.sql (Step 2) covers the needed remapping; no other FK remaps required.

@erskingardner erskingardner merged commit 4cd84ee into master Sep 12, 2025
3 of 4 checks passed
@erskingardner erskingardner deleted the normalize branch September 12, 2025 14:43
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (1)
src/nostr_manager/query.rs (1)

61-64: Log chosen event id as well for traceability

You previously got this suggestion; adding id helps correlate selections across relays.

```diff
-        tracing::debug!(
-            "Fetched relay events, using latest: {:?}",
-            latest.as_ref().map(|e| e.created_at)
-        );
+        tracing::debug!(
+            "Fetched relay events, using latest created_at={:?} id={:?}",
+            latest.as_ref().map(|e| e.created_at),
+            latest.as_ref().map(|e| e.id)
+        );
```
🧹 Nitpick comments (6)
src/nostr_manager/query.rs (6)

29-35: Prefer comparator over max_by_key to avoid moving EventId

Safer and more general: compare created_at then id without constructing a tuple key.

```diff
-        let latest = events
-            .into_iter()
-            .filter(|e| e.created_at <= cutoff)
-            .max_by_key(|e| (e.created_at, e.id));
+        let latest = events
+            .into_iter()
+            .filter(|e| e.created_at <= cutoff)
+            .max_by(|a, b| a.created_at.cmp(&b.created_at).then_with(|| a.id.cmp(&b.id)));
```

54-60: Apply the same comparator pattern here for consistency

Mirror the comparator-based tie-break to keep selection logic uniform.

```diff
-        let latest = relay_events
-            .into_iter()
-            .filter(|e| e.created_at <= cutoff)
-            .max_by_key(|e| (e.created_at, e.id));
+        let latest = relay_events
+            .into_iter()
+            .filter(|e| e.created_at <= cutoff)
+            .max_by(|a, b| a.created_at.cmp(&b.created_at).then_with(|| a.id.cmp(&b.id)));
```

83-89: Unify latest selection with comparator-based tie-breaker

Same rationale as above.

```diff
-        let latest = events
-            .into_iter()
-            .filter(|e| e.created_at <= cutoff)
-            .max_by_key(|e| (e.created_at, e.id));
+        let latest = events
+            .into_iter()
+            .filter(|e| e.created_at <= cutoff)
+            .max_by(|a, b| a.created_at.cmp(&b.created_at).then_with(|| a.id.cmp(&b.id)));
```

13-15: Make MAX_FUTURE_SKEW configurable

Consider reading from config/env or a NostrManager field so operators can tune tolerance without a rebuild. Default can remain 1h.
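For instance (a sketch; the env var name is invented for illustration):

```rust
use std::time::Duration;

/// Hypothetical override: read the skew tolerance from the environment,
/// keeping the current one-hour default when unset or unparsable.
fn max_future_skew() -> Duration {
    std::env::var("WHITENOISE_MAX_FUTURE_SKEW_SECS")
        .ok()
        .and_then(|s| s.parse::<u64>().ok())
        .map(Duration::from_secs)
        .unwrap_or_else(|| Duration::from_secs(60 * 60))
}
```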


22-27: Nice: limit(1) removal fixes stale UI; consider adding an optional 'since' bound

Without reintroducing limit, you could set Filter::since(...) from last-seen timestamp (if tracked) to cap payloads on cold starts.

Also applies to: 49-52, 78-81


19-24: Replace &[Relay] params with &[RelayUrl] and remove the map/collect

Change fetch_metadata_from(...) and fetch_user_relays(...) to accept &[RelayUrl] and pass that slice directly to client.fetch_events_from(...) (removing the internal nip65_relays.iter().map(|r| r.url.clone()).collect()). Update callers that currently pass &[Relay].

Call sites to update (examples): src/whitenoise/users.rs:40, src/whitenoise/accounts.rs:1360, src/nostr_manager/publisher.rs:307, src/whitenoise/users.rs:308, src/whitenoise/accounts.rs:646, src/whitenoise/accounts.rs:1048, src/whitenoise/accounts.rs:1058.
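A sketch of the suggested signature (the body is elided where it would duplicate the diffs above, and the 10s timeout stands in for whatever the manager actually uses):

```rust
use nostr_sdk::prelude::*;
use std::collections::HashSet;
use std::time::Duration;

/// Sketch: accept &[RelayUrl] and forward it, instead of mapping &[Relay]
/// into a fresh Vec<RelayUrl> at every call site.
async fn fetch_user_relays(
    client: &Client,
    pubkey: PublicKey,
    relay_urls: &[RelayUrl],
) -> Result<HashSet<RelayUrl>, nostr_sdk::client::Error> {
    let filter = Filter::new().author(pubkey).kind(Kind::RelayList);
    let _events = client
        .fetch_events_from(relay_urls.to_vec(), filter, Duration::from_secs(10))
        .await?;
    // ...future-skew filter and latest-event selection as in the diffs above...
    Ok(HashSet::new())
}
```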

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between cc34b46 and c961032.

📒 Files selected for processing (2)
  • src/nostr_manager/query.rs (4 hunks)
  • src/whitenoise/database/relays.rs (5 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/whitenoise/database/relays.rs
🧰 Additional context used
🧬 Code graph analysis (1)
src/nostr_manager/query.rs (3)
src/whitenoise/accounts.rs (4)
  • nip65_relays (96-100)
  • new (51-70)
  • relays (85-93)
  • relays (488-491)
src/nostr_manager/mod.rs (1)
  • new (75-187)
src/nostr_manager/utils.rs (1)
  • relay_urls_from_event (21-31)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: check (macos-14, native)
  • GitHub Check: check (macos-latest, native)
  • GitHub Check: check (ubuntu-latest, native)
