Normalize relay urls and remove limit(1) from filter #360
Conversation
Walkthrough
Adds an idempotent SQL migration to normalize and deduplicate relay URLs, and updates event queries to drop limit(1) and select the latest event client-side.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    participant App as App
    participant RelayLayer as Relay DB Layer
    participant DB as Database
    rect rgb(238,246,255)
        note over App,RelayLayer: Input URL may have trailing slash
        App->>RelayLayer: save(url) / find_by_url(url)
        RelayLayer->>RelayLayer: normalize_relay_url(url) -- trim trailing '/'
        RelayLayer->>DB: INSERT ... ON CONFLICT(normalized_url) / SELECT by normalized_url
        DB-->>RelayLayer: canonical relay row
        RelayLayer-->>App: Relay { id, url, ... }
    end
```
```mermaid
sequenceDiagram
    autonumber
    participant Caller as Caller
    participant Query as NostrManager::query
    participant Relays as Nostr Relays
    rect rgb(245,240,255)
        note over Caller,Relays: Fetch events (metadata/user_relays/key_package)
        Caller->>Query: fetch_*(pubkey)
        Query->>Relays: request events (no SQL limit)
        Relays-->>Query: Vec<Event>
        Query->>Query: filter out events > now + MAX_FUTURE_SKEW
        Query->>Query: select max_by_key((created_at, id))
        Query-->>Caller: latest event-derived result
    end
```
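The selection step in the second diagram amounts to something like the following. This is a minimal sketch, assuming nostr-sdk's `Event`/`Timestamp` types and a 1-hour `MAX_FUTURE_SKEW`; it is not the PR's verbatim code.

```rust
use std::time::Duration;

use nostr_sdk::prelude::*;

// Sketch of the client-side "pick latest" logic from the diagram above.
fn select_latest(events: Vec<Event>) -> Option<Event> {
    let cutoff = Timestamp::now() + Duration::from_secs(3600); // MAX_FUTURE_SKEW tolerance
    events
        .into_iter()
        .filter(|e| e.created_at <= cutoff) // drop events timestamped too far in the future
        .max_by_key(|e| (e.created_at, e.id)) // latest wins; id breaks timestamp ties
}
```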
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~30 minutes
Pre-merge checks: ✅ 3 passed
Actionable comments posted: 2
🧹 Nitpick comments (6)
src/nostr_manager/query.rs (4)
26-33: Use max_by_key to avoid full sort and extra allocation
You only need the latest event. Replace collect+sort with a single pass.
```diff
-// Convert to Vec and sort by created_at timestamp descending to get the latest one
-let mut events_vec: Vec<Event> = events.into_iter().collect();
-events_vec.sort_by(|a, b| b.created_at.cmp(&a.created_at));
-
-match events_vec.first() {
+let latest = events.into_iter().max_by_key(|e| e.created_at);
+match latest {
     Some(event) => Ok(Some(Metadata::try_from(event)?)),
     None => Ok(None),
 }
```
49-57: Same optimization here: compute latest via max_by_key

```diff
-// Convert to Vec and sort by created_at timestamp descending to get the latest one
-let mut events_vec: Vec<Event> = relay_events.into_iter().collect();
-events_vec.sort_by(|a, b| b.created_at.cmp(&a.created_at));
-
-tracing::debug!(
-    "Fetched {} relay events, using latest: {:?}",
-    events_vec.len(),
-    events_vec.first().map(|e| e.created_at)
-);
-
-match events_vec.first() {
+let latest = relay_events.into_iter().max_by_key(|e| e.created_at);
+tracing::debug!(
+    "Fetched relay events; using latest: {:?}",
+    latest.as_ref().map(|e| e.created_at)
+);
+match latest {
     None => Ok(HashSet::new()),
     Some(event) => Ok(Self::relay_urls_from_event(event.clone())),
 }
```

Also applies to: 59-62
76-81: Key package: avoid sort+Vec; pick latest directly

```diff
-// Convert to Vec and sort by created_at timestamp descending to get the latest one
-let mut events_vec: Vec<Event> = events.into_iter().collect();
-events_vec.sort_by(|a, b| b.created_at.cmp(&a.created_at));
-
-Ok(events_vec.into_iter().next())
+Ok(events.into_iter().max_by_key(|e| e.created_at))
```
19-24: Optional: bound data volume without reintroducing staleness
If event volumes are large, consider adding a moderate `since(...)` window or a higher `limit(...)` (e.g., 200–1000) while still selecting the max locally, to cap memory and latency. Validate with your busiest users before enabling.
Also applies to: 42-47, 70-74
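If helpful, a minimal sketch of such a bounded filter, assuming nostr-sdk's builder API; the 7-day window and limit of 500 are illustrative placeholders, not recommendations:

```rust
use std::time::Duration;

use nostr_sdk::prelude::*;

// Hypothetical helper: cap payload size while still picking the latest
// event locally. Tune the window and limit against real traffic.
fn bounded_metadata_filter(pubkey: PublicKey) -> Filter {
    Filter::new()
        .author(pubkey)
        .kind(Kind::Metadata)
        .since(Timestamp::now() - Duration::from_secs(7 * 24 * 3600)) // moderate since(...) window
        .limit(500) // generous limit(...); the max is still chosen client-side
}
```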
src/whitenoise/database/relays.rs (2)
31-40: Normalization may be too aggressive for non-root paths
Trimming any trailing slash changes `.../nostr/` to `.../nostr`. Some relays could route strictly by path. Consider restricting normalization to the root path only, or feature-gate this behavior.
Example tweak (keeps the trailing slash when the path isn't just `/`):

```diff
-fn normalize_relay_url(url: &RelayUrl) -> String {
-    let url_str = url.to_string();
-    // Remove trailing slash if present
-    if url_str.ends_with('/') {
-        url_str.trim_end_matches('/').to_string()
-    } else {
-        url_str
-    }
-}
+fn normalize_relay_url(url: &RelayUrl) -> String {
+    let s = url.to_string();
+    if !s.ends_with('/') {
+        return s;
+    }
+    // Only trim when path is exactly "/"
+    if let Ok(u) = url::Url::parse(&s) {
+        if u.path() == "/" {
+            return s.trim_end_matches('/').to_string();
+        }
+    }
+    s
+}
```

Note: add the `url` crate import if not already available, or fully qualify as shown.
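For instance, under that root-only variant one would expect the following (a hypothetical check using nostr-sdk's `RelayUrl`; its exact string form may vary by version):

```rust
#[test]
fn root_only_normalization_examples() {
    // Hypothetical expectations for the root-only variant above.
    let root = RelayUrl::parse("wss://relay.example.com/").unwrap();
    assert_eq!(normalize_relay_url(&root), "wss://relay.example.com"); // root slash trimmed

    let pathed = RelayUrl::parse("wss://relay.example.com/nostr/").unwrap();
    assert_eq!(normalize_relay_url(&pathed), "wss://relay.example.com/nostr/"); // path slash kept
}
```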
466-539: Tests validate normalization thoroughly; nice coverage
Root, ports, and path variants are exercised. If you scope normalization to root-only, update the path-related assertions accordingly.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- db_migrations/0013_normalize_relay_urls.sql (1 hunks)
- src/nostr_manager/query.rs (3 hunks)
- src/whitenoise/database/relays.rs (5 hunks)
🧰 Additional context used
🧠 Learnings (4)
📓 Common learnings
Learnt from: jgmontoya
PR: parres-hq/whitenoise#318
File: src/whitenoise/accounts/messages.rs:83-91
Timestamp: 2025-08-18T07:22:47.108Z
Learning: In the Whitenoise codebase, `nostr_mls.get_relays()` returns a `std::collections::BTreeSet<RelayUrl>`, which means relay URLs are already deduplicated and don't need additional deduplication logic.
📚 Learning: 2025-08-18T07:22:47.108Z
Learnt from: jgmontoya
PR: parres-hq/whitenoise#318
File: src/whitenoise/accounts/messages.rs:83-91
Timestamp: 2025-08-18T07:22:47.108Z
Learning: In the Whitenoise codebase, `nostr_mls.get_relays()` returns a `std::collections::BTreeSet<RelayUrl>`, which means relay URLs are already deduplicated and don't need additional deduplication logic.
Applied to files:
src/whitenoise/database/relays.rs
📚 Learning: 2025-08-23T16:23:47.066Z
Learnt from: jgmontoya
PR: parres-hq/whitenoise#324
File: src/integration_tests/test_cases/subscription_processing/publish_metadata_update.rs:35-39
Timestamp: 2025-08-23T16:23:47.066Z
Learning: In the Whitenoise integration test framework, using `unwrap()` on `RelayUrl::parse()` for the hardcoded dev_relays URLs (ws://localhost:8080 and ws://localhost:7777) is acceptable since these are known, controlled addresses in the test environment where failing fast is preferred.
Applied to files:
src/whitenoise/database/relays.rs
📚 Learning: 2025-08-23T16:27:41.511Z
Learnt from: jgmontoya
PR: parres-hq/whitenoise#324
File: src/integration_tests/test_cases/subscription_processing/publish_metadata_update.rs:42-45
Timestamp: 2025-08-23T16:27:41.511Z
Learning: In Whitenoise integration tests, using `unwrap()` on `test_client.add_relay()` for dev relay connections is acceptable since these are controlled test environments where failing fast is preferred over complex error handling.
Applied to files:
src/whitenoise/database/relays.rs
🧬 Code graph analysis (2)
src/nostr_manager/query.rs (4)
src/nostr_manager/mod.rs (1)
  new (75-187)
src/whitenoise/accounts.rs (4)
  new (51-70), nip65_relays (96-100), relays (85-93), relays (488-491)
src/whitenoise/relays.rs (1)
  new (69-76)
src/whitenoise/database/users.rs (1)
  relays (166-198)
src/whitenoise/database/relays.rs (2)
src/whitenoise/relays.rs (1)
  new (69-76)
src/whitenoise/database/mod.rs (1)
  new (52-96)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: check (ubuntu-latest, native)
- GitHub Check: check (macos-latest, native)
- GitHub Check: check (macos-14, native)
🔇 Additional comments (4)
db_migrations/0013_normalize_relay_urls.sql (2)
1-6: Assumption check: stripping trailing slashes from paths may not be universally safe
Treating `wss://relay.com/nostr` and `wss://relay.com/nostr/` as equivalent might break relays that route strictly by path. Since the code normalizes paths too, please confirm this matches real relay behavior you depend on.
Would you like me to gate normalization to root-only paths in both SQL and Rust, or keep current behavior after validating against a known relay set?
29-37: No additional FK remapping required; only user_relays stores relay_id
Repo scan (migrations + src) found no FOREIGN KEYs or other tables containing relay_id; 0006_data_migration joins relays by url, not id. Checked: db_migrations/0005_new_tables.sql, db_migrations/0006_data_migration.sql, db_migrations/0013_normalize_relay_urls.sql, src/whitenoise/database/user_relays.rs, src/whitenoise/database/users.rs.
Likely an incorrect or invalid review comment.
src/nostr_manager/query.rs (1)
19-24: Removal of limit(1) aligns with the PR goal
Fetching all matching events and deferring selection to the client avoids premature termination and stale data.
Also applies to: 42-47, 70-74
src/whitenoise/database/relays.rs (1)
113-121: Good: lookups and writes consistently use the normalized URL
Binding the normalized string in both SELECT and INSERT/ON CONFLICT ensures a single canonical row per relay URL.
Also applies to: 159-175
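As a sketch of that pattern, hedged: the table/column names and sqlx usage below are illustrative, not the codebase's exact code, and assume SQLite with a UNIQUE constraint on relays.url:

```rust
// Illustrative upsert keyed on the normalized URL; one canonical row per relay.
let normalized = normalize_relay_url(&url);
let (id, url): (i64, String) = sqlx::query_as(
    "INSERT INTO relays (url) VALUES (?1)
     ON CONFLICT(url) DO UPDATE SET url = excluded.url
     RETURNING id, url",
)
.bind(&normalized) // same normalized string for lookup and write paths
.fetch_one(&pool)
.await?;
```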
Actionable comments posted: 3
🧹 Nitpick comments (3)
src/nostr_manager/query.rs (1)
20-21: Minor API clean-up: accept &[RelayUrl] directly to avoid clones
Finish the TODO to pass relay URLs directly and drop per-call cloning.
Would you like me to generate a follow-up patch updating call sites accordingly?
Also applies to: 37-41
db_migrations/0013_normalize_relay_urls.sql (2)
67-79: Idempotent keeper update is good; minor nit for clarity
This WHERE limits to changed rows; consider documenting in a comment that it's a no-op if already normalized.
81-82: Make cleanup resilient on re-runs
Use IF EXISTS to harden reruns or partial failures.

```diff
-DROP TABLE relay_normalization_plan;
+DROP TABLE IF EXISTS relay_normalization_plan;
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- db_migrations/0013_normalize_relay_urls.sql (1 hunks)
- src/nostr_manager/query.rs (3 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-08-18T07:22:47.108Z
Learnt from: jgmontoya
PR: parres-hq/whitenoise#318
File: src/whitenoise/accounts/messages.rs:83-91
Timestamp: 2025-08-18T07:22:47.108Z
Learning: In the Whitenoise codebase, `nostr_mls.get_relays()` returns a `std::collections::BTreeSet<RelayUrl>`, which means relay URLs are already deduplicated and don't need additional deduplication logic.
Applied to files:
src/nostr_manager/query.rs
🧬 Code graph analysis (1)
src/nostr_manager/query.rs (1)
src/nostr_manager/utils.rs (1)
relay_urls_from_event(21-31)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: check (ubuntu-latest, native)
- GitHub Check: check (macos-latest, native)
- GitHub Check: check (macos-14, native)
🔇 Additional comments (5)
db_migrations/0013_normalize_relay_urls.sql (5)
15-41: Solid, conflict-free normalization plan
Using a temp plan table with keeper_id fully avoids UNIQUE(url) violations and correctly handles 3+ duplicates. LGTM.
58-66: Delete step looks correct and complete
All non-keeper ids are removed; no “middle” dupes left behind. Nice.
1-14: Confirm transactional execution
Ensure the migration runner wraps this whole script in a single transaction; otherwise, add BEGIN/COMMIT to avoid partial state on error.
If not wrapped, I can provide a transaction-wrapped variant.
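For reference, a minimal transaction-wrapped sketch, assuming the runner uses sqlx (`raw_sql` needs a reasonably recent sqlx; the function and its signature are assumptions about the runner, not existing code):

```rust
// Hypothetical migration-runner step: execute the whole script atomically.
async fn run_migration(pool: &sqlx::SqlitePool, sql: &str) -> Result<(), sqlx::Error> {
    let mut tx = pool.begin().await?; // implicit BEGIN
    sqlx::raw_sql(sql).execute(&mut *tx).await?; // multi-statement script runs inside the tx
    tx.commit().await?; // COMMIT; the tx rolls back on drop if an error occurred
    Ok(())
}
```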
43-57: No UNIQUE constraint on user_relays; collision risk not applicable
CREATE TABLE user_relays in db_migrations/0005_new_tables.sql and the runtime SQL in src/whitenoise/database/user_relays.rs define id PRIMARY KEY and (user_id, relay_id, relay_type) with no UNIQUE(user_id, relay_id[, relay_type]); the UPDATE will not violate a UNIQUE constraint.
db_migrations/0005_new_tables.sql:39-42, src/whitenoise/database/user_relays.rs:66-69

Likely an incorrect or invalid review comment.
43-57: No additional DB foreign keys to relays; user_relays only
Searched for FOREIGN KEY / REFERENCES relays and for relay_id columns; only user_relays contains relay_id (db_migrations/0005_new_tables.sql, src/whitenoise/database/user_relays.rs). The UPDATE in db_migrations/0013_normalize_relay_urls.sql (Step 2) covers the needed remapping; no other FK remaps required.
Actionable comments posted: 0
♻️ Duplicate comments (1)
src/nostr_manager/query.rs (1)
61-64: Log the chosen event id as well for traceability
You previously got this suggestion; adding the id helps correlate selections across relays.
```diff
-    tracing::debug!(
-        "Fetched relay events, using latest: {:?}",
-        latest.as_ref().map(|e| e.created_at)
-    );
+    tracing::debug!(
+        "Fetched relay events, using latest created_at={:?} id={:?}",
+        latest.as_ref().map(|e| e.created_at),
+        latest.as_ref().map(|e| e.id)
+    );
```
🧹 Nitpick comments (6)
src/nostr_manager/query.rs (6)
29-35: Prefer a comparator over max_by_key to avoid moving EventId
Safer and more general: compare created_at then id without constructing a tuple key.
```diff
-    let latest = events
-        .into_iter()
-        .filter(|e| e.created_at <= cutoff)
-        .max_by_key(|e| (e.created_at, e.id));
+    let latest = events
+        .into_iter()
+        .filter(|e| e.created_at <= cutoff)
+        .max_by(|a, b| a.created_at.cmp(&b.created_at).then_with(|| a.id.cmp(&b.id)));
```
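For intuition, a self-contained sketch of the tie-break semantics; plain structs stand in for nostr's `Event` and `EventId`:

```rust
#[derive(Debug)]
struct Ev {
    created_at: u64,
    id: [u8; 4],
}

fn main() {
    let events = vec![
        Ev { created_at: 10, id: [0, 0, 0, 2] },
        Ev { created_at: 10, id: [0, 0, 0, 7] }, // same timestamp; larger id wins the tie
        Ev { created_at: 9, id: [0, 0, 0, 9] },
    ];
    // Compare created_at first, then fall back to id, without building a tuple key.
    let latest = events
        .into_iter()
        .max_by(|a, b| a.created_at.cmp(&b.created_at).then_with(|| a.id.cmp(&b.id)));
    println!("{latest:?}"); // Some(Ev { created_at: 10, id: [0, 0, 0, 7] })
}
```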
54-60: Apply the same comparator pattern here for consistency
Mirror the comparator-based tie-break to keep selection logic uniform.
```diff
-    let latest = relay_events
-        .into_iter()
-        .filter(|e| e.created_at <= cutoff)
-        .max_by_key(|e| (e.created_at, e.id));
+    let latest = relay_events
+        .into_iter()
+        .filter(|e| e.created_at <= cutoff)
+        .max_by(|a, b| a.created_at.cmp(&b.created_at).then_with(|| a.id.cmp(&b.id)));
```
83-89: Unify latest selection with the comparator-based tie-breaker
Same rationale as above.
```diff
-    let latest = events
-        .into_iter()
-        .filter(|e| e.created_at <= cutoff)
-        .max_by_key(|e| (e.created_at, e.id));
+    let latest = events
+        .into_iter()
+        .filter(|e| e.created_at <= cutoff)
+        .max_by(|a, b| a.created_at.cmp(&b.created_at).then_with(|| a.id.cmp(&b.id)));
```
13-15: Make MAX_FUTURE_SKEW configurable
Consider reading it from config/env or a NostrManager field so operators can tune tolerance without a rebuild. The default can remain 1h.
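A minimal sketch of env-based configuration; the `MAX_FUTURE_SKEW_SECS` variable name is a made-up example, not an existing setting:

```rust
use std::time::Duration;

// Hypothetical: read the skew tolerance once at startup, defaulting to 1 hour.
fn max_future_skew() -> Duration {
    std::env::var("MAX_FUTURE_SKEW_SECS")
        .ok()
        .and_then(|s| s.parse::<u64>().ok())
        .map(Duration::from_secs)
        .unwrap_or_else(|| Duration::from_secs(3600))
}
```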
22-27: Nice: limit(1) removal fixes stale UI; consider adding an optional `since` bound
Without reintroducing limit, you could set `Filter::since(...)` from a last-seen timestamp (if tracked) to cap payloads on cold starts. See the sketch after this comment.
Also applies to: 49-52, 78-81
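A hedged sketch of that optional bound; `last_seen` is hypothetical locally tracked state, while `Filter`, `Timestamp`, and `Kind` are nostr-sdk types:

```rust
use nostr_sdk::prelude::*;

// Hypothetical: derive a bounded filter from a locally tracked watermark.
fn metadata_filter(pubkey: PublicKey, last_seen: Option<Timestamp>) -> Filter {
    let filter = Filter::new().author(pubkey).kind(Kind::Metadata);
    match last_seen {
        Some(ts) => filter.since(ts), // skip events older than what we've already processed
        None => filter,               // cold start: fetch the full history once
    }
}
```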
19-24: Replace &[Relay] params with &[RelayUrl] and remove the map/collect
Change fetch_metadata_from(...) and fetch_user_relays(...) to accept &[RelayUrl] and pass that slice directly to client.fetch_events_from(...) (removing the internal nip65_relays.iter().map(|r| r.url.clone()).collect()). Update callers that currently pass &[Relay].
Call sites to update (examples): src/whitenoise/users.rs:40, src/whitenoise/accounts.rs:1360, src/nostr_manager/publisher.rs:307, src/whitenoise/users.rs:308, src/whitenoise/accounts.rs:646, src/whitenoise/accounts.rs:1048, src/whitenoise/accounts.rs:1058.
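A sketch of the suggested shape, hedged: the error type, `self.timeout` field, and `fetch_events_from` usage are assumptions about this codebase and its nostr-sdk version, not confirmed signatures:

```rust
// Hypothetical revised signature: take relay URLs directly, no per-call clone/collect.
pub async fn fetch_user_relays(
    &self,
    pubkey: PublicKey,
    relay_urls: &[RelayUrl], // previously &[Relay]
) -> Result<HashSet<RelayUrl>> {
    let filter = Filter::new().author(pubkey).kind(Kind::RelayList);
    let events = self
        .client
        .fetch_events_from(relay_urls.iter().cloned(), filter, self.timeout)
        .await?;
    // Select the latest event the same way as elsewhere in this module.
    let latest = events.into_iter().max_by_key(|e| (e.created_at, e.id));
    Ok(latest.map(Self::relay_urls_from_event).unwrap_or_default())
}
```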
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- src/nostr_manager/query.rs (4 hunks)
- src/whitenoise/database/relays.rs (5 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- src/whitenoise/database/relays.rs
🧰 Additional context used
🧬 Code graph analysis (1)
src/nostr_manager/query.rs (3)
src/whitenoise/accounts.rs (4)
  nip65_relays (96-100), new (51-70), relays (85-93), relays (488-491)
src/nostr_manager/mod.rs (1)
  new (75-187)
src/nostr_manager/utils.rs (1)
  relay_urls_from_event (21-31)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: check (macos-14, native)
- GitHub Check: check (macos-latest, native)
- GitHub Check: check (ubuntu-latest, native)
I keep seeing older relay lists and metadata showing up in the UI. While the nostr-sdk does try to order things by created_at, it seems that `limit(1)` will bail out of the event stream early, as soon as it sees a single event. So we don't limit now, and we do a quick sort to be absolutely sure we are getting the most recent events.

I was also seeing very odd behavior trying to add relays because we weren't normalizing the trailing slash off of the relay URLs.
Summary by CodeRabbit
Bug Fixes
Chores
Tests