-
Notifications
You must be signed in to change notification settings - Fork 36
Prevent MLS state forks and republish missing key packages #393
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
WalkthroughAdds a cross-distro Docker installer script, tightens Nostr publish error semantics to fail when all relays reject, changes group operations to publish-before-local-merge, and adjusts key-package welcome handling to unconditionally republish missing key packages with refined error branches. Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant GroupMgr as Group Manager
participant Publisher as Publisher
participant Relays
participant LocalState as Local State
rect rgb(220,240,255)
Note over Client,LocalState: Old Flow — immediate local merge
Client->>GroupMgr: request change (add/remove/update)
GroupMgr->>LocalState: Merge pending commit locally
GroupMgr->>Publisher: publish_event()
Publisher->>Relays: send event
Relays-->>Publisher: ack/reject
end
rect rgb(240,255,240)
Note over Client,LocalState: New Flow — publish-then-merge
Client->>GroupMgr: request change (add/remove/update)
GroupMgr->>Publisher: publish_event()
Publisher->>Relays: send event
Relays-->>Publisher: ack results
alt >0 acks
Publisher-->>GroupMgr: Ok (acks)
GroupMgr->>LocalState: Merge pending commit
GroupMgr-->>Client: Success
else 0 acks
Publisher-->>GroupMgr: Err (PublishRejected)
GroupMgr-->>Client: Error
end
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes
Possibly related PRs
Suggested reviewers
Poem
Pre-merge checks and finishing touches✅ Passed checks (3 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
📜 Recent review detailsConfiguration used: CodeRabbit UI Review profile: CHILL Plan: Pro 📒 Files selected for processing (5)
🚧 Files skipped from review as they are similar to previous changes (1)
🧰 Additional context used📓 Path-based instructions (2)**/*.rs📄 CodeRabbit inference engine (.cursor/rules/rust-code-style.mdc)
Files:
src/whitenoise/{event_processor/event_handlers/handle_mls_message.rs,key_packages.rs,groups.rs,welcomes.rs}📄 CodeRabbit inference engine (.cursor/rules/mls-protocol-docs.mdc)
Files:
🧠 Learnings (5)📚 Learning: 2025-08-11T10:25:55.436ZApplied to files:
📚 Learning: 2025-09-11T13:04:33.349ZApplied to files:
📚 Learning: 2025-09-29T10:45:56.426ZApplied to files:
📚 Learning: 2025-08-23T16:27:41.511ZApplied to files:
📚 Learning: 2025-09-29T10:45:56.426ZApplied to files:
🧬 Code graph analysis (2)src/whitenoise/event_processor/event_handlers/handle_giftwrap.rs (4)
src/whitenoise/groups.rs (1)
🔇 Additional comments (10)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 5
🧹 Nitpick comments (4)
src/whitenoise/event_processor/event_handlers/handle_giftwrap.rs (2)
126-147: Make relay probe more robust to avoid false skipsCurrent host/port connect with 200ms timeout is brittle. Prefer fixed socket list (127.0.0.1:8080, 127.0.0.1:7777) and a slightly longer timeout or short retry.
- async fn relays_running() -> bool { - let relay_urls = ["ws://localhost:8080", "ws://localhost:7777"]; - for relay in relay_urls { - if let Ok(url) = RelayUrl::parse(relay) { - let host = url.host(); - let port = url.port(); - let addr = format!("{}:{}", host, port); - if tokio::time::timeout( - Duration::from_millis(200), - tokio::net::TcpStream::connect(&addr), - ) - .await - .is_err() - { - return false; - } - } else { - return false; - } - } - true - } + async fn relays_running() -> bool { + let sockets = ["127.0.0.1:8080", "127.0.0.1:7777"]; + for addr in sockets { + if tokio::time::timeout( + Duration::from_millis(400), + tokio::net::TcpStream::connect(addr), + ) + .await + .is_err() + { + return false; + } + } + true + }
286-288: Avoid fixed short sleep; poll with backoff to reduce flakiness50ms may be insufficient for deletions to propagate. Poll fetch with a brief timeout/backoff instead of blind sleep.
- // Give relays a moment to apply the deletion before we proceed - sleep(Duration::from_millis(50)).await; + // Wait until deletion is observed or timeout + let deadline = std::time::Instant::now() + Duration::from_secs(2); + loop { + if std::time::Instant::now() >= deadline { break; } + if whitenoise + .nostr + .fetch_user_key_package(member_account.pubkey, &relays) + .await + .unwrap() + .is_none() + { + break; + } + sleep(Duration::from_millis(100)).await; + }src/nostr_manager/mod.rs (1)
50-52: PublishRejected variant looks good; consider adding relay context for debuggabilityOptional: include attempted relay URLs (Vec) in the error to aid logs/upstream handling.
src/nostr_manager/publisher.rs (1)
234-257: Unify error semantics for builder-based publishes (metadata/relay list/etc.)If all relays reject, return PublishRejected here too for parity with publish_event_to.
async fn publish_event_builder_with_signer( &self, event_builder: EventBuilder, relays: &[RelayUrl], signer: impl NostrSigner + 'static, ) -> Result<Output<EventId>> { @@ - // Track the published event if we have a successful result (best-effort) - if !result.success.is_empty() { + // If no relay accepted, surface explicit rejection + if result.success.is_empty() { + let event_id = *result.id(); + tracing::warn!( + target: "whitenoise::nostr_manager::publish_event_builder_with_signer", + "Event {} was rejected by all relays: attempted {} relays", + event_id.to_hex(), + relays.len() + ); + return Err(NostrManagerError::PublishRejected { + event_id, + attempted: relays.len(), + }); + } + + // Track the published event (best-effort) + if !result.success.is_empty() { self.event_tracker .track_published_event(result.id(), &pubkey) .await .map_err(|e| NostrManagerError::FailedToTrackPublishedEvent(e.to_string()))?; }Please add a small test mirroring test_publish_event_to_unreachable_relays for metadata (builder path) to assert PublishRejected on unreachable relays.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
get-docker.sh(1 hunks)src/nostr_manager/mod.rs(1 hunks)src/nostr_manager/publisher.rs(3 hunks)src/whitenoise/event_processor/event_handlers/handle_giftwrap.rs(4 hunks)src/whitenoise/groups.rs(5 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.rs
📄 CodeRabbit inference engine (.cursor/rules/rust-code-style.mdc)
**/*.rs: Write all trait bounds in a where clause for functions and impl blocks
Prefer using Self when referring to the implementing type (including return types and enum variant matches)
For public types in libraries, derive Debug, Clone, Copy, PartialEq, Eq, and Hash (in this order) when possible
Derive Default only when a reasonable default value exists
Use full paths for tracing macros (tracing::!(...)) instead of importing macro names
Prefer .to_string() or .to_owned() for &str -> String conversion over .into() or String::from
Order imports: (1) core/alloc/std, (2) external crates, (3) current sub-modules (mod x;), (4) internal crate modules (use crate::, super::, self::)
When implementing fmt traits, import core::fmt and use fmt::Display, fmt::Formatter, etc.
After declaring mod x;, import sub-items via self::x::Y instead of x::Y
Avoid if let ... { } else { } in favor of match when both branches do work
Use if let ... { } when the alternative match arm is intentionally empty
Avoid inline sub-modules (mod x { ... }); prefer a separate file x.rs with mod x; (except tests and benches)
For tests, prefer inline #[cfg(test)] mod tests { ... } and avoid mod tests;
For benches, prefer inline #[cfg(bench)] mod benches { ... } and avoid mod benches;
Files:
src/nostr_manager/mod.rssrc/whitenoise/event_processor/event_handlers/handle_giftwrap.rssrc/whitenoise/groups.rssrc/nostr_manager/publisher.rs
src/whitenoise/{event_processor/event_handlers/handle_mls_message.rs,key_packages.rs,groups.rs,welcomes.rs}
📄 CodeRabbit inference engine (.cursor/rules/mls-protocol-docs.mdc)
src/whitenoise/{event_processor/event_handlers/handle_mls_message.rs,key_packages.rs,groups.rs,welcomes.rs}: Use the existing rust-nostr MLS crate APIs instead of reimplementing cryptographic or MLS primitives
Preserve forward secrecy in MLS operations (ephemeral keys, proper secret updates)
Ensure post-compromise security through correct epoch transitions and secret updates
Authenticate group membership for all MLS operations (e.g., validating credentials and proposals)
Maintain message ordering and group state consistency across epochs
Implement and verify key rotation procedures as specified by MLS
Files:
src/whitenoise/groups.rs
🧠 Learnings (6)
📓 Common learnings
Learnt from: CR
PR: parres-hq/whitenoise#0
File: .cursor/rules/mls-protocol-docs.mdc:0-0
Timestamp: 2025-09-29T10:45:56.426Z
Learning: Applies to src/whitenoise/{event_processor/event_handlers/handle_mls_message.rs,key_packages.rs,groups.rs,welcomes.rs} : Preserve forward secrecy in MLS operations (ephemeral keys, proper secret updates)
📚 Learning: 2025-09-11T13:04:33.349Z
Learnt from: erskingardner
PR: parres-hq/whitenoise#354
File: src/whitenoise/event_processor/event_handlers/handle_giftwrap.rs:0-0
Timestamp: 2025-09-11T13:04:33.349Z
Learning: In the whitenoise project, when processing MLS Welcome messages in handle_giftwrap.rs, the code should only publish a replacement key package if the deletion of the used key package was successful (deleted == true). The conditional publishing behavior is intentional and should not be changed to always publish regardless of deletion success.
Applied to files:
src/whitenoise/event_processor/event_handlers/handle_giftwrap.rs
📚 Learning: 2025-09-29T10:45:56.426Z
Learnt from: CR
PR: parres-hq/whitenoise#0
File: .cursor/rules/mls-protocol-docs.mdc:0-0
Timestamp: 2025-09-29T10:45:56.426Z
Learning: Applies to src/whitenoise/{event_processor/event_handlers/handle_mls_message.rs,key_packages.rs,groups.rs,welcomes.rs} : Implement and verify key rotation procedures as specified by MLS
Applied to files:
src/whitenoise/event_processor/event_handlers/handle_giftwrap.rs
📚 Learning: 2025-08-23T16:27:41.511Z
Learnt from: jgmontoya
PR: parres-hq/whitenoise#324
File: src/integration_tests/test_cases/subscription_processing/publish_metadata_update.rs:42-45
Timestamp: 2025-08-23T16:27:41.511Z
Learning: In Whitenoise integration tests, using `unwrap()` on `test_client.add_relay()` for dev relay connections is acceptable since these are controlled test environments where failing fast is preferred over complex error handling.
Applied to files:
src/whitenoise/event_processor/event_handlers/handle_giftwrap.rs
📚 Learning: 2025-09-29T10:45:56.426Z
Learnt from: CR
PR: parres-hq/whitenoise#0
File: .cursor/rules/mls-protocol-docs.mdc:0-0
Timestamp: 2025-09-29T10:45:56.426Z
Learning: Applies to src/whitenoise/{event_processor/event_handlers/handle_mls_message.rs,key_packages.rs,groups.rs,welcomes.rs} : Maintain message ordering and group state consistency across epochs
Applied to files:
src/whitenoise/groups.rs
📚 Learning: 2025-08-11T10:25:55.436Z
Learnt from: jgmontoya
PR: parres-hq/whitenoise#310
File: src/whitenoise/accounts/mod.rs:669-692
Timestamp: 2025-08-11T10:25:55.436Z
Learning: In the Whitenoise codebase, the `NostrManager::publish_event_builder_with_signer` method automatically ensures relay connectivity by calling `self.add_relays()` before publishing events. This means explicit relay connectivity checks are not needed before calling publishing methods that use this function.
Applied to files:
src/nostr_manager/publisher.rs
🧬 Code graph analysis (2)
src/whitenoise/event_processor/event_handlers/handle_giftwrap.rs (4)
src/whitenoise/accounts.rs (1)
relays(98-106)src/whitenoise/database/users.rs (1)
relays(166-198)src/whitenoise/mod.rs (2)
create_mock_whitenoise(646-708)setup_multiple_test_accounts(817-844)src/whitenoise/relays.rs (1)
urls(105-110)
src/whitenoise/groups.rs (1)
src/whitenoise/accounts.rs (2)
create_mdk(263-270)create_mdk(909-911)
🔇 Additional comments (6)
get-docker.sh (4)
174-256: Helper functions are well-implemented.The version comparison, environment detection, and utility functions are defensive, properly quoted, and handle edge cases like CalVer versions and missing values with fallbacks.
258-378: Distribution detection and forked distro mapping are solid.Proper error handling with
set +e/set -e, safe variable usage, and comprehensive codename mapping for Debian versions. The call site ensures variables are initialized before use.
540-579: Debian/Ubuntu version-conditional package selection is correct.Version pinning via
apt-cache madisonis the standard approach, and the conditional package installation based on version gates (18.09 for cli, 20.10 for compose-plugin, etc.) is appropriate. Proper escaping of regex pattern at line 546.
1-87: Note: This file appears orthogonal to the stated PR objectives.The PR description focuses on MLS state fork prevention and key-package handling (
src/whitenoise/groups.rs,handle_giftwrap.rs), but this is a Docker installation utility script. Verify that this file is intentionally included in this PR or if file selection was incorrect. If intentional, document the relationship in the PR description.src/whitenoise/event_processor/event_handlers/handle_giftwrap.rs (1)
72-102: Unconditional republish alters prior contract — confirm policy changeAlways republishing a key package after welcome is a behavioral change from the intentional “only if deletion succeeded” rule. If this supersedes the earlier guidance, please confirm and note in CHANGELOG/migration so API consumers expect potential PublishRejected errors from republish. Based on learnings.
Also applies to: 104-115
src/nostr_manager/publisher.rs (1)
195-209: Correct: return error when all relays reject an already-signed eventThis matches PR intent and prevents silent “success with zero ACKs.”
| DEFAULT_REPO_FILE="docker-ce.repo" | ||
| if [ -z "$REPO_FILE" ]; then | ||
| REPO_FILE="$DEFAULT_REPO_FILE" | ||
| # Automatically default to a staging repo fora |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix typo in comment.
Line 112 has a typo: "fora" should be "for a".
- # Automatically default to a staging repo fora
- # a staging download url (download-stage.docker.com)
+ # Automatically default to a staging repo for
+ # a staging download url (download-stage.docker.com)🤖 Prompt for AI Agents
In get-docker.sh around line 112, the comment contains a typo ("fora"); update
the comment to read "for a" so the line becomes a clear English phrase ("#
Automatically default to a staging repo for a"). Ensure spacing and punctuation
remain consistent with surrounding comments.
Warning
Hallucinations may be present, or all encompassing.
Findings
src/whitenoise/groups.rs:322 — Severity: High;
Likelihood: High.
Each add/ remove/update path calls mdk.merge_pending_commit before publish_event_to
confirms relay acceptance. If the publish fails or a competing commit
lands first, the local store advances epochs that the rest of the group never
sees, leading to irrecoverable state forks and message loss until manual
recovery.
src/whitenoise/event_processor/event_handlers/handle_giftwrap.rs:73 —
Severity: Medium; Likelihood: Medium.
The client only republishes a fresh
KeyPackage when it just deleted the old one. In cases where the relay
already dropped the event (expiry, manual cleanup, last-resort reuse) the
branch skips rotation, leaving members without any discoverable
KeyPackage and preventing future invitations..
Problem Statement
Group membership flows were advancing the local MLS epoch immediately after staging a commit, before confirming relay acceptance. If a publish failed or a competing commit landed first, our local state jumped ahead of the rest of the group, creating forks that required manual recovery.
Separately, welcome processing only rotated key packages when it had just deleted the old event; if relays had already dropped the key package, members were left without any discoverable package and could not receive new invites.
Proposed Solution Overview
explicit errors instead of advancing state.
than a no-op.
Tests
Benefits
Tradeoffs
What Happens If Not Implemented
-Groups will continue to fork silently whenever a publish fails mid-flight, leading to message loss until someone manually resets the group.
Summary by CodeRabbit
New Features
Bug Fixes
Tests