fix: share domain type across network components to fix handshake after fork activation#814
Conversation
…er fork activation After a fork activates, the handshake Behaviour's domain type was stale because it was set once at startup and never updated. This caused NetworkMismatch errors with peers using the new domain. Introduce SharedDomainType (Arc<parking_lot::RwLock<DomainType>>) as a single source of truth owned by Network and shared with Discovery and handshake Behaviour. On fork activation, one update is visible to all components immediately. Also renames NodeInfo.network_id to domain_type for consistent naming across the codebase, and makes handshake metadata non-optional since we always provide it. Closes sigp#811
|
Claude finished @diegomrsantos's task in 2m 38s —— View job Code Review Complete ✓SummaryThis PR effectively solves the Verdict: Approve with minor suggestions Strengths
Issues and Recommendations1. Thread Safety - Potential Lock Blocking (Medium Priority)Location: The write lock in Status: Inline comment added for awareness 2. Missing Test Coverage for Fork Activation (Medium Priority)Location: Current tests cover basic scenarios but miss the fork activation flow. Should add integration test verifying:
Recommendation: Fix this → 3. Documentation Enhancement (Low Priority)Location:
Status: Inline comment added with suggested documentation 4. Race Condition Clarification (Low Priority)The PR description states queries started before fork activation will use the new domain type, but this isn't precise. Queries started before fork activation will evaluate predicates with a mix - predicates evaluated after the fork update will see the new domain type. Impact: Likely acceptable behavior (we want to connect to new fork ASAP), but worth clarifying in documentation. 5. Minor Observations
Performance, Security, Code QualityPerformance: Excellent choice of Security: No vulnerabilities. Domain type validation properly prevents wrong-network connections. Code Quality: Clean separation of concerns, appropriate Rust idioms, good error handling, comprehensive logging. RecommendationReady to merge. The core implementation is sound, thread-safe, and solves the reported issue. Suggestions above are minor enhancements that don't block merging and can be addressed in follow-up work if desired. Critical issues: None |
| self.domain_type = current.domain_type; | ||
| // Update the shared domain type — all components (discovery, handshake) see the | ||
| // new value immediately. | ||
| self.domain_type.set(current.domain_type); |
There was a problem hiding this comment.
Potential Write Lock Blocking: This write lock acquisition could briefly block if discovery queries are evaluating predicates (which hold read locks on the shared domain type). While unlikely to cause issues in practice due to the rarity of fork activations and the brevity of the lock hold time, it's worth noting for debugging purposes if any hangs are reported during fork transitions.
| #[derive(Serialize, Deserialize, Debug, Clone, Default, PartialEq)] | ||
| pub struct NodeInfo { | ||
| pub network_id: String, | ||
| pub domain_type: String, |
There was a problem hiding this comment.
Naming Consistency: Good change renaming from network_id to domain_type for consistency.
Minor note: The PR description mentions "Make handshake Behaviour.metadata non-optional since we always provide it", but NodeInfo.metadata is still Option<NodeMetadata>. Consider whether the optionality is still needed, or if this could be simplified to a non-optional field in future work.
There was a problem hiding this comment.
removing the Option would break compatibility with any peer that doesn't send metadata, causing handshakes to fail unnecessarily
Add lifecycle, update semantics, and concurrency sections to clarify how the shared domain type is created, updated on fork activation, and safely accessed across components.
Verify that SharedDomainType updates propagate to subsequent handshakes through three phases: initial success with matching domain, mismatch after one side updates, and success again after both sides update.
7a22195 to
012a50c
Compare
| return Err(Error::NetworkMismatch { | ||
| ours: ours.network_id.clone(), | ||
| theirs: theirs.network_id.clone(), | ||
| ours: ours.domain_type.clone(), |
There was a problem hiding this comment.
this rename will def increase readability, nice!
| }; | ||
|
|
||
| let handshake = { | ||
| let domain_type: String = network_config.domain_type.into(); |
There was a problem hiding this comment.
ah, I see! so before, we were passing around copies of DomainType derived from initial config. now, the source of truth SharedDomainType is passed around and is updated at the fork epoch, which propagates to discovery and handshake behavior, nice!
|
Thanks for the fix! Will merge it so we can keep testing for now. I feel like this can be slimmed down more by just using |
Issue Addressed
Closes #811
Proposed Changes
SharedDomainType(Arc<parking_lot::RwLock<DomainType>>) as a single source of truth for the domain type across network componentsNetworkcreates the shared domain type and passes it toDiscoveryandhandshake::BehaviourviaArcclonesNetwork::on_fork_phaseupdates the shared value once — all components see the new domain type immediately, eliminating the stale handshake state that causedNetworkMismatcherrorsNodeInfo.network_idtoNodeInfo.domain_typefor consistent naming across the codebaseBehaviour.metadatanon-optional since we always provide itDiscovery::update_domain_typetoupdate_enr_domain_typeto clarify its reduced responsibilityAdditional Info
The root cause was that the domain type was independently stored in three places (
Network,Discovery,handshake::Behaviour) with no shared state. The handshake copy was never updated on fork activation. Rather than adding yet anotherupdate_domain_typecall, this PR eliminates the duplication entirely with a shared reference pattern consistent with the existing architecture.Discovery query predicates now read the current domain type at query time (via the shared
Arc), meaning queries started just before a fork activation will use the new domain type — which is the correct behavior.