Skip to content

Comments

fix: share domain type across network components to fix handshake after fork activation#814

Merged
mergify[bot] merged 3 commits intosigp:unstablefrom
diegomrsantos:fix/shared-domain-type-handshake
Feb 10, 2026
Merged

fix: share domain type across network components to fix handshake after fork activation#814
mergify[bot] merged 3 commits intosigp:unstablefrom
diegomrsantos:fix/shared-domain-type-handshake

Conversation

@diegomrsantos
Copy link
Member

Issue Addressed

Closes #811

Proposed Changes

  • Introduce SharedDomainType (Arc<parking_lot::RwLock<DomainType>>) as a single source of truth for the domain type across network components
  • Network creates the shared domain type and passes it to Discovery and handshake::Behaviour via Arc clones
  • On fork activation, Network::on_fork_phase updates the shared value once — all components see the new domain type immediately, eliminating the stale handshake state that caused NetworkMismatch errors
  • Rename NodeInfo.network_id to NodeInfo.domain_type for consistent naming across the codebase
  • Make handshake Behaviour.metadata non-optional since we always provide it
  • Rename Discovery::update_domain_type to update_enr_domain_type to clarify its reduced responsibility

Additional Info

The root cause was that the domain type was independently stored in three places (Network, Discovery, handshake::Behaviour) with no shared state. The handshake copy was never updated on fork activation. Rather than adding yet another update_domain_type call, this PR eliminates the duplication entirely with a shared reference pattern consistent with the existing architecture.

Discovery query predicates now read the current domain type at query time (via the shared Arc), meaning queries started just before a fork activation will use the new domain type — which is the correct behavior.

…er fork activation

After a fork activates, the handshake Behaviour's domain type was stale because it was set once
at startup and never updated. This caused NetworkMismatch errors with peers using the new domain.

Introduce SharedDomainType (Arc<parking_lot::RwLock<DomainType>>) as a single source of truth
owned by Network and shared with Discovery and handshake Behaviour. On fork activation, one
update is visible to all components immediately.

Also renames NodeInfo.network_id to domain_type for consistent naming across the codebase,
and makes handshake metadata non-optional since we always provide it.

Closes sigp#811
@claude-code-actions-sigp
Copy link

claude-code-actions-sigp bot commented Feb 9, 2026

Claude finished @diegomrsantos's task in 2m 38s —— View job


Code Review Complete ✓

Summary

This PR effectively solves the NetworkMismatch error after fork activation by introducing a shared domain type pattern (Arc<RwLock<DomainType>>). The architectural change from duplicated state to shared state is sound and eliminates the root cause.

Verdict: Approve with minor suggestions


Strengths

  1. Architectural Improvement: Shared state pattern eliminates duplication rather than adding more synchronization points
  2. Clean Abstraction: SharedDomainType provides clear get()/set() API hiding synchronization details
  3. Comprehensive Fix: All three components now share the same domain type source of truth
  4. Good Testing: Solid test coverage for handshake success, concurrent dials, and network mismatch
  5. Proper Naming: Renaming network_id to domain_type improves consistency

Issues and Recommendations

1. Thread Safety - Potential Lock Blocking (Medium Priority)

Location: anchor/network/src/network.rs:433 (inline comment added)

The write lock in Network::on_fork_phase() could briefly block if discovery queries are evaluating predicates that hold read locks. While unlikely to cause issues due to rarity of fork activations, worth noting for debugging.

Status: Inline comment added for awareness

2. Missing Test Coverage for Fork Activation (Medium Priority)

Location: anchor/network/src/handshake/mod.rs (inline comment added)

Current tests cover basic scenarios but miss the fork activation flow. Should add integration test verifying:

  • Handshake succeeds with initial domain type
  • Domain type updates on one swarm
  • Handshake fails with NetworkMismatch
  • Both swarms updated → handshake succeeds

Recommendation: Fix this →

3. Documentation Enhancement (Low Priority)

Location: anchor/network/src/lib.rs:24-29 (inline comment added)

SharedDomainType docs should clarify lifecycle, update semantics, and concurrency guarantees.

Status: Inline comment added with suggested documentation

4. Race Condition Clarification (Low Priority)

The PR description states queries started before fork activation will use the new domain type, but this isn't precise. Queries started before fork activation will evaluate predicates with a mix - predicates evaluated after the fork update will see the new domain type.

Impact: Likely acceptable behavior (we want to connect to new fork ASAP), but worth clarifying in documentation.

5. Minor Observations

  • NodeInfo.metadata is still Option<NodeMetadata> despite PR description mentioning it's now always provided (see inline comment)
  • update_enr_domain_type() naming is clear enough, though alternatives like sync_enr_with_domain_type() could be considered
  • parking_lot dependency added correctly

Performance, Security, Code Quality

Performance: Excellent choice of parking_lot::RwLock for high-read, low-write pattern. Lock-free fast paths for uncontended reads.

Security: No vulnerabilities. Domain type validation properly prevents wrong-network connections.

Code Quality: Clean separation of concerns, appropriate Rust idioms, good error handling, comprehensive logging.


Recommendation

Ready to merge. The core implementation is sound, thread-safe, and solves the reported issue. Suggestions above are minor enhancements that don't block merging and can be addressed in follow-up work if desired.

Critical issues: None
Medium priority: Test coverage for fork activation
Low priority: Documentation enhancements

self.domain_type = current.domain_type;
// Update the shared domain type — all components (discovery, handshake) see the
// new value immediately.
self.domain_type.set(current.domain_type);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Potential Write Lock Blocking: This write lock acquisition could briefly block if discovery queries are evaluating predicates (which hold read locks on the shared domain type). While unlikely to cause issues in practice due to the rarity of fork activations and the brevity of the lock hold time, it's worth noting for debugging purposes if any hangs are reported during fork transitions.

Comment on lines 36 to +38
#[derive(Serialize, Deserialize, Debug, Clone, Default, PartialEq)]
pub struct NodeInfo {
pub network_id: String,
pub domain_type: String,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Naming Consistency: Good change renaming from network_id to domain_type for consistency.

Minor note: The PR description mentions "Make handshake Behaviour.metadata non-optional since we always provide it", but NodeInfo.metadata is still Option<NodeMetadata>. Consider whether the optionality is still needed, or if this could be simplified to a non-optional field in future work.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removing the Option would break compatibility with any peer that doesn't send metadata, causing handshakes to fail unnecessarily

Add lifecycle, update semantics, and concurrency sections
to clarify how the shared domain type is created, updated
on fork activation, and safely accessed across components.
Verify that SharedDomainType updates propagate to subsequent
handshakes through three phases: initial success with matching
domain, mismatch after one side updates, and success again
after both sides update.
@diegomrsantos diegomrsantos force-pushed the fix/shared-domain-type-handshake branch from 7a22195 to 012a50c Compare February 9, 2026 18:07
return Err(Error::NetworkMismatch {
ours: ours.network_id.clone(),
theirs: theirs.network_id.clone(),
ours: ours.domain_type.clone(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this rename will def increase readability, nice!

};

let handshake = {
let domain_type: String = network_config.domain_type.into();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, I see! so before, we were passing around copies of DomainType derived from initial config. now, the source of truth SharedDomainType is passed around and is updated at the fork epoch, which propagates to discovery and handshake behavior, nice!

@dknopik
Copy link
Member

dknopik commented Feb 10, 2026

Thanks for the fix! Will merge it so we can keep testing for now.

I feel like this can be slimmed down more by just using active_fork_config instead of the separate state in SharedDomainType, but we can always improve this further later.

@mergify mergify bot merged commit a3ba31b into sigp:unstable Feb 10, 2026
20 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants