
added a config parameter to limit outbound p2p connections. #2974

Merged
pompon0 merged 18 commits into main from gprusak-outbound
Feb 25, 2026

Conversation

@pompon0 (Contributor) commented Feb 24, 2026

For a healthy p2p network, nodes should provide more capacity than they consume. This PR adds a limit on the number of outbound connections, so that a node does not try to fill up ALL of the available connection slots it has, but instead establishes just a reasonable number of connections and lets other nodes connect to it. Additionally:

  • the logic for managing a pool of connections has been extracted from peerManagerInner into a separate type (we have regular and persistent pools)
  • wired up HandshakeTimeout, DialTimeout and a couple of other P2P config fields which were not properly copied to routeroptions.
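The outbound-limit idea described above can be sketched as follows. This is an illustrative sketch, not code from the PR: the struct and function names (`p2pConfig`, `shouldDial`, `MaxOutboundConnections`) are assumptions chosen for the example. The point is that dialing stops once the outbound quota is reached, even while total slots remain free for inbound peers.

```go
package main

import "fmt"

// p2pConfig is a hypothetical config fragment for the example.
type p2pConfig struct {
	MaxConnections         int // total connection slots (inbound + outbound)
	MaxOutboundConnections int // how many of those slots we fill by dialing out
}

// shouldDial reports whether the node should start another outbound dial:
// only while both the outbound quota and the total slot budget have room.
func shouldDial(cfg p2pConfig, outbound, total int) bool {
	return outbound < cfg.MaxOutboundConnections && total < cfg.MaxConnections
}

func main() {
	cfg := p2pConfig{MaxConnections: 50, MaxOutboundConnections: 10}
	fmt.Println(shouldDial(cfg, 3, 20))  // true: room for more outbound dials
	fmt.Println(shouldDial(cfg, 10, 20)) // false: outbound quota is full
	fmt.Println(shouldDial(cfg, 3, 50))  // false: all slots taken by inbound peers
}
```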

github-actions bot commented Feb 24, 2026

The latest Buf updates on your PR. Results from workflow Buf / buf (pull_request).

Build: ✅ passed · Format: ✅ passed · Lint: ✅ passed · Breaking: ✅ passed · Updated (UTC): Feb 25, 2026, 6:01 PM

Comment on lines +111 to +114
for old := range pa.fails {
failedAddr = utils.Some(old)
break
}

Check warning — Code scanning / CodeQL: Iteration over map
Iteration over map may be a possible source of non-determinism
return
}
// Record the failure time.
now := time.Now()

Check warning — Code scanning / CodeQL: Calling the system time
Calling the system time may be a possible source of non-determinism
Comment on lines +150 to +155
for old, pa := range p.addrs {
if len(pa.fails) < len(pa.addrs) {
continue
}
return old, true
}

Check warning — Code scanning / CodeQL: Iteration over map
Iteration over map may be a possible source of non-determinism
Comment on lines +167 to +177
for id, peerAddrs := range p.addrs {
if _, ok := p.dialing[id]; ok {
continue
}
if _, ok := p.conns[id]; ok {
continue
}
if x, ok := bestPeer.Get(); !ok || before(peerAddrs.lastFail, x.lastFail) {
bestPeer = utils.Some(peerAddrs)
}
}

Check warning — Code scanning / CodeQL: Iteration over map
Iteration over map may be a possible source of non-determinism
Comment on lines +181 to +185
for addr := range peer.addrs {
if x, ok := best.Get(); !ok || before(getOpt(peer.fails, addr), getOpt(peer.fails, x)) {
best = utils.Some(addr)
}
}

Check warning — Code scanning / CodeQL: Iteration over map
Iteration over map may be a possible source of non-determinism
codecov bot commented Feb 24, 2026

Codecov Report

❌ Patch coverage is 92.41071% with 17 lines in your changes missing coverage. Please review.
✅ Project coverage is 68.45%. Comparing base (82fb610) to head (ed899f5).
⚠️ Report is 1 commit behind head on main.

Files with missing lines | Patch % | Lines
sei-tendermint/internal/p2p/peermanager.go | 74.13% | 13 Missing and 2 partials ⚠️
sei-tendermint/internal/p2p/peermanager_pool.go | 98.31% | 1 Missing and 1 partial ⚠️
Additional details and impacted files

Impacted file tree graph

@@             Coverage Diff             @@
##             main    #2974       +/-   ##
===========================================
+ Coverage   58.11%   68.45%   +10.33%     
===========================================
  Files        2109       40     -2069     
  Lines      173394     3959   -169435     
===========================================
- Hits       100771     2710    -98061     
+ Misses      63675     1007    -62668     
+ Partials     8948      242     -8706     
Flag | Coverage Δ
sei-chain | 68.30% <92.41%> (+10.22%) ⬆️
sei-db | 69.50% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines | Coverage Δ
sei-tendermint/config/config.go | 70.22% <100.00%> (+0.06%) ⬆️
sei-tendermint/internal/p2p/routeroptions.go | 86.36% <100.00%> (-0.31%) ⬇️
sei-tendermint/libs/utils/im/im.go | 69.04% <100.00%> (+8.44%) ⬆️
sei-tendermint/libs/utils/math.go | 100.00% <100.00%> (ø)
sei-tendermint/libs/utils/proto.go | 26.19% <ø> (-6.42%) ⬇️
sei-tendermint/libs/utils/testonly.go | 71.69% <100.00%> (+1.10%) ⬆️
sei-tendermint/node/setup.go | 67.39% <100.00%> (+2.15%) ⬆️
sei-tendermint/internal/p2p/peermanager_pool.go | 98.31% <98.31%> (ø)
sei-tendermint/internal/p2p/peermanager.go | 79.67% <74.13%> (-11.68%) ⬇️

... and 2075 files with indirect coverage changes


@wen-coding (Contributor) left a comment

I think this looks reasonable. However, if there is a semantics change, I'd slightly prefer to split it into another PR if possible. In case there is any emergency, it would be much easier to diagnose problems.

MaxConnections uint16 `mapstructure:"max-connections"`

// MaxConnections limits the number of outbound peers.
// It should be significantly lower than MaxConnections, unless
Contributor:

nit: maybe we should give a guideline on how much lower this should be compared to MaxConnections

Contributor Author:

well, I don't know that yet :). I estimate 10 outbound, 50 total, but this is TBD.

func (i *peerManagerInner[C]) TryStartDial(persistentPeer bool) (NodeAddress, bool) {
// Check concurrent dials limit.
if len(i.dialing) >= i.options.maxDials() {
if len(i.regular.dialing)+len(i.persistent.dialing) >= i.options.maxDials() {
Contributor:

should we cleanly split everything (including maxDials) between the two pools?

Contributor Author:

probably, but that's more configuration. We can reconsider that later.

if !i.isPersistent(id) {
delete(i.addrs, id)
}
if !i.isUnconditional[id] {
Contributor:

originally, even when id is persistent, we would close the connection if it's not unconditional — we don't need this any more?

Contributor Author:

all persistent peers are unconditional (see the old newPeerManager impl). In this PR I've merged unconditional and persistent into the same concept (see the PR description) to simplify the situation a bit.

return Map[K, V]{m.m.Set(key, value)}
}

func (m Map[K, V]) SetOpt(key K, mvalue utils.Option[V]) Map[K, V] {
Contributor:

nit: SetOpt sounds like it will do nothing if the key doesn't exist; how about SetOrDelete or something like that?

Contributor Author:

It is dual to GetOpt. I've added documentation. I can change it to SetOrDelete if you feel strongly about this.
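The GetOpt/SetOpt duality under discussion can be sketched with plain Go maps. This is a toy illustration, not the PR's `im.Map` implementation: the `option` type stands in for `utils.Option`, and the helper names are assumptions. The key property is that `getOpt` returns None for a missing key, and `setOpt` with None deletes the key, so the two are inverse to each other.

```go
package main

import "fmt"

// option is a toy stand-in for utils.Option.
type option[T any] struct {
	val T
	ok  bool
}

func some[T any](v T) option[T] { return option[T]{v, true} }
func none[T any]() option[T]    { return option[T]{} }

// getOpt returns Some(value) if the key exists, None otherwise.
func getOpt[K comparable, V any](m map[K]V, k K) option[V] {
	v, ok := m[k]
	return option[V]{v, ok}
}

// setOpt is dual to getOpt: Some(v) inserts/overwrites, None deletes.
func setOpt[K comparable, V any](m map[K]V, k K, mv option[V]) {
	if mv.ok {
		m[k] = mv.val
	} else {
		delete(m, k) // None means "remove the entry"
	}
}

func main() {
	m := map[string]int{"a": 1}
	setOpt(m, "a", none[int]()) // deletes "a"
	setOpt(m, "b", some(2))     // inserts "b"
	fmt.Println(len(m), m["b"]) // 1 2
}
```

With this shape, `setOpt(m, k, getOpt(m, k))` is always a no-op, which is the sense in which the two operations are dual.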

}

type poolConfig struct {
selfID types.NodeID
Contributor:

nit: would you use different selfID on different pools?

Contributor Author:

No, but I want the pools to be self contained.

isPersistent: map[types.NodeID]bool{},
conns: utils.NewAtomicSend(im.NewMap[types.NodeID, C]()),

persistent: newPool[C](poolConfig{selfID: selfID}),
Contributor:

Where are the metrics? Should we split metrics by pool as well?

Contributor Author:

there is a metric for the total number of connections, there is a log printed every 10s with the number of connections (regular vs persistent), and there is network state exposed via the net_info RPC endpoint. Metrics and the RPC endpoint need a revamp indeed, but that is out of scope for this PR.

@pompon0 pompon0 requested a review from wen-coding February 25, 2026 16:44
@pompon0 pompon0 enabled auto-merge (squash) February 25, 2026 17:52
package utils

import (
"unsafe"

Check notice — Code scanning / CodeQL: Sensitive package import
Certain system packages contain functions which may be a possible source of non-determinism
@pompon0 pompon0 merged commit 89daf14 into main Feb 25, 2026
38 checks passed
@pompon0 pompon0 deleted the gprusak-outbound branch February 25, 2026 18:15