-
Notifications
You must be signed in to change notification settings - Fork 677
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix: 5159, 5169, 5171, 5172, and others #5191
Conversation
…-exceeded conditions
…_Freq` and `walk_seed_probability` options
…ich can happen due to misconfiguration), then still occasionally attempt to walk to non-seed nodes (fixes #5159)
…est last-contact time. Don't query rows in `frontier` directly, unless it's for a specific `slot` (fixes #5169)
…address with its outbound address if both addresses are private. Also, log more walk instantiation data
…eduled message we're about to send, and abort stackerdb sync if they differ (indicates that the p2p network advanced its stackerdb state, and this sync is acting on stale data). Also, log the local peer _and_ contract ID in debug messages
…a kernel-chosen port (avoids clobbering), and move topology tests to integration test CI
…ison function is not a total order (oops)
…nly sortitions in the reward cycle are to confirm the anchor block
…or stackslib when it's a dependency (e.g. so stackslib test code can be used in stacks-node tests)
…ppress `unused` warnings in tests
…stackslib test code in integration tests
@kantai Do you think it is a good use of time to go and patch the thousands of lines of test-only code where there are unused variables? |
Probably not. Run |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This LGTM, but I think we'd be better off if the p2p::convergence
tests stayed in net::tests::neighbors
, and we just updated the CI to add another integration test job to execute them there (instead of in stacks-node
).
unless this is needed to merge this PR, can you open an issue and we'll take that off your pile? |
I'd like it in this PR. We needed to have been running these tests the whole time; we would have caught some of these problems earlier if we did. |
👍 |
Thank you everyone for taking this over so I could take on #5193 🙏 |
This pull request has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs. |
This fixes a few reported issues in the neighbor walk state machine, as well as a few unreported / privately requested ones:
It fixes [Network] inbound neighbor walk is broken #5159 by making it so that a node whose always-allowed (i.e. seed) peers are unavailable (or misconfigured) can still discover neighbors when not in IBD mode.
It fixes [Network] PeerDB should enforce at most one row per public key hash #5169 by collating queries to the
frontier
table by public key and reporting the one with the latest contact time. The overall schema offrontier
remains the same out of necessity to avoid both DoS and eclipse attacks.It fixes [Network] GetNeighbors reply should alternate between including public-only versus public/private IPs #5171 by classifying rows in the
frontier
table as public (or not) based on whether or not they're in known-private IPv4 or IPv6 address prefixes, and then having theGetNeighbors
handler decide at random as to whether or not to exclude private neighbors in itsNeighbors
response.It fixes [nakamoto] Bug in
/v2/neighbors
#5172 simply by changing themax_age
calculation in the query to find recent neighborsIt adds
[connection_opts].log_neighbor_freq
to allow the node operator to log all p2p conversations toDEBG
-level logs every so often (value is in milliseconds). Default is once every minute.It adds
[connection_opts].walk_seed_probability
to allow the node operator to control the degree to which the node will walk to non-seed nodes (instead of retrying the seed nodes) when not in IBD mode and when no seed nodes are connected. The default is 10%.It fixes a bug in the neighbor storage logic whereby neighbor rows aren't always updated on discovery or handshake, causing the node to rely on stale neighbor data (including stale assessments of the neighbor's in/out degrees, which hampers neighbor walk).
It uses
sort_unstable_by()
instackslib/src/net/prune.rs
to avoid a node crash when two nodes are ranked as equally healthy, which is a problem that has only recently manifested in large-scale testing.It reduces the traffic volume in StackerDB replication by checking a message's
rc_consensus_hash
value against the peer's last-computedrc_consensus_hash
value, so that messages that are locally stale are not propagatedIt fixes a bug in stepping to a neighbor with a private address when the node itself reports private but routable address as its socket's peer address. This caused the node to store the wrong address for the private neighbor (namely, it would store the ephemeral port from the peer address instead of the protocol-reported port).
It updates logging in the StackerDB sync state machine to always report the local peer address and the contract address in question, which makes it easier to debug state machines when the node replicates multiple StackerDBs.
It fixes the neighbor-walk tests so that small-scale tests can now run as unit tests (and are no longer
#[ignore]
d), and moves large-scale topology tests into integration tests (intests::p2p::convergence
) so they can run in parallel without clobbering ports. I don't know how well they'll behave because they need an open file limit of 4096 to run, but it seems Test prlimit #5190 might address this.The majority of line noise from this PR comes from that last test refactoring above. Also, in order to get it to build as an integration test, I had to switch a lot of
#[cfg(test)]
statements to#[cfg(any(test, feature = "testing"))]
. This affected many files.