
[Network] StackerDB decoherence #5193

Closed
jcnelson opened this issue Sep 16, 2024 · 5 comments

@jcnelson
Member

For reasons that are not yet clear, Nakamoto testnet and mainnet StackerDB replicas will eventually lose coherence. Writes to one replica do not find their way to others -- neither via push, nor via sync. This needs investigation, and may be partially fixed by #5191.

@jcnelson jcnelson self-assigned this Sep 16, 2024
@diwakergupta
Member

I've observed something on a signer node that might be related. Noting here for tracking, happy to move to a new issue if that's more appropriate. Note that I've already sought input and debugging help from @hstove and @jferrant on this.

The setup:

  • stacks-node v2.5.0.0.6, running as a follower, with stacker = true
  • stacks-signer v2.5.0.0.5.2
  • neither service is exposed publicly, but both have full outbound connectivity

I'm running the binaries directly, co-located on the same machine. There's also a dedicated bitcoind. This setup has been running for several months at this point, without any problems.
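For reference, a minimal sketch of what a follower config along these lines might look like. This is illustrative only: the key names follow the stacks-node TOML layout, but the specific addresses and values here are hypothetical, not copied from my actual config.

```toml
# Hypothetical sketch of the setup described above; values are illustrative.
[node]
rpc_bind = "127.0.0.1:20443"   # not exposed publicly
p2p_bind = "0.0.0.0:20444"
stacker = true                 # running as a follower with stacker = true

[burnchain]
mode = "mainnet"
peer_host = "127.0.0.1"        # dedicated, co-located bitcoind
```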

Symptoms:

  • Jacinta's tool for reporting missing signers includes my signer's address when run on 2 separate mainnet nodes
  • The same tool, when run against my own node, correctly reports my signer's address. In fact, the delta was only my signer's address (compared to one of the nodes above)
  • There are no warnings or errors in either my node or signer logs. The signer correctly logs "Mock signing for burn block ..." on new burn blocks
  • My node's /v2/neighbors endpoint reports plenty of nodes with non-empty stackerdb entries. I can include the full output if that helps.

@jcnelson
Member Author

I think I know the reason for this now. The network pruner starts removing new connections after 10 outbound peers have been found (this is the default limit). Network subsystems have a way of "pinning" connections so they won't get pruned while they're in use, but there was a bug in the way the pinning system worked which had a very immediate and noticeable impact on StackerDB (especially since a signer or miner would be running a couple dozen replicas). I'll have a patch out soon, once I'm done testing it.
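To make the intended interaction concrete, here is a toy model of pruning with pinning. This is a sketch under my own assumptions, not the actual stacks-node implementation: the `Pruner` type, field names, and the soft limit of 10 are all illustrative. The point is that connections pinned by a subsystem (e.g. an in-flight StackerDB sync) should never be selected for pruning, even when the node is over its outbound-peer limit.

```rust
use std::collections::HashSet;

/// Toy model of a connection pruner (illustrative only; not the
/// actual stacks-node code).
struct Pruner {
    connections: Vec<u64>, // event IDs of open outbound connections
    pinned: HashSet<u64>,  // IDs a subsystem (e.g. StackerDB) has pinned
    soft_limit: usize,     // e.g. the default of 10 outbound peers
}

impl Pruner {
    /// Drop unpinned connections until we are at or under the soft limit.
    /// Pinned connections are skipped; if everything over the limit is
    /// pinned, the pruner leaves them alone until they are unpinned.
    fn prune(&mut self) {
        while self.connections.len() > self.soft_limit {
            match self
                .connections
                .iter()
                .position(|id| !self.pinned.contains(id))
            {
                Some(idx) => {
                    self.connections.remove(idx);
                }
                None => break, // every remaining connection is pinned
            }
        }
    }
}

fn main() {
    let mut p = Pruner {
        connections: (0..12).collect(),
        pinned: [10, 11].into_iter().collect(), // StackerDB sync in progress
        soft_limit: 10,
    };
    p.prune();
    // The two pinned connections survive; unpinned ones were pruned first.
    assert!(p.connections.contains(&10) && p.connections.contains(&11));
    assert_eq!(p.connections.len(), 10);
    println!("kept {} connections", p.connections.len());
}
```

The bug described above would correspond to pinned IDs not actually being honored here, so StackerDB's replica connections got pruned out from under it mid-sync.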

@diwakergupta
Member

Based on the draft PR, would a workaround be to increase soft_max_neighbors_per_org? Happy to test that out if it helps.
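If that workaround applies, I'd guess it would look something like the following in the node config. The section placement and the value 32 are my assumptions, not confirmed settings:

```toml
# Hypothetical workaround sketch; the value chosen here is an assumption.
[connection_options]
soft_max_neighbors_per_org = 32   # raise from the default to keep more outbound peers
```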

@wileyj
Contributor

wileyj commented Sep 23, 2024

Closing since #5197 is merged.

@wileyj wileyj closed this as completed Sep 23, 2024
@blockstack-devops
Contributor

This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@stacks-network stacks-network locked as resolved and limited conversation to collaborators Oct 25, 2024