fix(net): Reduce inbound service overloads and add a timeout #6950

Merged
merged 7 commits into main from inbound-overload-direct
Jun 15, 2023

teor2345
Contributor

Motivation

Sometimes Zebra's inbound service gets overloaded and drops many connections at the same time.

Close #6911

Complex Code or Requirements

This is concurrent code.

Solution

Timeouts:

  • Add a timeout to inbound service requests. There is currently no timeout on these requests, which is a security issue.
  • Treat inbound timeouts like queue overloads
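A minimal, std-only sketch of the two timeout bullets, assuming a thread-and-channel model instead of Zebra's async service stack: the request runs on a worker thread, the caller waits at most `limit`, and an expired wait is reported with the same error a full queue would produce. `InboundError` and `call_with_timeout` are hypothetical names used only for illustration.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Hypothetical error type. A request that times out is reported exactly like
/// a request rejected because the inbound queue is full.
#[derive(Debug, PartialEq)]
enum InboundError {
    Overloaded,
}

/// Runs `handler` on a worker thread and waits at most `limit` for its result.
fn call_with_timeout<T, F>(handler: F, limit: Duration) -> Result<T, InboundError>
where
    T: Send + 'static,
    F: FnOnce() -> T + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // If the caller has already timed out, the receiver is gone and this
        // send fails; the late result is simply discarded.
        let _ = tx.send(handler());
    });
    // A timeout, or a worker that exited without answering (for example after
    // a panic), is mapped to the same overload error.
    rx.recv_timeout(limit).map_err(|_| InboundError::Overloaded)
}
```

Mapping timeouts onto the overload error lets the caller reuse its existing overload handling for requests that hang. One difference from an async timeout: dropping an async future cancels the work, while in this thread-based sketch the worker keeps running until `handler` returns.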

Overloads:

  • Increase the inbound concurrency limit to reduce overloads, at the cost of slightly more memory
  • Reduce the peer broadcast fraction to reduce network load
  • Reduce the maximum connection drop probability to reduce the number of disconnections that happen at the same time
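The last bullet can be illustrated with a hypothetical drop-probability function. This sketch assumes the probability starts at a maximum right after a previous overload and decays linearly to a minimum over a fixed interval; the constant names, their values, and the linear decay are illustrative assumptions, not Zebra's code. Because the maximum is capped below 1.0, a burst of simultaneous overloads drops only some of the affected connections rather than all of them.

```rust
use std::time::{Duration, Instant};

// Hypothetical values for illustration only; these are not Zebra's constants.
const MIN_DROP_PROBABILITY: f32 = 0.05;
const MAX_DROP_PROBABILITY: f32 = 0.5;
const PROTECTION_INTERVAL: Duration = Duration::from_secs(2);

/// Probability of dropping a connection that has just hit an overload.
/// Overloads soon after the previous one are more likely to cause a drop,
/// but the probability never exceeds MAX_DROP_PROBABILITY.
fn drop_probability(now: Instant, prev_overload: Option<Instant>) -> f32 {
    let Some(prev) = prev_overload else {
        // No earlier overload recorded: use the minimum.
        return MIN_DROP_PROBABILITY;
    };
    let elapsed = now.saturating_duration_since(prev);
    // 0.0 immediately after the previous overload, 1.0 once a full interval has passed.
    let fraction = (elapsed.as_secs_f32() / PROTECTION_INTERVAL.as_secs_f32()).min(1.0);
    let p = MAX_DROP_PROBABILITY - fraction * (MAX_DROP_PROBABILITY - MIN_DROP_PROBABILITY);
    p.clamp(MIN_DROP_PROBABILITY, MAX_DROP_PROBABILITY)
}

/// Decides whether to drop, given `roll`, a uniform random number in [0, 1)
/// supplied by the caller (a parameter here to avoid a random-number crate).
fn should_drop(probability: f32, roll: f32) -> bool {
    roll < probability
}
```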

Related changes:

  • Document security requirements of inbound peer overload handling

Review

This is an important security fix, so it should go in early in the next release.

Reviewer Checklist

  • Will the PR name make sense to users?
    • Does it need extra CHANGELOG info? (new features, breaking changes, large changes)
  • Are the PR labels correct?
  • Does the code do what the ticket and PR says?
    • Does it change concurrent code, unsafe code, or consensus rules?
  • How do you know it works? Does it have tests?

Follow Up Work

Work out which requests are slow (or numerous), and make them faster. This PR adds logging for inbound request timeouts.
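One way to approach this follow-up is to tally slow requests by kind, so that the slowest or most frequent ones stand out. This is a std-only sketch: `InboundStats`, the request-kind strings, and the `eprintln!` output are hypothetical stand-ins, not the logging this PR adds.

```rust
use std::collections::HashMap;
use std::time::Duration;

/// Hypothetical per-request-kind statistics for follow-up profiling.
#[derive(Default)]
struct InboundStats {
    // request kind -> (total requests, requests that exceeded the limit)
    by_kind: HashMap<&'static str, (u64, u64)>,
}

impl InboundStats {
    /// Records one request and logs it if it ran longer than `limit`.
    /// Returns true if the request counted as a timeout.
    fn record(&mut self, kind: &'static str, elapsed: Duration, limit: Duration) -> bool {
        let entry = self.by_kind.entry(kind).or_insert((0, 0));
        entry.0 += 1;
        let timed_out = elapsed > limit;
        if timed_out {
            entry.1 += 1;
            eprintln!("inbound {kind} request took {elapsed:?} (limit {limit:?})");
        }
        timed_out
    }

    /// Request kinds sorted by timeout count, most frequent first
    /// (ties broken alphabetically so the order is deterministic).
    fn worst_offenders(&self) -> Vec<(&'static str, u64)> {
        let mut v: Vec<_> = self.by_kind.iter().map(|(k, (_, t))| (*k, *t)).collect();
        v.sort_by(|a, b| b.1.cmp(&a.1).then(a.0.cmp(b.0)));
        v
    }
}
```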

@teor2345 teor2345 added the C-bug (Category: This is a bug), P-High 🔥, C-security (Category: Security issues), I-slow (Problems with performance or responsiveness), A-network (Area: Network protocol updates or fixes), and A-concurrency (Area: Async code, needs extra work to make it work properly.) labels Jun 14, 2023
@teor2345 teor2345 requested a review from a team as a code owner June 14, 2023 01:38
@teor2345 teor2345 self-assigned this Jun 14, 2023
@teor2345 teor2345 requested review from oxarbitrage and removed request for a team June 14, 2023 01:38
@teor2345 teor2345 added the do-not-merge label (Tells Mergify not to merge this PR) Jun 14, 2023
@teor2345 teor2345 changed the title fix(net): Reduce inbound service overloads fix(net): Reduce inbound service overloads and add a timeout Jun 14, 2023
codecov bot commented Jun 14, 2023

Codecov Report

Merging #6950 (79a8afb) into main (329dd71) will decrease coverage by 0.04%.
The diff coverage is 62.06%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #6950      +/-   ##
==========================================
- Coverage   77.65%   77.61%   -0.04%     
==========================================
  Files         310      310              
  Lines       41475    41488      +13     
==========================================
- Hits        32207    32202       -5     
- Misses       9268     9286      +18     

@dconnolly dconnolly removed the do-not-merge label (Tells Mergify not to merge this PR) Jun 14, 2023
@arya2 arya2 (Contributor) left a comment

This looks good, thank you.

> we're downloading blocks in the wrong order (not strict height order) then hitting a concurrency limit

I suspect you were right about the blocks being out of order. Increasing MAX_INBOUND_CONCURRENCY should help.
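The quoted diagnosis is that out-of-order block downloads end up hitting a concurrency limit. A minimal std-only sketch of such a limit, assuming a simple in-flight counter: each request holds a slot until it finishes, and a new request is refused (and would be treated as an overload) once every slot is taken. `InFlightLimit`, `Slot`, and `MAX_IN_FLIGHT` are hypothetical names and the value 3 is illustrative; this is not how Zebra implements MAX_INBOUND_CONCURRENCY.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical stand-in for a limit like MAX_INBOUND_CONCURRENCY; the value is illustrative.
const MAX_IN_FLIGHT: usize = 3;

/// Counts in-flight requests and refuses new ones once the limit is reached.
struct InFlightLimit {
    in_flight: AtomicUsize,
    max: usize,
}

/// Releases its slot when dropped, so early returns cannot leak capacity.
struct Slot<'a>(&'a InFlightLimit);

impl Drop for Slot<'_> {
    fn drop(&mut self) {
        self.0.in_flight.fetch_sub(1, Ordering::AcqRel);
    }
}

impl InFlightLimit {
    fn new(max: usize) -> Self {
        Self { in_flight: AtomicUsize::new(0), max }
    }

    /// Returns a slot, or None if every slot is taken (the caller treats that as overload).
    fn try_acquire(&self) -> Option<Slot<'_>> {
        self.in_flight
            .fetch_update(Ordering::AcqRel, Ordering::Acquire, |n| {
                (n < self.max).then_some(n + 1)
            })
            .ok()
            .map(|_| Slot(self))
    }
}
```

Raising the limit, as this PR does, leaves more room for slow or out-of-order requests before new ones are refused, at the cost of holding more of them in memory at once.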

zebrad/src/components/inbound/downloads.rs (resolved)
zebra-network/src/constants.rs (resolved)
mergify bot added a commit that referenced this pull request Jun 14, 2023
@mergify mergify bot merged commit 32ea511 into main Jun 15, 2023
@mergify mergify bot deleted the inbound-overload-direct branch June 15, 2023 00:43
Labels: A-concurrency, A-network, C-bug, C-security, I-slow
Development: merging this pull request closed the issue "Avoid hangs due to slow inbound service requests".
3 participants