-
Notifications
You must be signed in to change notification settings - Fork 807
Add throttler implementation to SDK #1905
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Had a chat w/ Stephen and realized that it's not really possible to use token buckets here since we need a way to clean up old buckets that aren't used anymore, but also make it not possible for people to reset their throttling bucket by just disconnecting and reconnecting. We probably only want to throttle people if they're a validator, we can use the validator handler we're implementing to drop messages before passing them into this throttling handler. |
network/p2p/throttler.go
Outdated
weight := float64(s.period-offset) / float64(s.period) | ||
|
||
if weight*float64(s.previous.hits[nodeID])+float64(s.current.hits[nodeID]) < float64(s.limit) { | ||
s.current.hits[nodeID]++ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need to think about if this can overflow. I don't think it can since the amount of hits in a window should always be less than limit
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe it is unlikely to overflow an int count given the periods we'll configure. I'd just document that we don't check for overflow here and move on.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nits and requests for clarifications.
Love the reference, makes context super clear
network/p2p/throttler.go
Outdated
// check if the current evaluation period is over | ||
now := s.clock.Time() | ||
if now.After(s.current.start.Add(s.period)) { | ||
// discard the current window if it's too old |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is really to have a correct previous window right?
Current will become previous in this if. Only if system has been very quiet for some time, we should reset previous.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah. We always want to rotate out our current window to the previous window. If the current window is too old however (i.e we haven't seen any traffic in a while), we need to reset the current window before rotating it to avoid over-throttling the node.
|
||
func TestSlidingWindowThrottlerHandle(t *testing.T) { | ||
period := time.Minute | ||
previousWindowStartTime := time.Time{} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: could this be time.Now(). I like it slightly changes from test to test
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just didn't do time.Now()
to avoid the variance in tests. I don't think it matters in this test specifically but I just generally try to avoid using current time/rng/anything that might change from test-to-test in unit tests. I think for something like an integration test it's fine to do though.
} | ||
|
||
// NewSlidingWindowThrottler returns a new instance of SlidingWindowThrottler | ||
func NewSlidingWindowThrottler(period time.Duration, limit int) *SlidingWindowThrottler { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we return Throttler
interface instead of the struct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Replied below
|
||
// SlidingWindowThrottler is an implementation of the sliding window throttling | ||
// algorithm. | ||
type SlidingWindowThrottler struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need to export this struct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't need to. We could return the interface but my personal opinion is that it's good to just return the struct so the caller can choose whether to use it as a struct or as an interface. Returning as an interface gets rid of what the underlying type is to the caller.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Accept interface, return structs
network/p2p/throttler.go
Outdated
offset := now.Sub(s.current.start) | ||
weight := float64(s.period-offset) / float64(s.period) | ||
|
||
if weight*float64(s.previous.hits[nodeID])+float64(s.current.hits[nodeID]) < float64(s.limit) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't weight apply to the current period? if so, should we add a test case to cover this case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe I could have documented this better. The weight is applied to only the previous window. When we calculate how many calls a node issued in an evaluation period, we have to add the current + weighted sum of the previous window. This is because the current window is always ongoing, so if we're 75% through the current window, we know we still have to add up 25% of the previous window. The current window's number is always accurate, but the previous window is always an estimate since we throw a weight at it that assumes uniform distribution of traffic in the window. Cloudflare's blog post shows that this assumption is good enough and it only results in mis-throttling in less than a fraction of a percent of cases.
Co-authored-by: Darioush Jalali <darioush.jalali@avalabs.org> Signed-off-by: Joshua Kim <20001595+joshua-kim@users.noreply.github.com>
// Handle returns true if the amount of calls received in the last [s.period] | ||
// time is less than [s.limit] | ||
// | ||
// This is calculated by adding the current period's count to a weighted count |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: it would probably be clearer if we phrase this as "estimated count of the previous period" instead of "weighted count".
I can see how weight can make us think of a linear combination of current period and previous period (which we don't have since current period is "weighted by physics"). Stil maybe use estimate instead of weight may help readability?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're correct in that it's an estimate, but it's an estimate of the amount of calls made in X% of the previous window (so kind of like a partial estimate?)... so I felt like a weighted sum was more correct than calling it an estimate. When I wrote this up I was unhappy at the wording but wasn't sure how to make it any more clear.
network/p2p/throttler.go
Outdated
} | ||
|
||
offset := now.Sub(s.current.start) | ||
weight := float64(s.period-offset) / float64(s.period) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we specify / enforce that s.period > 0
to avoid division by 0?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could validate that period
and limit
are both positive ints at creation for sure. I feel like it's something that's pretty much always gonna be a hard-coded constant and look wrong (passing in -time.Second
sure does look strange...) and returning an error is annoying which is why I didn't do it.
For now I'll just document that it should be positive? If you feel strongly I can return an error in the constructor but I felt like maybe it was overly verbose.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
Why this should be merged
Implements a sliding window throttler to the SDK: https://blog.cloudflare.com/counting-things-a-lot-of-different-things/
This works by having the caller configure an evaluation period and a maximum amount of calls per evaluation period. A current and previous window of calls are tracked, where each window contains a map of nodes to the amount of calls they made. To determine if a call for a node should be throttled or not, we estimate how many calls they've made in the last window of time.
This is equivalent to the amount of calls they made in the current evaluation period (always accurate), plus a weighted sum of the amount of calls they made in the previous evaluation period (estimation).
How this works
See article. We track two windows, the current and previous evaluation period. We approximate TPS of a request by adding the current window (exact number), plus a weighted sum of the previous evaluation number (estimation).
How this was tested
Added unit test