[exporter/loadbalancing] Support consistency between scale-out events #33959
Pinging code owners:
See Adding Labels via Comments if you do not have permissions to add labels yourself.
This is a problem inherent to distributed systems: there's only so much we can do before documenting the use-cases we won't support. When designing the load-balancing exporter, the trade-off was: either scaling events aren't frequent and we need fewer sync stages (periodic intervals for resolvers can be longer), or scaling events are frequent and need more/shorter refresh intervals. This still requires no coordination between the nodes, acknowledging that they might end up making a different decision for the same trace ID during the moments where one load balancer is out of sync with the others. Hopefully this is a short period of time, but it will happen.

To alleviate some of the pain, I chose an algorithm that is a bit more expensive than the alternatives but brings some stability with it: my recollection from the paper was that changes to the circle would affect only about a third of the hashes. I think it was (and still is, for most cases) a good compromise.

If we want to make it even better in terms of consistency, I considered a few alternatives in the past, which should be doable as different implementations of the resolver interface. My favorite is to use a distributed key/value store, like etcd or ZooKeeper, that would allow all nodes to get the same data at the same time, including updates. Consensus would be handled there, but we'd still need to handle split-brain scenarios (which is what I was trying to avoid in the first place).

Another thing I considered in the past was to implement a gossip extension and allow load-balancer instances to communicate with each other, and we'd implement the consensus algorithm ourselves (likely Raft, using an external library?). Again, split-brain would probably be something we'd have to handle ourselves.

If we want to have consistency even in split-brain scenarios... well, then I don't know :-) I'm not ready to think about a solution for that if we don't have that problem yet.

Anyway: thank you for opening this issue! I finally got those things out of my head :-)
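To make the key/value-store idea a bit more concrete, here is a rough Go sketch of what an etcd-backed resolver could look like. The `resolver` interface, the key prefix, and the constructor names are hypothetical stand-ins for illustration, not the exporter's actual internals; only the `go.etcd.io/etcd/client/v3` calls are real. The point is simply that every collector instance reads and watches the same key space, so they all converge on the same endpoint list (and hence build the same hash ring) once they observe a given revision.

```go
// Hypothetical sketch: a resolver backed by a shared key/value store (etcd),
// so every load balancer instance observes the same endpoint list.
package main

import (
	"context"
	"fmt"
	"sort"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

// resolver is a simplified stand-in for the loadbalancing exporter's resolver
// concept: something that produces the current backends and notifies on change.
type resolver interface {
	resolve(ctx context.Context) ([]string, error)
	onChange(cb func(endpoints []string))
}

type etcdResolver struct {
	client   *clientv3.Client
	prefix   string // e.g. "/otelcol/loadbalancer/endpoints/" (made-up key space)
	callback func([]string)
}

var _ resolver = (*etcdResolver)(nil)

func newEtcdResolver(etcdEndpoints []string, prefix string) (*etcdResolver, error) {
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   etcdEndpoints,
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		return nil, err
	}
	return &etcdResolver{client: cli, prefix: prefix}, nil
}

// resolve reads every key under the prefix; each key holds one backend address.
// Because all collector instances read from the same store, they all see the
// same list for a given etcd revision.
func (r *etcdResolver) resolve(ctx context.Context) ([]string, error) {
	resp, err := r.client.Get(ctx, r.prefix, clientv3.WithPrefix())
	if err != nil {
		return nil, err
	}
	backends := make([]string, 0, len(resp.Kvs))
	for _, kv := range resp.Kvs {
		backends = append(backends, string(kv.Value))
	}
	sort.Strings(backends) // stable ordering so every instance builds the same ring
	return backends, nil
}

func (r *etcdResolver) onChange(cb func([]string)) { r.callback = cb }

// watch re-resolves whenever anything under the prefix changes and hands the
// new list to the callback (which would rebuild the hash ring).
func (r *etcdResolver) watch(ctx context.Context) {
	for range r.client.Watch(ctx, r.prefix, clientv3.WithPrefix()) {
		if backends, err := r.resolve(ctx); err == nil && r.callback != nil {
			r.callback(backends)
		}
	}
}

func main() {
	res, err := newEtcdResolver([]string{"localhost:2379"}, "/otelcol/loadbalancer/endpoints/")
	if err != nil {
		panic(err)
	}
	res.onChange(func(eps []string) { fmt.Println("rebuild the hash ring with:", eps) })
	go res.watch(context.Background())
	select {} // a real exporter would tie this to its own start/shutdown lifecycle
}
```

The trade-off is the one already mentioned above: the store becomes an extra runtime dependency, and split-brain between the collectors and the store still has to be handled somehow.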
This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping the code owners.

Pinging code owners:
See Adding Labels via Comments if you do not have permissions to add labels yourself.
Component(s)
exporter/loadbalancing
Is your feature request related to a problem? Please describe.
When a scale-out event occurs, the loadbalancing exporter goes from having `n` endpoints to `n+1` endpoints, so the exported data is divided differently once the scaling event is complete. Consider the case where a trace has 2 spans and the load balancing exporter is configured to route by trace ID. Span (a) arrives and is routed to a given host. The scaling event occurs, and span (b) then arrives and is routed to a different host.
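To make the failure mode concrete, here is a minimal, self-contained Go illustration (not the exporter's actual routing code, which uses consistent hashing rather than a plain modulo): the same trace ID can resolve to different backends before and after the endpoint list grows from `n` to `n+1`.

```go
// Minimal illustration of why a scale-out event can split a trace across
// backends when routing is a pure function of the trace ID and the current
// endpoint list.
package main

import (
	"fmt"
	"hash/fnv"
)

// routeNaive picks a backend with hash(traceID) mod len(backends). Any change
// to len(backends) can remap most trace IDs; consistent hashing (what the
// loadbalancing exporter uses) limits the churn but does not eliminate it.
func routeNaive(traceID string, backends []string) string {
	h := fnv.New32a()
	h.Write([]byte(traceID))
	return backends[int(h.Sum32())%len(backends)]
}

func main() {
	traceID := "4bf92f3577b34da6a3ce929d0e0e4736"

	before := []string{"backend-1", "backend-2", "backend-3"}    // n endpoints
	after := append(append([]string{}, before...), "backend-4")  // n+1 after scale-out

	// Span (a) arrives before the scale-out, span (b) after: same trace ID,
	// potentially different backends, hence an incomplete trace on each.
	fmt.Println("span (a) ->", routeNaive(traceID, before))
	fmt.Println("span (b) ->", routeNaive(traceID, after))
}
```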
We need a way to effectively scale out our backend without these inconsistencies, while maintaining performance.
This is a separate problem from a scale-in event, which in my opinion presents a different set of problems and requires the terminating node to flush any data it is statefully holding onto. It may be worth discussing here, but I want to focus on the simpler case of a scale-out event.
Describe the solution you'd like
I don't know exactly what the solution should be yet, and I'm hoping that this thread will provide discussion so we can reach a solution that works.
Essentially, any solution that I've seen discussed for this problem includes some kind of cache which holds the trace ID as the key and the backend as the value. My question to the community is: how should this cache be implemented?
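For discussion, here is one possible shape of such a cache, sketched in Go with hypothetical names: the first span of a trace pins its trace ID to whatever backend the ring chose at that moment, and later spans reuse that pin even if the ring has changed in between; entries expire after a TTL so the map stays bounded.

```go
// Hypothetical sketch of a trace ID -> backend cache consulted before the ring.
package main

import (
	"fmt"
	"sync"
	"time"
)

// pinnedBackend records which backend first received a trace and when the pin expires.
type pinnedBackend struct {
	endpoint string
	expires  time.Time
}

// traceStickyCache maps trace ID -> backend for roughly the lifetime of a trace.
type traceStickyCache struct {
	mu  sync.Mutex
	ttl time.Duration
	m   map[string]pinnedBackend
}

func newTraceStickyCache(ttl time.Duration) *traceStickyCache {
	return &traceStickyCache{ttl: ttl, m: make(map[string]pinnedBackend)}
}

// backendFor returns the pinned backend for traceID, or pins the result of
// pick (a stand-in for the consistent-hash ring lookup) if the trace is new
// or its pin has expired.
func (c *traceStickyCache) backendFor(traceID string, pick func() string) string {
	c.mu.Lock()
	defer c.mu.Unlock()

	now := time.Now()
	if e, ok := c.m[traceID]; ok && now.Before(e.expires) {
		return e.endpoint
	}
	ep := pick()
	c.m[traceID] = pinnedBackend{endpoint: ep, expires: now.Add(c.ttl)}
	return ep
}

func main() {
	cache := newTraceStickyCache(2 * time.Minute)

	ring := []string{"backend-1", "backend-2"}
	pick := func() string { return ring[len(ring)-1] } // stand-in whose result depends on the current ring

	first := cache.backendFor("trace-abc", pick)  // pins backend-2
	ring = append(ring, "backend-3")              // scale-out happens in between
	second := cache.backendFor("trace-abc", pick) // still backend-2: the pin wins over the changed ring

	fmt.Println(first, second)
}
```

Note that this only keeps a trace together within a single load balancer instance; with several loadbalancing collectors in front of the backends, the cache itself would have to be shared or replicated (which circles back to the etcd/gossip discussion above), and a real implementation would also need a size bound and active eviction rather than the lazy expiry shown here.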
Describe alternatives you've considered
No response
Additional context
No response