Replace global hub transport with cloudevents #310
I see @vMaroon has assigned himself; he was in charge of the initial Kafka work, so he probably has the best answers. @yanmxa, continuing the other thread we discussed about performance, the same point applies here. As I suggested in that thread, I suggest here as well to try running high-scale simulations in order to understand the large performance effect a change can make (this is true of any change, not this one specifically). Here are the original performance results we were able to achieve:
As @nirrozenbaum mentioned, regardless of what we say here, any substantial change to the status-path flow should be followed by a set of high-scale tests; otherwise it would not be right to claim the same scalability, as the system underwent iterations of improvements and compactions to achieve a very efficient and compact data flow.
The above are just the tips of each point; if requested, I can dive into the reasoning and more details.
@vMaroon Very much appreciate your information!
100 K MCs: 100 RHs with 1000 MCs
1 M MCs: 1000 RHs with 1000 MCs
Transport Spec Path Updates:
Transport Status Path Updates:
A/B Testing: Synchronize managed clusters from Regional Hubs to the Global Hub database.
100 K MCs: 100 RHs with 1000 MCs
1 M MCs: 1000 RHs with 1000 MCs
Improvements
Test Results after Increasing Transport Message Limit Size to 940 KB
1 M MCs: 1000 RHs with 1000 MCs
Conclusion: From the test results so far, using cloudevents to replace the original transport does not cause significant performance degradation.
@yanmxa these results seem fine at first look. Did you try testing the load/rotation of 100 policies with the setup above? It's highly suggested to do so.
@vMaroon Since we focus on the performance change of the transport, I only compared and tested the cases of 1 M Policies and 1 M managed clusters in HoH initialization.
For the transport status path, we added a:
1. Kafka offset committer in Global Hub manager
When the Global Hub manager receives a message through the transport, there is not only a goroutine for the consumer client but also a committer goroutine that commits the Kafka offset periodically. The consumer receives the message and forwards it to the message handler for processing; after the message is processed, the committer updates the offset in Kafka. In this flow, consuming messages and committing the offset are asynchronous.
But when we use cloudevents to deliver messages, although it is built on the Kafka protocol, we cannot directly update the offset through the Kafka client after receiving a message. Instead, the cloudevents client receives the message and returns an ACK or NACK result, which decides whether the Kafka offset is updated. Message consumption and offset committing are synchronous throughout the process.
Now the questions are: