Optimize heartbeat payload size #1606
Comments
It is not necessary to carry |
RegionFailureHandler may need region stats in every heartbeat. If only one out of N heartbeats carries region status, region failover may work incorrectly. cc @MichaelScofield |
@Fengys123 Indeed, the heartbeat needs to carry the status of all regions currently held by the datanode. If we can prove that the heartbeat's payload size stays small even when it carries a very large number of region stats, we can close this issue. |
It seems the "large data" refers only to strings like [catalog_name, schema_name, table_name]? |
@fengjiachun it might be, but we'd better do some tests to see what the size is |
cc @Fengys123 |
Did some tests and generated some data, as shown below:
|
So the heartbeat's payload is not small enough to be neglected, considering that we might open a lot of regions on a single datanode. Which part of the heartbeat contributes the most bytes to the payload? |
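To get a feel for how much the name strings alone could contribute, here is a rough back-of-the-envelope sketch. It assumes each region stat repeats the three names as protobuf string fields with field numbers below 16 (so a one-byte tag), which is an assumption about the message layout, not something taken from the actual proto. The function names and the sample names `greptime`, `public`, and `my_table` are purely illustrative.

```python
def string_field_size(value: str) -> int:
    """Approximate wire size of one protobuf string field.

    Assumes a field number below 16, so the tag fits in one byte.
    Proto3 strings without explicit presence are omitted when empty.
    """
    payload = len(value.encode("utf-8"))
    if payload == 0:
        return 0
    # The length prefix is a varint: 7 payload bits per byte.
    prefix = 1
    remaining = payload
    while remaining >= 0x80:
        remaining >>= 7
        prefix += 1
    return 1 + prefix + payload  # tag + length prefix + UTF-8 bytes


def names_bytes_per_heartbeat(catalog: str, schema: str, table: str, regions: int) -> int:
    """Bytes spent on the three names if every region stat repeats them."""
    per_region = sum(string_field_size(s) for s in (catalog, schema, table))
    return per_region * regions


if __name__ == "__main__":
    # Hypothetical names and region counts, for illustration only.
    for regions in (100, 1_000, 10_000):
        total = names_bytes_per_heartbeat("greptime", "public", "my_table", regions)
        print(f"{regions} regions: {total} bytes of names")
```

With these sample names the strings cost 28 bytes per region, so the cost grows linearly with the number of open regions. Real table names may be much longer, and this ignores every other field in the region stat.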
Ok let me continue to explore. |
|
I think we could simply use table id. |
The proto does not need to be modified; catalog_name, schema_name, and table_name in table_ident can simply be left empty. |
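Leaving the names empty works without a proto change because proto3 does not serialize string fields that hold the default (empty) value, as long as they are not declared `optional`. The sketch below hand-encodes a hypothetical identifier message to show the effect. The field numbers (1 to 4) and the `encode_table_ident` helper are assumptions made for illustration and are not taken from the project's proto files.

```python
def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer as a protobuf varint."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_string(field_no: int, value: str) -> bytes:
    """Length-delimited field (wire type 2); empty strings are omitted."""
    if not value:
        return b""
    data = value.encode("utf-8")
    return encode_varint((field_no << 3) | 2) + encode_varint(len(data)) + data


def encode_uint(field_no: int, value: int) -> bytes:
    """Varint field (wire type 0); zero is omitted."""
    if value == 0:
        return b""
    return encode_varint(field_no << 3) + encode_varint(value)


def encode_table_ident(catalog: str, schema: str, table: str, table_id: int) -> bytes:
    # Hypothetical layout: names in fields 1-3, numeric id in field 4.
    return (
        encode_string(1, catalog)
        + encode_string(2, schema)
        + encode_string(3, table)
        + encode_uint(4, table_id)
    )


if __name__ == "__main__":
    full = encode_table_ident("greptime", "public", "my_table", 1024)
    id_only = encode_table_ident("", "", "", 1024)
    print(len(full), len(id_only))
```

Under these assumptions the identifier with all three names takes 31 bytes, while the one carrying only the table id takes 3 bytes, and a receiver using the unchanged schema would simply see empty strings for the names.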
If we finish refactoring the datanode, I guess we only need to pass a region id. |
Yes. |
Originally posted by @killme2008 in #1558 (comment)
As the comment above suspects, we need to measure how large the heartbeat payload becomes when it contains a very large number of region stats. Optimizations should be put in place to reduce bandwidth usage (and, of course, the CPU time and memory needed to process the payload).