Optimize heartbeat payload size #1606

Open
MichaelScofield opened this issue May 18, 2023 · 14 comments
Labels
A-metasrv Involves code in the meta server

Comments

@MichaelScofield
Collaborator

MichaelScofield commented May 18, 2023

> If a datanode contains over 10,000 regions, I wonder whether the heartbeat request is too large.

Originally posted by @killme2008 in #1558 (comment)

As the comment above suspects, we need to measure how large the heartbeat payload gets when it carries stats for a very large number of regions. If it turns out to be large, we should optimize it to reduce bandwidth usage (and, of course, the CPU time and memory needed to process it).

@fengjiachun
Collaborator

It is not necessary to carry region stats with every heartbeat. Only one out of N can carry region status.
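
A minimal sketch of the one-out-of-N idea, written in Python for brevity. The names `HeartbeatBuilder`, `collect_stats`, and `stats_every`, and the dict-shaped request, are hypothetical and not GreptimeDB's API; whether skipping stats is safe for region failover is a separate question.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class HeartbeatBuilder:
    """Attach region stats to only one out of every `stats_every` heartbeats."""

    collect_stats: Callable[[], List[dict]]  # hypothetical stats source
    stats_every: int = 10                    # the "N" in "one out of N"
    _seq: int = field(default=0, init=False)

    def next_request(self) -> Dict[str, object]:
        request: Dict[str, object] = {"seq": self._seq}
        # Heartbeats 0, N, 2N, ... carry the full stats; the rest stay small.
        if self._seq % self.stats_every == 0:
            request["region_stats"] = self.collect_stats()
        self._seq += 1
        return request


builder = HeartbeatBuilder(collect_stats=lambda: [{"region_id": 1}], stats_every=3)
requests = [builder.next_request() for _ in range(6)]
# Sequence numbers of the heartbeats that carried region stats.
with_stats = [r["seq"] for r in requests if "region_stats" in r]
```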

@fengys1996 fengys1996 self-assigned this Jul 12, 2023
@fengys1996
Contributor

> Only one out of N can carry region status.

RegionFailureHandler may need region stats in every heartbeat. If only one out of N heartbeats carries region stats, region failover may work incorrectly. cc @MichaelScofield

@MichaelScofield
Collaborator Author

@Fengys123 Indeed, the heartbeat needs to carry the status of every region currently hosted on the datanode. If we can prove that the heartbeat's payload stays small even when it carries stats for a very large number of regions, we can close this issue.

@fengjiachun
Collaborator

It seems the "large data" refers only to strings such as [catalog_name, schema_name, table_name]?

@MichaelScofield
Collaborator Author

@fengjiachun it might be, but we'd better do some tests to see what the size is

@fengjiachun
Collaborator

> @fengjiachun it might be, but we'd better do some tests to see what the size is

cc @Fengys123

@fengys1996
Contributor

Did some tests and generated some data, as shown below:

The size of payload: 6.5429688 KB, Region number: 100, 
The size of payload: 65.42969 KB, Region number: 1000, 
The size of payload: 654.2969 KB, Region number: 10000, 
The size of payload: 6542.9688 KB, Region number: 100000, 
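
The figures above can be turned into a per-region cost with a little arithmetic. This is only a sketch: it assumes the test code divided the byte count by 1024 to obtain KB, which the thread does not state.

```python
# Payload sizes (KB) reported above, keyed by region count.
measured_kb = {
    100: 6.5429688,
    1_000: 65.42969,
    10_000: 654.2969,
    100_000: 6542.9688,
}

# Assumption: 1 KB = 1024 bytes in the test output.
BYTES_PER_KB = 1024

bytes_per_region = [kb * BYTES_PER_KB / n for n, kb in measured_kb.items()]

for regions, per_region in zip(measured_kb, bytes_per_region):
    print(f"{regions:>7} regions: {per_region:.1f} bytes per region")
```

Under that assumption every row works out to roughly 67 bytes per region, so the payload grows linearly with the number of regions.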

Test Code here

@MichaelScofield
Collaborator Author

So the heartbeat's payload is too large to ignore, considering that we might open a lot of regions on a single datanode. Which part of the heartbeat contributes the most bytes to the payload?

@fengys1996
Contributor

OK, let me keep investigating.

@fengys1996
Contributor

# Normal
The size of payload: 7.5195313 KB, Region number: 100, 
The size of payload: 75.19531 KB, Region number: 1000, 
The size of payload: 751.9531 KB, Region number: 10000, 
The size of payload: 7519.5313 KB, Region number: 100000, 

# Set `table_name` of `table_ident` to none
The size of payload: 3.0273438 KB, Region number: 100, 
The size of payload: 30.273438 KB, Region number: 1000, 
The size of payload: 302.73438 KB, Region number: 10000, 
The size of payload: 3027.3438 KB, Region number: 100000, 

# Set `table_ident` to none
The size of payload: 1.953125 KB, Region number: 100, 
The size of payload: 19.53125 KB, Region number: 1000, 
The size of payload: 195.3125 KB, Region number: 10000, 
The size of payload: 1953.125 KB, Region number: 100000, 

Summary
1. `table_ident` accounts for about 74% of the payload; `table_name` alone accounts for about 60% of the payload (roughly 80% of `table_ident`).
2. Everything else accounts for the remaining 26% or so.
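
The shares can be recomputed from the 100-region rows of the three runs. As a sketch, this again assumes that KB in the test output means 1024 bytes; the variable names are illustrative only.

```python
BYTES_PER_KB = 1024  # assumption: the test divided bytes by 1024
REGIONS = 100

# Sizes (KB) for 100 regions, taken from the three runs above.
normal_kb = 7.5195313
no_table_name_kb = 3.0273438
no_table_ident_kb = 1.953125


def per_region(kb: float) -> float:
    """Convert a payload size in KB to bytes per region."""
    return kb * BYTES_PER_KB / REGIONS


normal = per_region(normal_kb)                  # about 77 bytes
no_table_name = per_region(no_table_name_kb)    # about 31 bytes
no_table_ident = per_region(no_table_ident_kb)  # 20 bytes

# Shares of the full payload.
table_ident_share = (normal - no_table_ident) / normal
table_name_share = (normal - no_table_name) / normal
# Share of table_name inside table_ident itself.
table_name_within_ident = (normal - no_table_name) / (normal - no_table_ident)

print(f"table_ident: {table_ident_share:.0%} of payload")
print(f"table_name:  {table_name_share:.0%} of payload, "
      f"{table_name_within_ident:.0%} of table_ident")
```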

@MichaelScofield
Collaborator Author

I think we could simply use the table id.

@fengjiachun
Collaborator

The proto does not need to be modified: catalog_name, schema_name, and table_name in table_ident can simply be left empty.
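
Leaving the strings empty saves bytes because proto3 does not put default (empty) values on the wire for string fields without explicit presence. The estimator below is a hand-rolled sketch of that rule, not code from the project; the field numbers and the sample names are hypothetical.

```python
def varint_len(value: int) -> int:
    """Number of bytes protobuf uses to encode `value` as a varint."""
    length = 1
    while value >= 0x80:
        value >>= 7
        length += 1
    return length


def string_field_size(field_number: int, text: str) -> int:
    """Wire size of a proto3 string field without explicit presence."""
    data = text.encode("utf-8")
    if not data:
        return 0  # empty strings are omitted from the encoded message
    tag = varint_len((field_number << 3) | 2)  # wire type 2 = length-delimited
    return tag + varint_len(len(data)) + len(data)


# Hypothetical field numbers and names, for illustration only.
named = [(1, "greptime"), (2, "public"), (3, "my_table_name")]
full = sum(string_field_size(n, s) for n, s in named)
empty = sum(string_field_size(n, "") for n, _ in named)

print(f"with names: {full} bytes, left empty: {empty} bytes")
```

Note that a nested message that is set but has no populated fields still costs its own tag and a zero length byte.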

@evenyag
Contributor

evenyag commented Jul 20, 2023

If we finish refactoring the datanode, I guess we only need to pass a region id.

@fengjiachun
Collaborator

> If we finish refactoring the datanode, I guess we only need to pass a region id.

Yes.

@tisonkun tisonkun added A-metasrv Involves code in the meta server and removed Metasrv labels May 15, 2024
5 participants