Optimize heartbeat payload size #1606

MichaelScofield · 2023-05-18T03:38:19Z

If a datanode contains over 10,000 regions, I wonder whether the heartbeat request is too large.

Originally posted by @killme2008 in #1558 (comment)

As the comment above suspects, we need to see how large the heartbeat payload is when it contains stats for a very large number of regions. Optimizations should be put in place to reduce the bandwidth usage (and, of course, the CPU time and memory needed to process the stats).

fengjiachun · 2023-05-18T04:40:41Z

It is not necessary to carry region stats with every heartbeat. Only one out of N can carry region status.
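A minimal sketch of the "one out of N" idea, assuming a heartbeat sequence number and a dictionary-shaped message. Both are hypothetical and only illustrate the sampling, not the real heartbeat request:

```python
def build_heartbeat(seq: int, region_stats: list, n: int) -> dict:
    """Attach region stats only to every n-th heartbeat (hypothetical message shape)."""
    heartbeat = {"seq": seq}
    if seq % n == 0:
        heartbeat["region_stats"] = region_stats
    return heartbeat


# With n=3, only heartbeats 0 and 3 out of the first six carry region stats.
heartbeats = [build_heartbeat(seq, ["region-1", "region-2"], n=3) for seq in range(6)]
with_stats = [hb["seq"] for hb in heartbeats if "region_stats" in hb]
print(with_stats)
```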

fengys1996 · 2023-07-17T02:41:17Z

> Only one out of N can carry region status.

RegionFailureHandler may need region stats in every heartbeat. If only one out of N heartbeats carries region status, region failover may work incorrectly. cc @MichaelScofield

MichaelScofield · 2023-07-17T03:02:09Z

@Fengys123 Indeed, the heartbeat needs to carry the status of all regions currently hosted on the datanode. If we can prove that the heartbeat's payload size is not big even when it carries stats for a very large number of regions, we can close this issue.

fengjiachun · 2023-07-17T05:06:14Z

It seems the "large data" refers only to strings like [catalog_name, schema_name, table_name]?

MichaelScofield · 2023-07-17T05:30:59Z

@fengjiachun it might be, but we'd better do some tests to see what the size is

fengjiachun · 2023-07-17T08:57:27Z

> @fengjiachun it might be, but we'd better do some tests to see what the size is

cc @Fengys123

fengys1996 · 2023-07-18T03:06:20Z

Did some tests and generated some data, as shown below:

The size of payload: 6.5429688 KB, Region number: 100, 
The size of payload: 65.42969 KB, Region number: 1000, 
The size of payload: 654.2969 KB, Region number: 10000, 
The size of payload: 6542.9688 KB, Region number: 100000,

Test Code here
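The four measurements scale linearly with the region count. A quick check, assuming the reported KB means 1024 bytes (an assumption, since the test code is not reproduced here), gives the average payload contribution of one region:

```python
# Measurements copied from the comment above: (region count, payload size in KB).
measurements = [
    (100, 6.5429688),
    (1_000, 65.42969),
    (10_000, 654.2969),
    (100_000, 6542.9688),
]

BYTES_PER_KB = 1024  # assumption: the test reports KB as 1024 bytes


def bytes_per_region(regions: int, size_kb: float) -> float:
    """Average payload bytes contributed by one region."""
    return size_kb * BYTES_PER_KB / regions


per_region = [round(bytes_per_region(n, kb)) for n, kb in measurements]
for (n, _), b in zip(measurements, per_region):
    print(f"{n:>7} regions: ~{b} bytes per region")
```

Under that assumption every run works out to roughly 67 bytes per region, so the payload grows in direct proportion to the number of regions on the datanode.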

MichaelScofield · 2023-07-18T07:07:06Z

So the heartbeat's payload is too large to ignore, considering we might open a lot of regions on a single datanode. Which part of the heartbeat contributes the most bytes to the payload?

fengys1996 · 2023-07-18T10:19:14Z

Ok let me continue to explore.

fengys1996 · 2023-07-19T02:43:46Z

# Normal
The size of payload: 7.5195313 KB, Region number: 100, 
The size of payload: 75.19531 KB, Region number: 1000, 
The size of payload: 751.9531 KB, Region number: 10000, 
The size of payload: 7519.5313 KB, Region number: 100000, 

# Set `table_name` of `table_ident` to none
The size of payload: 3.0273438 KB, Region number: 100, 
The size of payload: 30.273438 KB, Region number: 1000, 
The size of payload: 302.73438 KB, Region number: 10000, 
The size of payload: 3027.3438 KB, Region number: 100000, 

# Set `table_ident` to none
The size of payload: 1.953125 KB, Region number: 100, 
The size of payload: 19.53125 KB, Region number: 1000, 
The size of payload: 195.3125 KB, Region number: 10000, 
The size of payload: 1953.125 KB, Region number: 100000, 

Summary:
1. `table_ident` accounts for about 75% of the payload; `table_name` alone accounts for about 60% of the payload.
2. Everything else accounts for about 25%.
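The percentages can be re-derived from the three 100-region runs. This is a small sketch, again assuming that KB means 1024 bytes; the variable names are only illustrative:

```python
BYTES_PER_KB = 1024  # assumption: the test reports KB as 1024 bytes
REGIONS = 100

# Payload sizes in KB for 100 regions, copied from the three runs above.
normal_kb = 7.5195313
no_table_name_kb = 3.0273438
no_table_ident_kb = 1.953125


def per_region_bytes(size_kb: float) -> int:
    """Average payload bytes per region, rounded to a whole byte."""
    return round(size_kb * BYTES_PER_KB / REGIONS)


total = per_region_bytes(normal_kb)
table_name = total - per_region_bytes(no_table_name_kb)
table_ident = total - per_region_bytes(no_table_ident_kb)
others = total - table_ident

for label, size in [("table_ident", table_ident), ("table_name", table_name), ("others", others)]:
    print(f"{label}: {size} bytes per region, {size / total:.0%} of the payload")
```

Both shares are measured against the whole payload: `table_name` is part of `table_ident`, so the two figures overlap rather than add up.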

MichaelScofield · 2023-07-19T02:45:32Z

I think we could simply use table id.

fengjiachun · 2023-07-19T11:46:54Z

The proto does not need to be modified; `catalog_name`, `schema_name`, and `table_name` in `table_ident` can be left empty.
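To see why leaving the strings empty helps, here is a rough size estimate based on the protobuf wire format, assuming proto3 semantics where empty strings and zero values are not serialized. The field numbers and the example names are hypothetical and not taken from the real proto file, and the sketch ignores the key and length prefix of the enclosing message:

```python
def varint_len(value: int) -> int:
    """Number of bytes protobuf uses to encode a non-negative integer as a varint."""
    length = 1
    while value >= 0x80:
        value >>= 7
        length += 1
    return length


def string_field_size(field_number: int, text: str) -> int:
    """Encoded size of a proto3 string field; empty strings are not serialized."""
    data = text.encode("utf-8")
    if not data:
        return 0
    key = varint_len((field_number << 3) | 2)  # wire type 2: length-delimited
    return key + varint_len(len(data)) + len(data)


def uint_field_size(field_number: int, value: int) -> int:
    """Encoded size of a proto3 unsigned integer field; zero is not serialized."""
    if value == 0:
        return 0
    key = varint_len((field_number << 3) | 0)  # wire type 0: varint
    return key + varint_len(value)


def table_ident_size(catalog: str, schema: str, table: str, table_id: int) -> int:
    # Field numbers 1-4 are hypothetical, chosen only for this illustration.
    return (
        string_field_size(1, catalog)
        + string_field_size(2, schema)
        + string_field_size(3, table)
        + uint_field_size(4, table_id)
    )


full = table_ident_size("greptime", "public", "my_table_with_a_long_name", 1024)
id_only = table_ident_size("", "", "", 1024)
print(f"with names: {full} bytes, table id only: {id_only} bytes")
```

With these example values the identifier shrinks from a few dozen bytes to a handful, which matches the direction of the measurements above: the strings, not the numeric id, dominate the size.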

evenyag · 2023-07-20T07:54:21Z

If we finish refactoring the datanode, I guess we only need to pass a region id.

fengjiachun · 2023-07-24T07:23:23Z

> If we finish refactoring the datanode, I guess we only need to pass a region id.

Yes.

MichaelScofield mentioned this issue May 18, 2023

feat: region failover procedure #1558

Merged

2 tasks

evenyag mentioned this issue May 18, 2023

Optimize heartbeat payload size #1605

Closed

MichaelScofield added the Metasrv label Jun 7, 2023

fengys1996 self-assigned this Jul 12, 2023

github-actions bot unassigned fengys1996 Mar 19, 2024

tisonkun added A-metasrv Involves code in the meta server and removed Metasrv labels May 15, 2024
