Under 80% network utilization for LAN networks with large messages and high concurrency #1781

Open
MakMukhi opened this issue Jan 4, 2018 · 4 comments
Labels: P3, Type: Performance (performance improvements: CPU, network, memory, etc.)

MakMukhi (Contributor) commented Jan 4, 2018

This benchmark with networkMode=LAN and large messages doesn't seem to finish:

go run benchmark/benchmain/main.go -benchtime=10s -workloads=streaming -compression=off -maxConcurrentCalls=1000 -trace=off -reqSizeBytes=1048576 -respSizeBytes=1048576 -networkMode=LAN -cpuProfile=speedup.cpu

dfawley (Member) commented Jan 5, 2018

It finishes for me; you just have to be patient:

$ go run benchmark/benchmain/main.go -benchtime=10s -workloads=streaming -compression=off -maxConcurrentCalls=200 -trace=off -reqSizeBytes=1048576 -respSizeBytes=1048576 -networkMode=LAN -cpuProfile=speedup.cpu
Stream-traceMode_false-latency_2ms-kbps_102400-MTU_1500-maxConcurrentCalls_200-reqSize_1048576B-respSize_1048576B-Compressor_false: 
50_Latency: 40.3235 s 	90_Latency: 41.0967 s 	99_Latency: 41.1228 s 	Avg latency: 36.3483 s 	Count: 200 	13201654 Bytes/op	16305 Allocs/op	
Histogram (unit: s)
Count: 200  Min:  14.5  *****Max:  41.1*****  Avg: 36.35
------------------------------------------------------------
[         14.456700,          14.456700)    1    0.5%    0.5%  
[         14.456700,          14.456700)    0    0.0%    0.5%  
[         14.456700,          14.456700)    0    0.0%    0.5%  
[         14.456700,          14.456703)    0    0.0%    0.5%  
[         14.456703,          14.456743)    0    0.0%    0.5%  
[         14.456743,          14.457320)    0    0.0%    0.5%  
[         14.457320,          14.465627)    0    0.0%    0.5%  
[         14.465627,          14.585274)    0    0.0%    0.5%  
[         14.585274,          16.308545)    3    1.5%    2.0%  
[         16.308545,          41.128673)  195   97.5%   99.5%  ##########
[         41.128673,                inf)    1    0.5%  100.0%  

(Note the "Max" above, asterisked.)

1 MB = 8 Mb. 200 streams * 2 directions * 8 Mb per message = 3200 Mb. LAN mode allows 100 Mbps, so our best-case scenario* would be ~32s.

I'm not sure why we're ~9s worse than optimal at this point (78% of max), but I don't see anything broken with the -networkMode flag. It's just that -benchtime=<short> is not really compatible with high levels of concurrency and large messages relative to the effective throughput. A rough sketch of the best-case math follows below.

* This assumes perfect fairness across the outgoing client streams, with all 200 requests completing at the same moment and all 200 responses starting afterwards. Interestingly, the overall benchmark would improve if we sent the streams serially, since that would maximize the bi-directional utilization of the network.
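A minimal sketch of that back-of-the-envelope bound, assuming (per the footnote) that the request and response phases don't overlap and that the simulated 102400 kbps link is fully and fairly utilized; the constants mirror the benchmark flags, and this is an illustration rather than actual grpc-go benchmark code:

```go
package main

import "fmt"

func main() {
	const (
		msgBytes   = 1048576  // -reqSizeBytes and -respSizeBytes
		streams    = 200      // -maxConcurrentCalls
		directions = 2        // each message crosses the link as a request, then as a response
		linkBps    = 102400e3 // LAN mode: 102400 kbps
	)
	totalBits := float64(msgBytes * 8 * streams * directions)
	// Prints ~32.8s with exact sizes; rounded to 32s in the comment above.
	fmt.Printf("best case: %.1fs\n", totalBits/linkBps)
}
```

Against the observed worst case of ~41.1s, that is roughly 80% link utilization with exact sizes (78% with the rounded figures above).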

@dfawley dfawley added the Type: Performance label and removed the Type: Testing label Jan 5, 2018
@dfawley dfawley assigned MakMukhi and unassigned dfawley Jan 5, 2018
@dfawley dfawley changed the title from "Benchmark test with networkMode=LAN/WAN and large messages doesn't finish." to "Under 80% network utilization for LAN networks with large messages and high concurrency" Jan 5, 2018
@dfawley dfawley added the P2 label Jan 5, 2018
@MakMukhi MakMukhi removed their assignment Jun 13, 2018
@stale stale bot added the stale label Sep 6, 2019
@dfawley dfawley removed the stale label Sep 6, 2019
@dfawley dfawley added P3 and removed P2 labels May 3, 2021
@grpc grpc deleted a comment from stale bot May 3, 2021
arvindbr8 (Member) commented Sep 27, 2023

Ran this again on an M2 MacBook at master (1.59.0-dev) to see what has changed since the last comment. Note that the hardware differs, so the results below are not directly comparable to the earlier ones.

$ go run benchmark/benchmain/main.go -benchtime=10s -workloads=streaming -compression=off -maxConcurrentCalls=1000 -trace=off -reqSizeBytes=1048576 -respSizeBytes=1048576 -networkMode=LAN -cpuProfile=speedup.cpu
go1.19.1/grpc1.59.0-dev
streaming-networkMode_LAN-bufConn_false-keepalive_false-benchTime_10s-trace_false-latency_2ms-kbps_102400-MTU_1500-maxConcurrentCalls_1000-reqSize_1048576B-respSize_1048576B-compressor_off-channelz_false-preloader_false-clientReadBufferSize_-1-clientWriteBufferSize_-1-serverReadBufferSize_-1-serverWriteBufferSize_-1-sleepBetweenRPCs_0s-connections_1-recvBufferPool_nil-sharedWriteBuffer_false: 
50_Latency: 159.9492s   90_Latency: 160.0854s   99_Latency: 160.2036s   Avg_Latency: 158.2792s   Bytes/op: 8.577991e+06   Allocs/op: 10296.902
Histogram (unit: s)
Count: 1000  Min:  73.3  Max: 160.2  Avg: 158.28
------------------------------------------------------------
[          73.277045,           73.277045)     1    0.1%    0.1%  
[          73.277045,           73.277045)     0    0.0%    0.1%  
[          73.277045,           73.277045)     0    0.0%    0.1%  
[          73.277045,           73.277049)     0    0.0%    0.1%  
[          73.277049,           73.277118)     0    0.0%    0.1%  
[          73.277118,           73.278240)     0    0.0%    0.1%  
[          73.278240,           73.296671)     0    0.0%    0.1%  
[          73.296671,           73.599379)     0    0.0%    0.1%  
[          73.599379,           78.570988)     6    0.6%    0.7%  
[          78.570988,          160.223455)   992   99.2%   99.9%  ##########
[         160.223455,         1501.263413)     1    0.1%  100.0%  
Number of requests:  1000       Request throughput:  8.388608e+08 bit/s
Number of responses: 1000       Response throughput: 8.388608e+08 bit/s
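(Applying the same arithmetic as the earlier comment, and assuming the simulated link is still the bottleneck: 1000 streams * 2 directions * 8.388608 Mb per message is about 16,777 Mb, which at 102,400 kbps gives a best case of roughly 164s. That is in the same ballpark as the ~160s latencies observed here, so the link simulation, rather than gRPC itself, appears to dominate this run as well.)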

easwars (Contributor) commented Aug 30, 2024

I'm not sure whether we're saying that our benchmarks are broken in some way and need fixing, or that our implementation is suboptimal. @dfawley @arvindbr8

dfawley (Member) commented Aug 30, 2024

I think part of the point of this issue is to figure that out.
