Description
We use Boost.Beast to read from S3 in high-throughput scenarios: files of hundreds of GB, read from EC2 machines with 192 cores and 100 Gb/s network interfaces to S3.
The io_context is limited to ~500k operations per second and saturates at ~8 threads on our machines; additional threads just burn CPU cycles contending for locks.
500k operations per second should be plenty to reach 8+ GB/s of throughput, but it isn't: every read_some operation reads only a single TLS record (16 KB maximum), even when all buffers (socket and user-space) are generously sized. This caps us at about 3 GB/s while still being inefficient, with the io_context busy servicing tiny read operations.
I tried a hacky Boost modification that reads more than one TLS record per read_some operation; it promptly increased throughput by 3x, close to the theoretical maximum of the machine, while requiring fewer io_context operations (and less CPU).
I don't see any way to control this without modifying Boost itself - am I missing something, or is this scenario currently not covered?