Description
We use Boost.Beast to read from S3 in high-throughput scenarios: files of hundreds of GB, read from EC2 machines with 192 cores and 100 Gb/s network interfaces to S3.
The io_context is limited to ~500k operations per second and saturates at ~8 threads on our machines; additional threads just burn CPU cycles contending for locks.
500k operations per second should be plenty to reach 8+ GB/s of throughput, but it isn't: every read_some operation reads only a single TLS record (16 KB maximum), even when all buffers (socket and user-space) are generously sized. This caps us at about 3 GB/s while still being inefficient, with the io_context busy servicing tiny read operations.
I tried a hacky Boost modification that reads more than one TLS record per read_some operation; it promptly increased throughput by 3x, close to the theoretical maximum of the machine, while requiring fewer io_context operations (and less CPU).
I don't see any way to control this without modifying Boost itself - am I missing something, or is this scenario currently not covered?