Description
Is there an existing issue for this?
- I have searched the existing issues
Is your feature request related to a problem? Please describe the problem.
`IOQueue` for sends seems to be working well currently, but there are likely several issues where its behavior and performance depend heavily on the whole system, perhaps including the amount of load. Some issues that I've seen to be problematic:
1. What happens when the `IOQueue` work item for sends runs a bit sooner? The thread processing it depletes the few work items queued to the `IOQueue` too soon, and more of the same `IOQueue` work items end up getting scheduled, using CPU cycles that could be better spent elsewhere.
2. What happens when the `IOQueue` work item for sends runs a bit later? Responses are not sent quickly enough, so some connections don't send new requests quickly enough. Delayed new requests also hurt maintaining some steady-state fast paths.
3. What if the `IOQueue` work items were instead scheduled to the thread pool with `preferLocal: true`? In pipelined cases like plaintext, the work item is processed later than it is currently, leading to the issue in (2). In non-pipelined cases it seems to be ok.
4. What if the `IOQueue` work items were processed inline? In pipelined cases like plaintext, it ends up doing a lot more sends to the socket less efficiently, leading to issue (1). It appears to be the same reason why `UnsafePreferInlineScheduling=true` regresses throughput quite a bit on plaintext. In non-pipelined cases it seems to be ok.
5. What if the `IOQueue` work items were instead scheduled to the thread pool with `preferLocal: false`? Part of the benefit achieved by `IOQueue` appears to be that it processes multiple work items before completing. So while the initial `IOQueue` work item for sends queued to the thread pool may be delayed a bit (it's queued behind other new requests), once it starts it continues to process other send work items for the group of sockets it's associated with (at a relatively higher priority). Queueing those directly to the thread pool delays each send similarly to the first `IOQueue` work item queued to the thread pool, which is not an ideal ordering, leading to issue (2). Ordering the work items better often leads to issue (1).
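
For anyone less familiar with the batching behavior described above, here is a minimal sketch of the general pattern. It is a simplified illustration, not the actual Kestrel source: `BatchingQueue` and `DrainPassesQueued` are hypothetical names made up for this example, and the real `IOQueue` may differ in details. The point it tries to show is that only the first enqueue pays the thread pool queueing delay, and everything enqueued while the drain pass is running gets picked up by that same pass.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// Hypothetical, simplified batching scheduler for illustration only.
public sealed class BatchingQueue : IThreadPoolWorkItem
{
    private readonly ConcurrentQueue<Action> _workItems = new();
    private int _doingWork; // 0 = idle, 1 = a drain pass is queued or running

    // How many drain passes were queued to the thread pool (observation only).
    public int DrainPassesQueued;

    public void Schedule(Action work)
    {
        _workItems.Enqueue(work);

        // Only the caller that flips idle -> busy queues a drain pass.
        // Everyone else relies on the pass that is already queued or running.
        if (Interlocked.CompareExchange(ref _doingWork, 1, 0) == 0)
        {
            Interlocked.Increment(ref DrainPassesQueued);
            ThreadPool.UnsafeQueueUserWorkItem(this, preferLocal: false);
        }
    }

    void IThreadPoolWorkItem.Execute()
    {
        while (true)
        {
            // Drain everything currently queued, including items that were
            // added after this pass started running.
            while (_workItems.TryDequeue(out Action? work))
            {
                work();
            }

            // Mark idle first, then re-check, so an item enqueued in between
            // is not left behind without a drain pass.
            Volatile.Write(ref _doingWork, 0);
            Thread.MemoryBarrier();

            if (_workItems.IsEmpty)
            {
                return; // Nothing left; the next Schedule call queues a new pass.
            }

            // More work arrived. Keep draining unless a concurrent Schedule
            // call already queued another pass.
            if (Interlocked.Exchange(ref _doingWork, 1) == 1)
            {
                return;
            }
        }
    }
}
```

With a queue like this, how many items each pass handles depends on when the thread pool happens to run it relative to the producers. If the pass runs early it finds few items and another pass gets queued soon after (issue 1); if it runs late the batch is larger but every item in it waited longer (issue 2).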
There may be an inherent tradeoff involved. It appears that the way in which the current `IOQueue` works is highly dependent on other unrelated parts of the system (like the thread pool).
Describe the solution you'd like
May need further investigation. There may be a different strategy that balances these issues better without relying too much on how work items are processed. As there may be inherent tradeoffs, it may also come down to determining what a reasonable tradeoff would be.
Additional context
No response