Try to improve or remove IOQueue for sends in Kestrel #41391

Open
@kouvel

Description

Is there an existing issue for this?

  • I have searched the existing issues

Is your feature request related to a problem? Please describe the problem.

IOQueue for sends seems to be working well currently, but its behavior and performance likely depend heavily on the rest of the system, perhaps including the amount of load. Some issues I've seen to be problematic:

  1. What happens when the IOQueue work item for sends runs a bit sooner? The thread processing it depletes the few work items queued to the IOQueue too soon, and more of the same IOQueue work items end up getting scheduled, using more CPU cycles that could be better spent elsewhere.
  2. What happens when the IOQueue work item for sends runs a bit later? Responses are not sent quickly enough, so some connections don't send new requests quickly enough, and the delayed new requests also hurt maintaining some steady-state fast paths.
  3. What if the IOQueue work items were instead scheduled to the thread pool with preferLocal: true? In pipelined cases like plaintext, the work is processed later than it is currently, leading to the issue in (2). In non-pipelined cases it seems to be ok.
  4. What if the IOQueue work items were processed inline? In pipelined cases like plaintext, it ends up doing a lot more, less efficient sends to the socket, leading to issue (1). This appears to be the same reason that UnsafePreferInlineScheduling=true regresses throughput quite a bit on plaintext. In non-pipelined cases it seems to be ok.
  5. What if the IOQueue work items were instead scheduled to the thread pool with preferLocal: false? Part of the benefit of IOQueue appears to come from processing multiple work items before completing. So while the initial IOQueue work item for sends queued to the thread pool may be delayed a bit (it's queued behind other new requests), once it starts it continues to process other send work items for the group of sockets it's associated with (at a relatively higher priority). Queueing those sends directly to the thread pool delays each of them in the same way as that first work item, which is not an ideal ordering, leading to issue (2). Ordering the work items better often leads to issue (1) instead.

There may be an inherent tradeoff involved. The way the current IOQueue behaves appears to be highly dependent on otherwise unrelated parts of the system (like the thread pool).
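For reference, the batching behavior discussed in (5), where a single scheduled work item drains multiple queued sends before completing, can be sketched roughly as follows. This is an illustrative sketch in Python, not Kestrel's actual IOQueue implementation; the names (IoQueueSketch, schedule, _drain) are made up for the example.

```python
import threading
from collections import deque
from concurrent.futures import ThreadPoolExecutor

class IoQueueSketch:
    """One instance per group of sockets; at most one drain work item is
    scheduled to the underlying pool at a time."""

    def __init__(self, executor):
        self._executor = executor
        self._items = deque()
        self._lock = threading.Lock()
        self._scheduled = False

    def schedule(self, send):
        """Queue a send; dispatch a drain work item only if none is in flight."""
        with self._lock:
            self._items.append(send)
            if self._scheduled:
                # An already-scheduled drain will pick this item up, so the
                # send rides along instead of becoming its own work item.
                return
            self._scheduled = True
        self._executor.submit(self._drain)

    def _drain(self):
        # Process every queued send before completing. This is the batching
        # that queueing each send directly to the pool (option 5) gives up.
        while True:
            with self._lock:
                if not self._items:
                    self._scheduled = False
                    return
                send = self._items.popleft()
            send()

# Example usage (pool stands in for the thread pool):
pool = ThreadPoolExecutor(max_workers=2)
queue = IoQueueSketch(pool)
queue.schedule(lambda: print("send 1"))
queue.schedule(lambda: print("send 2"))
pool.shutdown(wait=True)
```

Replacing schedule with a direct executor submit per send corresponds to option (5): each send then waits its own turn behind unrelated queued work, rather than being picked up by an in-flight drain.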

Describe the solution you'd like

This may need further investigation. There may be a different strategy that balances these issues better without relying so heavily on how the thread pool processes work items. Since there may be inherent tradeoffs, it may also come down to determining what a reasonable tradeoff would be.

Additional context

No response

Metadata


Assignees

No one assigned

Labels

Perf, area-networking (Includes servers, yarp, json patch, bedrock, websockets, http client factory, and http abstractions), feature-kestrel

Type

No type

Projects

No projects

Relationships

None yet

Development

No branches or pull requests