[dst] Improve tail-latency for operations of transactions using wait queues

Jira Link: [DB-3158](https://yugabyte.atlassian.net/browse/DB-3158)

The largest contributor to high tail latency is starvation: under heavy contention, waiting transactions can be starved by incoming operations that contend for the same latch. We currently have no mechanism to prevent this, which can lead to high tail latency in some workloads.
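To make the starvation case concrete, here is a minimal, deterministic sketch. It is a toy model and not YugabyteDB code: `Policy`, `TicksUntilWaiterRuns`, and the one-tick latch hold time are all assumptions made for illustration. One waiter is already queued on a latch while new operations arrive, one per tick. Under a "barging" policy each newcomer takes the latch ahead of the queue; under a FIFO policy newcomers line up behind the existing waiter.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Hypothetical policies for who gets the latch next.
enum class Policy { kBarging, kFifo };

// Toy model: a single latch, one waiter that is already queued, and
// `arrivals` new operations that show up one per tick wanting the same
// latch. Every operation holds the latch for exactly one tick.
// Returns the tick at which the original waiter finally runs.
std::size_t TicksUntilWaiterRuns(Policy policy, std::size_t arrivals) {
  constexpr int kWaiter = 0;
  std::deque<int> queue = {kWaiter};
  for (std::size_t tick = 0;; ++tick) {
    const bool has_arrival = tick < arrivals;
    if (has_arrival && policy == Policy::kBarging) {
      // The newcomer takes the latch ahead of everything queued,
      // so the waiter makes no progress during this tick.
      continue;
    }
    if (has_arrival) {
      // FIFO: the newcomer lines up behind the existing waiters.
      queue.push_back(static_cast<int>(tick) + 1);
    }
    const int runner = queue.front();
    queue.pop_front();
    if (runner == kWaiter) {
      return tick;
    }
  }
}
```

In this model the waiter's delay under barging grows linearly with the number of contending arrivals, while under FIFO it does not depend on them. A real fix would have to weigh that fairness against throughput, which this sketch ignores.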

Less critically, our process for determining which waiters can be resumed, and then resuming them, could be improved in a couple of ways:
1. We currently iterate over each of the blocker's waiters and separately acquire a write lock on a mutex to remove each waiter from `waiter_status_` before resuming it. We need not re-acquire this write lock for every waiter; we can acquire it once and remove them all.
2. We currently resume waiters serially on a single thread, in the order they arrived. It might be better to determine which waiters conflict with each other, and then either:
    a. Resolve the first-in waiter and all other non-conflicting waiters in parallel, or
    b. Resolve the largest set of non-conflicting waiters in parallel, then the second largest, and so on.
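
The two ideas above can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual wait-queue implementation: `Waiter`, `WaitQueueSketch`, `RemoveAll`, and `FirstInPlusNonConflicting` are hypothetical names, a plain `std::mutex` stands in for the write lock, and two waiters are treated as conflicting whenever they share a key (real conflict detection would also depend on intent types). Option (a) is interpreted here as a greedy pass: take the first-in waiter, then every later waiter that conflicts with nothing already selected.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <set>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical waiter record: an id plus the keys it wants to lock.
struct Waiter {
  int id;
  std::set<std::string> keys;
};

class WaitQueueSketch {
 public:
  void Add(const Waiter& waiter) {
    std::lock_guard<std::mutex> lock(mutex_);
    waiter_status_.emplace(waiter.id, waiter);
  }

  // Item 1: remove all of a blocker's waiters under ONE lock acquisition
  // instead of locking once per waiter. Unknown ids are skipped.
  std::vector<Waiter> RemoveAll(const std::vector<int>& ids) {
    std::vector<Waiter> removed;
    removed.reserve(ids.size());
    std::lock_guard<std::mutex> lock(mutex_);
    for (int id : ids) {
      auto it = waiter_status_.find(id);
      if (it == waiter_status_.end()) continue;
      removed.push_back(std::move(it->second));
      waiter_status_.erase(it);
    }
    return removed;
  }

  std::size_t Size() const {
    std::lock_guard<std::mutex> lock(mutex_);
    return waiter_status_.size();
  }

 private:
  mutable std::mutex mutex_;
  std::unordered_map<int, Waiter> waiter_status_;
};

// Item 2, option (a): given waiters in arrival order, return the ids of
// the first-in waiter plus every later waiter whose keys do not overlap
// with any waiter already selected. The returned batch could be resumed
// in parallel; the remaining waiters would be handled in a later pass.
std::vector<int> FirstInPlusNonConflicting(
    const std::vector<Waiter>& arrival_order) {
  std::vector<int> batch;
  std::set<std::string> claimed_keys;
  for (const Waiter& waiter : arrival_order) {
    bool conflicts = false;
    for (const std::string& key : waiter.keys) {
      if (claimed_keys.count(key) > 0) {
        conflicts = true;
        break;
      }
    }
    if (conflicts) continue;
    claimed_keys.insert(waiter.keys.begin(), waiter.keys.end());
    batch.push_back(waiter.id);
  }
  return batch;
}
```

Resuming the removed waiters after the lock is released keeps the critical section short. Option (b) would instead need to search for the largest mutually non-conflicting set, which is a harder problem than this single greedy pass.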

[DB-3158]: https://yugabyte.atlassian.net/browse/DB-3158?atlOrigin=eyJpIjoiNWRkNTljNzYxNjVmNDY3MDlhMDU5Y2ZhYzA5YTRkZjUiLCJwIjoiZ2l0aHViLWNvbS1KU1cifQ

[dst] Improve tail-latency for operations of transactions using wait queues #13580
