[dst] Improve tail-latency for operations of transactions using wait queues #13580

@robertsami

Jira Link: DB-3158

The biggest contributor to high tail latency is the following case of starvation: under a high degree of contention, waiting transactions may be starved by incoming operations that contend for the same latch. We currently have no mechanism to prevent this, which can lead to high tail latency in some workloads.

Less critically, our process for determining which waiters can be resumed, and then resuming them, could be improved in a couple of ways:

  1. We currently iterate over each of the blocker's waiters and separately acquire a write lock on a mutex to remove the waiter from waiter_status_ before resuming it. We need not re-acquire this write lock for every waiter; we can acquire it once and remove all of the waiters under it.
  2. We currently resume waiters serially, on a single thread, in the order they arrived. It might be better to determine which of the waiters will conflict with each other, and then either:
    a. Resolve the first-in waiter and all other waiters that do not conflict with it in parallel, or
    b. Resolve the largest set of non-conflicting waiters in parallel, then the second largest, etc.
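
The two ideas above can be sketched roughly as follows. This is a minimal illustration, not the actual wait-queue code: `Waiter`, `WaiterRegistry`, `RemoveBatch`, and `PickNonConflicting` are hypothetical names, the mutex is assumed to be a `std::shared_mutex`, and two waiters are treated as conflicting whenever they share any key, which ignores lock modes and intent types. `PickNonConflicting` follows a greedy, arrival-order reading of option 2a, where each admitted waiter must also not conflict with the waiters admitted before it.

```cpp
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <shared_mutex>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical stand-ins for illustration; not the real YugabyteDB types.
using WaiterId = uint64_t;

struct Waiter {
  WaiterId id;
  std::unordered_set<std::string> keys;  // keys this waiter wants to lock
};

class WaiterRegistry {
 public:
  void Add(const Waiter& w) {
    std::unique_lock<std::shared_mutex> lock(mutex_);
    waiter_status_.insert_or_assign(w.id, w);
  }

  // Item 1: remove every listed waiter under a single write-lock
  // acquisition instead of re-acquiring the lock once per waiter.
  std::vector<Waiter> RemoveBatch(const std::vector<WaiterId>& ids) {
    std::vector<Waiter> removed;
    removed.reserve(ids.size());
    {
      std::unique_lock<std::shared_mutex> lock(mutex_);
      for (WaiterId id : ids) {
        auto it = waiter_status_.find(id);
        if (it == waiter_status_.end()) continue;  // already removed
        removed.push_back(std::move(it->second));
        waiter_status_.erase(it);
      }
    }
    // The lock is released here; the caller resumes the waiters outside it.
    return removed;
  }

  std::size_t Size() const {
    std::shared_lock<std::shared_mutex> lock(mutex_);
    return waiter_status_.size();
  }

 private:
  mutable std::shared_mutex mutex_;
  std::unordered_map<WaiterId, Waiter> waiter_status_;
};

// Item 2a (greedy variant): walk the waiters in arrival order and admit the
// first one plus each later waiter whose keys do not overlap any key already
// claimed by an admitted waiter. Simplification: any shared key is a conflict.
std::vector<WaiterId> PickNonConflicting(
    const std::vector<Waiter>& in_arrival_order) {
  std::vector<WaiterId> batch;
  std::unordered_set<std::string> claimed;
  for (const auto& w : in_arrival_order) {
    bool conflicts = false;
    for (const auto& key : w.keys) {
      if (claimed.count(key) > 0) {
        conflicts = true;
        break;
      }
    }
    if (conflicts) continue;  // left for a later round
    claimed.insert(w.keys.begin(), w.keys.end());
    batch.push_back(w.id);
  }
  return batch;
}
```

A caller would first pass the blocker's waiters, in arrival order, to `PickNonConflicting`, then hand the resulting ids to `RemoveBatch`, and finally resume the returned waiters in parallel. Waiters that were skipped stay registered and are considered again in the next round. Option 2b would replace the greedy pass with a search for the largest non-conflicting set, which is more expensive to compute.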

Metadata

Assignees

Labels

area/docdb: YugabyteDB core features
kind/enhancement: This is an enhancement of an existing feature
priority/medium: Medium priority issue

Type

No type

Projects

Status

Done

Milestone

No milestone
