Conversation

@kfaraz kfaraz commented Oct 22, 2024

Description

In clusters with a large number of datasources, task operations block one another owing to the giant lock in the TaskLockbox. This becomes particularly problematic when streaming ingestion runs against multiple datasources: segment allocations for all of them must pass through a single queue and thus become very slow.

None of the task operations share a critical section across datasources, so it should suffice to perform the locking at the datasource level.

This patch is an attempt to remediate this problem.

Changes

  • Dedicate each TaskLockbox to a single datasource.
  • Add a GlobalTaskLockbox which delegates every operation to the TaskLockbox of the respective datasource.
  • Ensure that GlobalTaskLockbox.syncFromStorage() is mutually exclusive with any operation
    being performed on any of the datasources.
    • Since syncFromStorage() is called only from TaskQueue.start() upon becoming leader, a ReadWriteLock did not seem necessary.
    • Instead, syncFromStorage() marks the GlobalTaskLockbox as "unsynced" at the start of the method.
    • While "unsynced", all other lockbox operations fail.
    • Upon sync completion, the GlobalTaskLockbox is marked "synced" again and operations can proceed as normal.
  • Add GlobalTaskLockbox.shutdown() and TaskLockbox.clear() to clean up unused resources upon
    loss of leadership.
  • Remove the TaskLockbox of a datasource once it is no longer needed.
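The per-datasource delegation and the "synced" flag described above can be sketched roughly as follows. This is a hypothetical, heavily simplified illustration, not the actual PR code: the class name GlobalTaskLockboxSketch, the method doInLockbox, and the use of plain Objects as stand-ins for per-datasource lockboxes are all invented for the example.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Function;

// Hypothetical sketch: one lockbox per datasource, so operations on
// different datasources never contend on a shared giant lock.
class GlobalTaskLockboxSketch
{
  private final ConcurrentHashMap<String, Object> lockboxes = new ConcurrentHashMap<>();

  // "unsynced" until syncFromStorage() completes after leader election.
  private final AtomicBoolean syncComplete = new AtomicBoolean(false);

  <R> R doInLockbox(String dataSource, Function<Object, R> operation)
  {
    if (!syncComplete.get()) {
      // While "unsynced", all other lockbox operations fail.
      throw new IllegalStateException("GlobalTaskLockbox is not synced yet");
    }
    final Object lockbox = lockboxes.computeIfAbsent(dataSource, ds -> new Object());
    synchronized (lockbox) {
      // Only operations on the SAME datasource serialize here.
      return operation.apply(lockbox);
    }
  }

  void syncFromStorage()
  {
    syncComplete.set(false);   // fail all operations while resyncing
    lockboxes.clear();         // the real patch rebuilds state from the metadata store here
    syncComplete.set(true);
  }
}
```

With this shape, two tasks on different datasources synchronize on different monitor objects, while any call made before the sync completes is rejected outright.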

Pending

  • Test out the patch in a cluster

Follow up

After this patch, the bottleneck for segment allocation will be the single-threaded SegmentAllocationQueue.
One approach could be to maintain a separate SegmentAllocationQueue for each datasource, but that would
drastically increase the pressure on the metadata store when there are many datasources.
A better alternative would be to make the segment allocation queue multithreaded.
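One way the multithreaded-queue idea could work is to pin each datasource to one worker in a fixed pool, so allocations for a single datasource stay ordered while different datasources run in parallel. The sketch below is purely illustrative, not the actual Druid design; the class and method names are invented.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch: a fixed pool of single-threaded workers, with each
// datasource hashed to one worker. Per-datasource ordering is preserved,
// while distinct datasources can be processed concurrently.
class PartitionedAllocationQueue
{
  private final ExecutorService[] workers;

  PartitionedAllocationQueue(int numThreads)
  {
    workers = new ExecutorService[numThreads];
    for (int i = 0; i < numThreads; i++) {
      workers[i] = Executors.newSingleThreadExecutor();
    }
  }

  int slotFor(String dataSource)
  {
    // The same datasource always hashes to the same worker slot.
    return Math.floorMod(dataSource.hashCode(), workers.length);
  }

  void submit(String dataSource, Runnable allocation)
  {
    workers[slotFor(dataSource)].submit(allocation);
  }
}
```

A design like this keeps the number of concurrent metadata-store writers bounded by the pool size, rather than growing with the number of datasources.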

Release note

Improve concurrency on Overlord by ensuring that task actions on one datasource do not block actions on other datasources.


This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever it would not be obvious to an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

@kfaraz kfaraz changed the title Maintain TaskLockbox at datasource level for higher concurrency [WIP] Maintain TaskLockbox at datasource level for higher concurrency Oct 22, 2024

github-actions bot commented Mar 8, 2025

This pull request has been marked as stale due to 60 days of inactivity.
It will be closed in 4 weeks if no further activity occurs. If you think
that's incorrect or this pull request should instead be reviewed, please simply
write any comment. Even if closed, you can still revive the PR at any time or
discuss it on the dev@druid.apache.org list.
Thank you for your contributions.

@github-actions github-actions bot added the stale label Mar 8, 2025
@kfaraz kfaraz removed the stale label Mar 11, 2025
@kfaraz kfaraz changed the title [WIP] Maintain TaskLockbox at datasource level for higher concurrency Maintain TaskLockbox at datasource level for higher concurrency Jun 4, 2025
    dataSource,
    (ds, resource) -> {
      if (resource != null && resourcePredicate.test(resource)) {
        // TODO: what if a bad runaway operation is holding the TaskLockbox.giant?
Contributor

This is only a concern on shutdown, right? Since at other times, clear() will only be called when the refcount is zero, meaning no locks should be held. Just trying to make sure I understand.

Contributor Author

Yes, it's a concern only on shutdown(), which is invoked on loss of leadership.
We would want the leadership change listener to return quickly.

Contributor

True, we do, although we also want the old leader to fully stand down before the new one stands up, to avoid races between the leaders. Ideal thing would be to interrupt other in flight operations, if that's feasible. If not feasible right now, then please update this comment to describe the concern and to not be a TODO comment.

Contributor Author

Updated and moved the comment to shutdown() method as it seems more relevant there.
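
The cleanup discussed in this thread hinges on a reference count: a datasource's lockbox entry is removed only when no task references it any longer, so clear() normally runs when no locks are held. A hedged sketch of such reference-counted removal follows; the class name, Entry type, and method names are hypothetical, not the actual PR code.

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of reference-counted lockbox removal. Using compute()
// makes the increment/decrement-and-remove atomic per key, so a release()
// racing with an acquire() can never drop a lockbox that is still in use.
class RefCountedLockboxes
{
  static final class Entry
  {
    int refCount;
  }

  private final ConcurrentHashMap<String, Entry> entries = new ConcurrentHashMap<>();

  void acquire(String dataSource)
  {
    entries.compute(dataSource, (ds, entry) -> {
      if (entry == null) {
        entry = new Entry();
      }
      entry.refCount++;
      return entry;
    });
  }

  void release(String dataSource)
  {
    entries.compute(dataSource, (ds, entry) -> {
      if (entry == null) {
        return null;
      }
      // Returning null from compute() removes the mapping atomically.
      return --entry.refCount <= 0 ? null : entry;
    });
  }

  boolean contains(String dataSource)
  {
    return entries.containsKey(dataSource);
  }
}
```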

    ) throws T
    {
      // Verify that sync is complete
      if (!syncComplete.get()) {
Contributor

Check again in the critical section of getLockboxResource? This code seems potentially racy, since syncComplete could be set to false right after this check.

Contributor Author

Thanks for the tip!

I didn't initially put the check in getLockboxResource() because that method is called from syncFromStorage() as well before the syncComplete flag has been set.

I guess I can have two variants of the getLockboxResource() method, or just pass a boolean to decide whether to perform the check.

Contributor

A parameter about performing the check sounds reasonable to me.

Contributor Author

Updated.
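
The fix agreed on above, re-checking the flag inside the critical section with a boolean parameter, might look roughly like the sketch below. This is a simplified, hypothetical illustration; the class name, withLockbox, and the single giant monitor are invented for the example.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Supplier;

// Hypothetical sketch: the "synced" check is performed INSIDE the critical
// section, so the flag cannot flip to false between the check and the
// operation. The checkSynced parameter lets syncFromStorage() reuse the
// same code path before the flag has been set.
class SyncCheckedAccess
{
  private final AtomicBoolean syncComplete = new AtomicBoolean(false);
  private final Object giant = new Object();

  <R> R withLockbox(boolean checkSynced, Supplier<R> operation)
  {
    synchronized (giant) {
      if (checkSynced && !syncComplete.get()) {
        throw new IllegalStateException("Not synced");
      }
      return operation.get();
    }
  }

  void syncFromStorage()
  {
    // Bypass the check: the flag is only set at the end of the sync itself.
    withLockbox(false, () -> {
      // rebuild in-memory state from the metadata store here
      syncComplete.set(true);
      return null;
    });
  }
}
```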

@gianm gianm merged commit 5ff0020 into apache:master Jun 11, 2025
141 of 143 checks passed
@kfaraz
Contributor Author

kfaraz commented Jun 11, 2025

Thanks for the review, @gianm !

@kfaraz kfaraz deleted the datasource_lockbox branch June 11, 2025 15:48
@capistrant capistrant added this to the 34.0.0 milestone Jul 22, 2025

3 participants