Maintain TaskLockbox at datasource level for higher concurrency #17390
Conversation
indexing-service/src/main/java/org/apache/druid/indexing/overlord/GlobalTaskLockbox.java
This pull request has been marked as stale due to 60 days of inactivity.
dataSource,
(ds, resource) -> {
  if (resource != null && resourcePredicate.test(resource)) {
    // TODO: what if a bad runaway operation is holding the TaskLockbox.giant?
This is only a concern on shutdown, right? Since at other times, clear() will only be called when the refcount is zero, meaning no locks should be held. Just trying to make sure I understand.
Yes, it's a concern only on shutdown(), which is invoked on loss of leadership. We would want the leadership change listener to return quickly.
True, we do, although we also want the old leader to fully stand down before the new one stands up, to avoid races between the leaders. The ideal thing would be to interrupt other in-flight operations, if that's feasible. If it isn't feasible right now, then please update this comment to describe the concern and to not be a TODO comment.
Updated and moved the comment to the shutdown() method, as it seems more relevant there.
) throws T
{
  // Verify that sync is complete
  if (!syncComplete.get()) {
Check again in the critical section of getLockboxResource()? This code seems potentially racy, since syncComplete could be set to false right after this check.
Thanks for the tip! I didn't initially put the check in getLockboxResource() because that method is also called from syncFromStorage() before the syncComplete flag has been set. I guess I can have two variants of the getLockboxResource() method, or just pass a boolean to decide whether to perform the check or not.
A parameter about performing the check sounds reasonable to me.
Updated.
Thanks for the review, @gianm!
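A minimal sketch of how such a boolean parameter might look, with the check performed inside the compute() critical section. All identifiers here (verifySynced, lockboxes, DatasourceLockbox) are assumptions based on this thread, not the actual patch.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicBoolean;

class GlobalTaskLockboxSketch
{
  /** Stand-in for the per-datasource TaskLockbox. */
  static class DatasourceLockbox
  {
    final String dataSource;

    DatasourceLockbox(String dataSource)
    {
      this.dataSource = dataSource;
    }
  }

  private final ConcurrentHashMap<String, DatasourceLockbox> lockboxes = new ConcurrentHashMap<>();
  private final AtomicBoolean syncComplete = new AtomicBoolean(false);

  DatasourceLockbox getLockboxResource(String dataSource, boolean verifySynced)
  {
    // compute() runs the remapping function while holding the lock for this key,
    // so the syncComplete check cannot race with the flag being reset between
    // the check and the lookup of the datasource's lockbox.
    return lockboxes.compute(dataSource, (ds, lockbox) -> {
      if (verifySynced && !syncComplete.get()) {
        throw new IllegalStateException("Lockbox is not synced yet for datasource: " + ds);
      }
      return lockbox != null ? lockbox : new DatasourceLockbox(ds);
    });
  }
}
```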
Description
In clusters with a large number of datasources, task operations block each other owing to the giant lock in the TaskLockbox. This can become particularly problematic when streaming ingestion is being done into multiple datasources, where segment allocations become very slow since all of them must pass through a single queue.
None of the task operations share a critical section across datasources, so it should suffice to perform the locking at the datasource level.
This patch is an attempt to remediate this problem.
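To illustrate the idea, here is a minimal sketch of datasource-level locking, assuming a hypothetical withDatasourceLock helper (not the patch's actual API): operations on the same datasource still serialize, but different datasources no longer contend on a single giant lock.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

class PerDatasourceLocking
{
  // One lock per datasource instead of a single "giant" lock shared by all of them.
  private final ConcurrentHashMap<String, ReentrantLock> locks = new ConcurrentHashMap<>();

  <T> T withDatasourceLock(String dataSource, Supplier<T> action)
  {
    final ReentrantLock lock = locks.computeIfAbsent(dataSource, ds -> new ReentrantLock());
    lock.lock();
    try {
      // Only operations on the same datasource serialize here; segment allocations
      // for other datasources can proceed concurrently.
      return action.get();
    }
    finally {
      lock.unlock();
    }
  }
}
```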
Changes
- Restrict TaskLockbox to a single datasource.
- Add GlobalTaskLockbox, which delegates to the respective datasource for any operation.
- GlobalTaskLockbox.syncFromStorage() must be mutually exclusive from any operation being performed on any of the datasources (see the sketch after this list).
  - Since syncFromStorage() is called only from TaskQueue.start() on becoming leader, using a ReadWriteLock didn't seem necessary.
  - syncFromStorage() marks the GlobalTaskLockbox as "unsynced" at the start of the method. Once the sync completes, the GlobalTaskLockbox is marked "synced" again and operations can proceed as normal.
- Add GlobalTaskLockbox.shutdown() and TaskLockbox.clear() to clean up unused resources upon loss of leadership.
- The TaskLockbox of a datasource is removed when it is not needed anymore.
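A rough illustration of the sync/shutdown lifecycle described above; identifiers other than syncFromStorage() and shutdown() are assumptions, not the patch itself.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicBoolean;

class LockboxLifecycleSketch
{
  /** Stand-in for the per-datasource TaskLockbox. */
  static class DatasourceLockbox
  {
    void clear()
    {
      // Release any in-memory lock state held for this datasource.
    }
  }

  private final ConcurrentHashMap<String, DatasourceLockbox> lockboxes = new ConcurrentHashMap<>();
  private final AtomicBoolean syncComplete = new AtomicBoolean(false);

  void syncFromStorage()
  {
    // Mark "unsynced" first so that no datasource-level operation runs while the
    // in-memory state is being rebuilt from the metadata store.
    syncComplete.set(false);
    lockboxes.clear();
    // ... rebuild the per-datasource lockboxes from task storage here ...
    syncComplete.set(true);
  }

  void shutdown()
  {
    // On loss of leadership: mark "unsynced" and release per-datasource resources
    // so the leadership-change listener can return quickly.
    syncComplete.set(false);
    lockboxes.values().forEach(DatasourceLockbox::clear);
    lockboxes.clear();
  }
}
```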
Pending
Follow up
After this patch, the bottleneck for segment allocation will be the single-threaded SegmentAllocationQueue.
One approach could be to maintain a separate SegmentAllocationQueue for each datasource, but that would drastically increase the pressure on the metadata store in case of multiple datasources.
A better alternative would be to make the segment allocation queue multithreaded.
Release note
Improve concurrency on Overlord by ensuring that task actions on one datasource do not block actions on other datasources.
This PR has: