-
Notifications
You must be signed in to change notification settings - Fork 25.3k
Start indexing throttling only after disk IO unthrottling does not keep up with the merge load #125654
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Start indexing throttling only after disk IO unthrottling does not keep up with the merge load #125654
Conversation
Pinging @elastic/es-distributed-indexing (Team:Distributed Indexing) |
server/src/main/java/org/elasticsearch/index/engine/ThreadPoolMergeScheduler.java
Outdated
Show resolved
Hide resolved
@@ -50,7 +50,7 @@ public class ThreadPoolMergeScheduler extends MergeScheduler implements Elastics | |||
); | |||
private final ShardId shardId; | |||
private final MergeSchedulerConfig config; | |||
private final Logger logger; | |||
protected final Logger logger; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unrelated change, in order to let these log messages go to the ThreadPoolMergeScheduler
's logger instead of the InternalEngine
's.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, though I think we can maybe simplify the test?
BiConsumer<Integer, Integer> enableIndexingThrottlingHook, | ||
BiConsumer<Integer, Integer> disableIndexingThrottlingHook |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like these can be just Runnable
which would remove a bit of code elsewhere too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've removed these, the TestThreadPoolMergeScheduler
directly exposes "if index throttling" as a boolean.
int excessMerges = randomIntBetween(1, 10); | ||
int mergesToSubmit = maxMergeCount + mergesToRun + excessMerges; | ||
int mergesOutstanding = 0; | ||
boolean expectIndexThrottling = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not clear to me that every test run invokes index throttling. I wonder if we are overcomplicating the test a little here and whether we should simply make it have X merges submitted and then verify that index-throttling kicks in (twice, one with the max-io-rate-simulation set to false first, then one where it is set to true)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, it's complicated and it not always tests that index throttling kicks in when IO throttling is at max level.
But, I still think the test is valuable because it constantly asserts the indexing throttling state while merges are submitted, scheduler, and running at the same time (and it's deterministic).
I've added another test, that asserts indexing throttling kicks in when more than max_merge_count
merges are submitted (some are scheduled, but none is run). It's simpler, it always asserts that index throttling is toggled, but it's covering less.
Let me know if this now looks OK with you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Despite I spotted some details that look "concerns" to me, I would prefer other person with more time in the project would review this code again, before blocking the merge process.
Thank you for this quick patch.
@@ -173,6 +173,165 @@ public void testSimpleMergeTaskReEnqueueingBySize() { | |||
} | |||
} | |||
|
|||
public void testIndexingThrottlingWhenSubmittingMerges() { | |||
final int maxThreadCount = randomIntBetween(1, 5); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of testing with random values, I would recommend running a parametrized test with different parameters set so that we can have reproducible results.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not so much about parameterized testing as it is about just not hard-coding a specific value. It is quite common to do this in our code-base, so I'd prefer to keep this as is.
int submittedMerges = 0; | ||
// merges are submitted, while some are also scheduled (but none is run) | ||
while (submittedMerges < mergesToSubmit - 1) { | ||
isUsingMaxTargetIORate.set(randomBoolean()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The same as above. I think having success with random number in place could give a false sense of healthy state, that could later end up in failure. I would instead suggest having a parametrized test case with all the combination of parameters sets considered to be interesting for testing. I would suggest some reading regarding table driven unit tests [1]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is ok here, since the test ends up verifying that throttling does kick in when true and the loop here is expected to run a few times in most cases.
@@ -493,4 +652,49 @@ private static MergeInfo getNewMergeInfo(long estimatedMergeBytes) { | |||
private static MergeInfo getNewMergeInfo(long estimatedMergeBytes, int maxNumSegments) { | |||
return new MergeInfo(randomNonNegativeInt(), estimatedMergeBytes, randomBoolean(), maxNumSegments); | |||
} | |||
|
|||
static class TestThreadPoolMergeScheduler extends ThreadPoolMergeScheduler { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you have to override a class to test it (or for it), in this case, probably what you need is a field you can change in the original class itself to trigger a different behaviour. The reason is a refactor in the code could give unexpected side effects not easy to spot by not testing agains the actual implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We typically try to avoid such artificial test pieces in production code. I think such an override as here is quite common so do not mind the current form. Feel free to maybe open a draft PR after this is merged to illustrate your point more (I am not sure I fully understood it).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
…ep up with the merge load (elastic#125654) Fixes an issue where indexing throttling kicks in while disk IO is throttling. Instead disk IO should first unthrottle, and only then, if we still can't keep up with the merging load, start throttling indexing. Fixes elastic/elasticsearch-benchmarks#2437 Relates elastic#120869
Fixes an issue where indexing throttling kicks in while disk IO is throttling.
Instead disk IO should first unthrottle, and only then, if we still can't keep up with the merging load, start throttling indexing.
Fixes https://github.com/elastic/elasticsearch-benchmarks/issues/2437
Relates #120869