Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

KAFKA-18844: Fix flaky QuorumControllerTest.testBalancePartitionLeaders() #18997

Open
wants to merge 1 commit into
base: trunk
Choose a base branch
from

Conversation

FrankYang0529
Copy link
Member

When we initialize FeatureControlManager in QuorumController [0], we don't set bootstrapMetadata for it, so it use Metadata.latestProduction() [1]. In the test case, we use IBP_3_7_IV0 as bootstrapMetadata [2]. When initializing QuorumController, it takes time to replay record to overwrite FeatureControlManager#metadataVersion. If we add Thread.sleep(1000L) before metadataVersion.set(mv) [3], we can reproduce the flaky test steadily.

I'm not sure whether setting bootstrapMetadata to FeatureControlManager is correct, so I fix the case by waiting metadataVersion is overwrote.

[0]

this.featureControl = new FeatureControlManager.Builder().
setLogContext(logContext).
setQuorumFeatures(quorumFeatures).
setSnapshotRegistry(snapshotRegistry).
setClusterFeatureSupportDescriber(clusterSupportDescriber).
build();

[1]

private MetadataVersion metadataVersion = MetadataVersion.latestProduction();

[2]

static final BootstrapMetadata SIMPLE_BOOTSTRAP = BootstrapMetadata.
fromVersion(MetadataVersion.IBP_3_7_IV0, "test-provided bootstrap");

[3]

public void replay(FeatureLevelRecord record) {
VersionRange range = quorumFeatures.localSupportedFeature(record.name());
if (!range.contains(record.featureLevel())) {
throw new RuntimeException("Tried to apply FeatureLevelRecord " + record + ", but this controller only " +
"supports versions " + range);
}
if (record.name().equals(MetadataVersion.FEATURE_NAME)) {
MetadataVersion mv = MetadataVersion.fromFeatureLevel(record.featureLevel());
metadataVersion.set(mv);

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

@github-actions github-actions bot added tests Test fixes (including flaky tests) kraft small Small PRs labels Feb 21, 2025
@ijuma
Copy link
Member

ijuma commented Feb 21, 2025

This is related to #18845, thanks for the quick fix!

@ijuma
Copy link
Member

ijuma commented Feb 21, 2025

in QuorumController [0], we don't set bootstrapMetadata for it, so it use Metadata.latestProduction()

We tried to remove these defaults as part of #18845 - looks like I missed this one. Can you try removing the default? Is that an easy change or does it cause a lot of issues?

@FrankYang0529
Copy link
Member Author

FrankYang0529 commented Feb 21, 2025

Can you try removing the default?

I think we can't remove the default one. Each variable get default value if it's null. Or do you mean that trying to set bootstrapMetadata when initialize FeatureControlManager?

if (logContext == null) logContext = new LogContext();
if (snapshotRegistry == null) snapshotRegistry = new SnapshotRegistry(logContext);
if (quorumFeatures == null) {
Map<String, VersionRange> localSupportedFeatures = new HashMap<>();
localSupportedFeatures.put(MetadataVersion.FEATURE_NAME, VersionRange.of(
MetadataVersion.MINIMUM_VERSION.featureLevel(),
MetadataVersion.latestProduction().featureLevel()));
quorumFeatures = new QuorumFeatures(0, localSupportedFeatures, Collections.singletonList(0));
}
return new FeatureControlManager(
logContext,
quorumFeatures,
snapshotRegistry,
metadataVersion,
clusterSupportDescriber
);

@ijuma
Copy link
Member

ijuma commented Feb 21, 2025

Yes, if it's mandatory, then we should set it in the constructor. Any issues doing that?

Copy link
Contributor

@junrao junrao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@FrankYang0529 : Thanks for the PR.

Regarding FeatureControlManager.metadataVersion, it's bootstrapped from the bootstrap file or the log. Any test that depends on it should wait for the bootstrap to complete. So, it does seem reasonable to initialize it to null. @cmccabe : What do you think?

@@ -753,6 +753,10 @@ public void testBalancePartitionLeaders() throws Throwable {
QuorumController active = controlEnv.activeController();
Map<Integer, Long> brokerEpochs = new HashMap<>();

TestUtils.waitForCondition(() ->
active.featureControl().finalizedFeatures(Long.MAX_VALUE).featureMap().get(MetadataVersion.FEATURE_NAME) == SIMPLE_BOOTSTRAP.metadataVersion().featureLevel(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we need to wait for QuorumController to fully bootstrap before doing the remaining test, it seems that we should just call controlEnv.activeController(true) here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the suggestion. I think we can remove QuorumControllerTestEnv#activeController(boolean) function, because it's better for all cases get a real active controller. Other test cases like testUncleanShutdownBrokerElrEnabled also has similar error.

Screenshot 2025-02-22 at 10 37 39 AM

…rs()

Signed-off-by: PoAn Yang <payang@apache.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kraft small Small PRs tests Test fixes (including flaky tests)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants