Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

KAFKA-17609:[1/4] Changes needed to convert system tests to use KRaft and remove ZK #17275

Open
wants to merge 9 commits into
base: trunk
Choose a base branch
from

Conversation

bbejeck
Copy link
Contributor

@bbejeck bbejeck commented Sep 25, 2024

This is part one of a multi-pr effort to convert Kafka Streams system tests to KRaft. I decided to break down the changes into multiple PRs to reduce the review load

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

@github-actions github-actions bot added the tests Test fixes (including flaky tests) label Sep 25, 2024
@cluster(num_nodes=6)
@matrix(from_version=smoke_test_versions, bounce_type=["full"])
def test_app_upgrade(self, from_version, bounce_type):
@cluster(num_nodes=9)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need 9 nodes now?

Copy link
Contributor Author

@bbejeck bbejeck Oct 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I ran the test locally, I got a 'not enough resources' error indicating that the test needed more than 6 nodes, but I'll set it back to 6 and run on the branch builder as it seems to have been fine before. Probably an oversight on my part.

UPDATE: Running branch builder I received this error

ducktape.cluster.node_container.InsufficientResourcesError: 
Not enough nodes available to allocate. linux nodes requested: 1. linux nodes available: 0

So I'm reverted num_nodes back to 9

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am still confuse why we need 3 more node now... no need to block the PR on this question, but I am just curious to understand why. -- I mean, if we have 3 brokers, and 3 controllers, plus 3 KS, yes, we would need 9. But before, we should have had 3 brokers, 3 ZK, and 3 KS, and thus also need 9? -- Are we actually running with dedicate quorum nodes? Or did this test not us ZK but "mixed-moded" broker/controller nodes before?

def test_app_upgrade(self, from_version, bounce_type):
@cluster(num_nodes=9)
@matrix(from_version=smoke_test_versions, bounce_type=["full"], metadata_quorum=[quorum.combined_kraft])
def test_app_upgrade(self, from_version, bounce_type, metadata_quorum):
Copy link
Member

@mjsax mjsax Sep 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If bounce_type is always full, and metadata_quorum is always quorum.combined_kraft, why do we have both as test parameters? Can we not hard-code both inside the test itself?

Even if I don't even see where metadata_quorum is used? Why do we add it?

Copy link
Contributor Author

@bbejeck bbejeck Oct 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

metadata_quorum must be in the @matrix annotation as it's picked up by the ServiceQuorumInfo.for_test class method to indicate the type of metadata quorum (via the test_context) - if it's not set, the test will assume you're using ZK and throws an error because the zk field in the KafkaService class is None. From the git history, the team made updates some time ago adding the metadata_quorum parameter to streams system tests to use KRaft vs. ZK. You'll see this pattern in all the Kafka Streams system tests.

As for bounce_type, I'm unfamiliar with this test; there is a condition for a "rolling" bounce type in the test, but it was never added to the test parameter list. Do you know anything about this? I can add "rolling" to the parameter list and exercise this portion of the test.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC (but not 100% sure) full means we stop all nodes first, and restart them afterward, while rolling would be regular rolling bounce, ie, stop-start, stop-start.

Not sure if we did run this test will rolling restart in the past? If yes, maybe there was a reason to remove rolling-restart and only test with full? Might need some digging. -- If it details this PR, we can also do a follow up to figure this out, and either add rolling-restart or maybe remove the parameter?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the history, it looks like the rolling parameter was never enabled; the code is there to support the rolling restart. So I enabled it in this latest run, and there were three failures, all from the rollingportion - it's from the three oldest versions. If debugging these failures holds up the PR too much, I'll removerolling` and defer it to a follow up PR.

@mjsax
Copy link
Member

mjsax commented Sep 26, 2024

Thanks Bill. Made a pass. Can you also trigger a system test run and share the link?

@bbejeck bbejeck force-pushed the KAFKA-17609_convert_system_tests_to_kraft_part_1 branch from b5df169 to ec294cf Compare September 30, 2024 15:05
@bbejeck bbejeck changed the title KAFKA-17609: Changes needed to convert system tests to use KRaft and remove ZK KAFKA-17609:[1/4] Changes needed to convert system tests to use KRaft and remove ZK Sep 30, 2024
@bbejeck
Copy link
Contributor Author

bbejeck commented Sep 30, 2024

Thanks Bill. Made a pass. Can you also trigger a system test run and share the link?

Passing system tests for ApplicationUpgradeTest
Passing system tests for BrokerBounceTest

Also in the PR description

@bbejeck
Copy link
Contributor Author

bbejeck commented Oct 1, 2024

@mjsax all comments addressed

@@ -51,7 +51,7 @@ def run_tests(self):
license="apache2.0",
packages=find_packages(),
include_package_data=True,
install_requires=["ducktape==0.11.4", "requests==2.31.0"],
install_requires=["ducktape==0.8.14", "requests==2.24.0"],
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change is temporary to run the system tests - I'll revert once the test completes

@bbejeck
Copy link
Contributor Author

bbejeck commented Oct 1, 2024

I kicked off a new branch builder for streams_application_upgrade_test and I'll post the link here when it completes.

@bbejeck
Copy link
Contributor Author

bbejeck commented Oct 1, 2024

System test results 29 passed, 3 failed - I'll look into the failures.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
streams tests Test fixes (including flaky tests)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants