Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix flaky Iceberg parquet metadata cache test #24415

Merged

Conversation

nmahadevuni
Copy link
Member

Description

Fixes flaky test described in #22422

Motivation and Context

It seems jmx metrics may get mixed up if we are using the query runner and multiple workers. The jmx metrics are bound to coordinator cache object only and owing to the very tiny data read in the test query, if the coordinator doesn't process any split, it will just report zero metrics. So, making the number of nodes to only 1, so this problem is eliminated.

Impact

No impact

Test Plan

Release Notes

Please follow release notes guidelines and fill in the release notes below.

== NO RELEASE NOTE ==

@prestodb-ci prestodb-ci added the from:IBM PR from IBM label Jan 22, 2025
@prestodb-ci prestodb-ci requested review from a team, aaneja and infvg and removed request for a team January 22, 2025 06:50
@nmahadevuni
Copy link
Member Author

@hantangwangd can you please review this?

@tdcmeehan tdcmeehan merged commit 3534361 into prestodb:master Jan 23, 2025
52 checks passed
@hantangwangd
Copy link
Member

@nmahadevuni Sorry for the late response. I took some time to figure out why coordinator and workers in the query runner will impact each other's jmx metrics (it seems that one will override the other). The core reason is that, all nodes in the query runner have the same MBeanServer instance and then registered their own parquet metadata cacheStatsMBean into this singleton MBeanServer.

Set the number of nodes to only 1 can definitely fix this flaky test. But if we want to completely solve the problem, we need to inject separate MBeanServers for different coordinators and workers, which may involve a small refactoring of IcebergPlugin and IcebergConnectorFactory. So, do you think it's necessary to do this refactoring to thoroughly address the mutual influence of jmx metrics between different nodes in a query runner? @tdcmeehan

shangm2 pushed a commit to shangm2/presto that referenced this pull request Jan 23, 2025
@nmahadevuni
Copy link
Member Author

Thanks @hantangwangd. Yes I think that helps us get the real world setup in the query runner too. Will help a lot with metrics integration tests.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
from:IBM PR from IBM
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants