-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[DocDB] CppCassandraDriverTest.TestTableBackfillWithLeaderMoves flaky in master #10753
Comments
Seems there's a race on setting the cluster config vs the LB accessing it. cc @sanketkedia
|
Summary: Catalog Manager returns both data blacklist and leader blacklist by reference to the load balancer. If we try to modify the cluster config at the same time then it creates a race and might even lead to the old blacklist getting destroyed before the LB consumes it. Fixed by making a deep copy of the blacklist under the CoW read lock. This diff also gets rid of the mutex lock guarding the shared_ptr to the cluster config. We only modify this pointer on a catalog load so we don't need any guard since both internal and external accesses to the cluster config happen under the scoped leader shared lock and during catalog loading we acquire an exclusive leader lock. Test Plan: /yb_build.sh tsan --clang12 --cxx_test cassandra_cpp_driver-test --gtest-filter CppCassandraDriverTest.TestTableBackfillWithLeaderMoves Reviewers: hsunder, slingam, nicolas, bogdan Reviewed By: bogdan Subscribers: ybase Differential Revision: https://phabricator.dev.yugabyte.com/D19425
@sanketkedia , Do you plan to fix the failures in the test as part of this GH issue itself or should this be closed a new one filed? It seems like there are failures in the test even after your change, but the failure seems to be different from the tsan race.
|
…in Cluster Config Summary: Original commit: 36a6501 / D19425 Catalog Manager returns both data blacklist and leader blacklist by reference to the load balancer. If we try to modify the cluster config at the same time then it creates a race and might even lead to the old blacklist getting destroyed before the LB consumes it. Fixed by making a deep copy of the blacklist under the CoW read lock. This diff also gets rid of the mutex lock guarding the shared_ptr to the cluster config. We only modify this pointer on a catalog load so we don't need any guard since both internal and external accesses to the cluster config happen under the scoped leader shared lock and during catalog loading we acquire an exclusive leader lock. Test Plan: /yb_build.sh tsan --clang12 --cxx_test cassandra_cpp_driver-test --gtest-filter CppCassandraDriverTest.TestTableBackfillWithLeaderMoves Reviewers: hsunder, slingam, bogdan, nicolas Reviewed By: nicolas Subscribers: ybase Differential Revision: https://phabricator.dev.yugabyte.com/D21232
Closing this as the TSAN race is fixed. Can file a new one, if still is still flaky, for any index specific issues. |
Jira Link: DB-997
Description
Seen it recently failing in >= 3 build types
https://detective-gcp.dev.yugabyte.com/stability/test?branch=master&class=CppCassandraDriverTest&name=TestTableBackfillWithLeaderMoves
The text was updated successfully, but these errors were encountered: