[DocDB] Master crashes for failed indexes, when backfill responses arrive in interleaved order. #20510
Closed
1 task done
Labels
2.14 Backport Required
2.18 Backport Required
2.20 Backport Required
area/docdb
YugabyteDB core features
kind/bug
This issue is a bug
priority/high
High Priority
Comments
amitanandaiyer added the area/docdb (YugabyteDB core features) and status/awaiting-triage (Issue awaiting triage) labels on Jan 9, 2024
yugabyte-ci added the kind/bug (This issue is a bug) and priority/medium (Medium priority issue) labels and removed the status/awaiting-triage (Issue awaiting triage) label on Jan 9, 2024
amitanandaiyer added a commit that referenced this issue on Jan 10, 2024
…different backfill operations

Summary:
If the master is slow, or if network delays cause a backfill response to be processed while a different backfill operation is running on the same table, we may run into
```
[m-1] [libprotobuf FATAL /opt/yb-build/thirdparty/yugabyte-db-thirdparty-v20231120041920-e307bff3a7-macos-arm64/installed/uninstrumented/include/google/protobuf/map.h:1064] CHECK failed: it != end(): key not found: ebd5a1637b4746f9bf4d6ab7218ffeee
```
It may affect both YSQL and YCQL backfills. However, it is less likely to be seen on YSQL because the create index call in YSQL is synchronous.

This change checks whether the response being handled and the current backfill job are for the same indexes; if not, it bails out gracefully.

Jira: DB-9516
Test Plan: ./yb_build.sh --cxx-test integration-tests_cassandra_cpp_driver-test --gtest_filter CppCassandraDriverTest.ConcurrentBackfillIndexFailures
Reviewers: jason, hsunder
Reviewed By: jason
Subscribers: ybase, bogdan
Differential Revision: https://phorge.dev.yugabyte.com/D31571
rthallamko3 changed the title from "[DocDB] Handle backfill responses getting interleaved" to "[DocDB] Master crashes for failed indexes, when backfill responses arrive in interleaved order." on Jan 10, 2024
amitanandaiyer added a commit that referenced this issue on Jan 23, 2024
…rleaved across different backfill operations

Summary:
Original commit: 0fe1bba / D31571
If the master is slow, or if network delays cause a backfill response to be processed while a different backfill operation is running on the same table, we may run into
```
[m-1] [libprotobuf FATAL /opt/yb-build/thirdparty/yugabyte-db-thirdparty-v20231120041920-e307bff3a7-macos-arm64/installed/uninstrumented/include/google/protobuf/map.h:1064] CHECK failed: it != end(): key not found: ebd5a1637b4746f9bf4d6ab7218ffeee
```
It may affect both YSQL and YCQL backfills. However, it is less likely to be seen on YSQL because the create index call in YSQL is synchronous.

This change checks whether the response being handled and the current backfill job are for the same indexes; if not, it bails out gracefully.

Jira: DB-9516
Test Plan: ./yb_build.sh --cxx-test integration-tests_cassandra_cpp_driver-test --gtest_filter CppCassandraDriverTest.ConcurrentBackfillIndexFailures
Reviewers: jason, hsunder
Reviewed By: jason
Subscribers: bogdan, ybase
Tags: #jenkins-ready
Differential Revision: https://phorge.dev.yugabyte.com/D31598
amitanandaiyer added a commit that referenced this issue on Jan 24, 2024
…rleaved across different backfill operations

Summary:
Original commit: 0fe1bba / D31571
If the master is slow, or if network delays cause a backfill response to be processed while a different backfill operation is running on the same table, we may run into
```
[m-1] [libprotobuf FATAL /opt/yb-build/thirdparty/yugabyte-db-thirdparty-v20231120041920-e307bff3a7-macos-arm64/installed/uninstrumented/include/google/protobuf/map.h:1064] CHECK failed: it != end(): key not found: ebd5a1637b4746f9bf4d6ab7218ffeee
```
It may affect both YSQL and YCQL backfills. However, it is less likely to be seen on YSQL because the create index call in YSQL is synchronous.

This change checks whether the response being handled and the current backfill job are for the same indexes; if not, it bails out gracefully.

Jira: DB-9516
Test Plan: ./yb_build.sh --cxx-test integration-tests_cassandra_cpp_driver-test --gtest_filter CppCassandraDriverTest.ConcurrentBackfillIndexFailures
Reviewers: jason, hsunder
Reviewed By: jason
Subscribers: ybase, bogdan
Tags: #jenkins-ready
Differential Revision: https://phorge.dev.yugabyte.com/D31599
amitanandaiyer added a commit that referenced this issue on Jan 24, 2024
…rleaved across different backfill operations

Summary:
Original commit: 0fe1bba / D31571
If the master is slow, or if network delays cause a backfill response to be processed while a different backfill operation is running on the same table, we may run into
```
[m-1] [libprotobuf FATAL /opt/yb-build/thirdparty/yugabyte-db-thirdparty-v20231120041920-e307bff3a7-macos-arm64/installed/uninstrumented/include/google/protobuf/map.h:1064] CHECK failed: it != end(): key not found: ebd5a1637b4746f9bf4d6ab7218ffeee
```
It may affect both YSQL and YCQL backfills. However, it is less likely to be seen on YSQL because the create index call in YSQL is synchronous.

This change checks whether the response being handled and the current backfill job are for the same indexes; if not, it bails out gracefully.

Jira: DB-9516
Test Plan: ./yb_build.sh --cxx-test integration-tests_cassandra_cpp_driver-test --gtest_filter CppCassandraDriverTest.ConcurrentBackfillIndexFailures
Reviewers: jason, hsunder
Reviewed By: jason
Subscribers: ybase, bogdan
Tags: #jenkins-ready
Differential Revision: https://phorge.dev.yugabyte.com/D31601
yugabyte-ci added the priority/high (High Priority) label and removed the priority/medium (Medium priority issue) label on Apr 9, 2024
Jira Link: DB-9516
Description
Master crashes for failed indexes, when backfill responses arrive in interleaved order
If the master is slow, or if network delays cause a backfill response to be processed while a different backfill operation is running on the same table, the master may crash with a libprotobuf FATAL error (`CHECK failed: it != end(): key not found`).
Such scenarios should be handled gracefully.
Here is the analysis:
Backfill of Index-1 = 2660a7f255f34d60a76e76dac6d2b978/[xxxxxx] fails for some reason (the reason is not important). However, it looks like the next backfill job (Index-2 = xxxxx) starts almost immediately.
Meanwhile, some async tasks sent to the tablets for the first index backfill are yet to be processed.
When their responses come back and try to mark Index-1 = 2660a7f255f34d60a76e76dac6d2b978/xxxxx as failed, the desired index is no longer found in the map.
backfill_jobs() was updated to track Index-2 = 412983eff5584d7c807308cc4e15b309/xxxxx when Index-2 started backfilling. Thus Index-1 = 2660a7f255f34d60a76e76dac6d2b978/xxxxx is not found in the map, which seems to cause the crash.
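The interleaving described above can be illustrated with a minimal, standalone sketch. This is not the YugabyteDB master code: `TableBackfillTracker`, `StartJob`, `MarkFailedUnchecked`, and `MarkFailedIfCurrent` are hypothetical names, and a `std::unordered_map` stands in for the protobuf map behind backfill_jobs(). It only shows why an unchecked lookup breaks on a stale response and how a lookup-then-bail-out guard avoids it.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for the per-table backfill state (assumption, not
// the actual YugabyteDB types).
enum class BackfillState { kInProgress, kFailed };

class TableBackfillTracker {
 public:
  // Starting a new job replaces the set of tracked indexes, the way the
  // analysis describes backfill_jobs() moving from Index-1 to Index-2.
  void StartJob(const std::vector<std::string>& index_ids) {
    jobs_.clear();
    for (const auto& id : index_ids) {
      jobs_[id] = BackfillState::kInProgress;
    }
  }

  // Unchecked variant: at() throws std::out_of_range when the key is
  // missing, loosely analogous to the "key not found" CHECK failure.
  void MarkFailedUnchecked(const std::string& index_id) {
    jobs_.at(index_id) = BackfillState::kFailed;
  }

  // Guarded variant: a response for an index that is not part of the
  // current job is treated as stale and ignored. Returns true only if the
  // index was tracked and has been marked as failed.
  bool MarkFailedIfCurrent(const std::string& index_id) {
    auto it = jobs_.find(index_id);
    if (it == jobs_.end()) {
      return false;  // Stale response from an earlier backfill job.
    }
    it->second = BackfillState::kFailed;
    return true;
  }

  bool IsTracked(const std::string& index_id) const {
    return jobs_.find(index_id) != jobs_.end();
  }

  bool IsFailed(const std::string& index_id) const {
    auto it = jobs_.find(index_id);
    return it != jobs_.end() && it->second == BackfillState::kFailed;
  }

 private:
  std::unordered_map<std::string, BackfillState> jobs_;
};
```

In this sketch, calling `StartJob({"index-1"})`, then `StartJob({"index-2"})`, then `MarkFailedIfCurrent("index-1")` returns false and leaves the state of index-2 untouched, whereas `MarkFailedUnchecked("index-1")` would throw at the same point.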
Issue Type
kind/bug