[DocDB][YCQL][Indexes] Drop YCQL Index times out when create and drop index are being executed continuously while loading the data #20041
…fill
Summary: Our stress tests commonly delete an index before it has been backfilled. The error message logged in that case is confusing and not very helpful. This change simplifies the returned error code and message.

Previously:
```
E1117 04:24:12.662847 80635 backfill_index.cc:986] Backfill Index Table(s) { test_indexes_034c74byvaluev4_idx4 } failed to backfill the index: [e718ede41a574e94bb855c1e0a321325] due to Invalid argument (yb/tserver/tablet_service.cc:735): Tablet has a different schema 595 vs 591. Requested index is not ready to backfill. IndexMap: 0x000035913cdb1e58 -> [{955035f2fa10421c9d9d379303bd95c7, table_id: "955035f2fa10421c9d9d379303bd95c7" version: 0 is_local: false columns { column_id: 0 indexed_column_id: 2 column_name: "C$_v2" colexpr { column_id: 2 } } . . . . columns { column_id: 1 indexed_column_id: 0 column_name: "C$_k" colexpr { column_id: 0 } } hash_column_count: 1 range_column_count: 1 is_unique: false indexed_table_id: "68214fdb2bd2484095f782959aca7482" indexed_hash_column_ids: 0 use_mangled_column_name: true index_permissions: INDEX_PERM_READ_WRITE_AND_DELETE backfill_error_message: "" num_rows_processed_by_backfill_job: 1461853}] }}
```

After the change:
```
[m-1] W1213 17:37:46.736081 1898082304 backfill_index.cc:1582] TS 0127865cf9154dd8aace4611aaefe502: backfill failed for tablet 6b2b0b9c6bd942feb971ed47ddbf310e (table test_table [id=9eef1eda4e9d4d44bab7b2142a4abff9]) no further retry: Invalid argument (yb/tserver/tablet_service.cc:716): Index 95f1a0ca84eb4bd9b922f9edc41d5525 not found in index_map. Current schema is 19 response was error { code: OPERATION_NOT_SUPPORTED status { code: INVALID_ARGUMENT message: "Index 95f1a0ca84eb4bd9b922f9edc41d5525 not found in index_map.
Current schema is 19" source_file: "../../src/yb/tserver/tablet_service.cc" source_line: 716 errors: "\000" } } failed_index_ids: "95f1a0ca84eb4bd9b922f9edc41d5525"
[m-1] I1213 17:37:46.736519 1898082304 backfill_index.cc:1331] Failed to backfill the tablet 0x00000001248da800 -> 6b2b0b9c6bd942feb971ed47ddbf310e (table test_table [id=9eef1eda4e9d4d44bab7b2142a4abff9]): Invalid argument (yb/tserver/tablet_service.cc:716): Index 95f1a0ca84eb4bd9b922f9edc41d5525 not found in index_map. Current schema is 19
```

Additionally, prior to this revision `failed_indexes` was apparently not populated with the id of the missing index, which caused the whole batch of indexes to be marked as "failed". `failed_indexes` is now populated correctly, so that only the index that is not found in the IndexMap is marked as failed and the remaining indexes backfill successfully.

Jira: DB-9124, DB-9003
Test Plan: ybd --cxx-test cassandra_cpp_driver-test --gtest_filter CppCassandraDriverTest.DeleteIndexWhileBackfilling
Reviewers: rthallam, jason, arybochkin
Reviewed By: jason
Subscribers: ybase, bogdan
Differential Revision: https://phorge.dev.yugabyte.com/D30865
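The `failed_indexes` behavior described in the summary can be illustrated with a minimal sketch. This is not the YugabyteDB implementation: the `BackfillPartition` struct, the `PartitionRequestedIndexes` function, and the use of plain string sets for the index map are all hypothetical names chosen for illustration. It only shows the idea that an index missing from the index map is reported individually while the rest of the batch proceeds.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical result type; not taken from the YugabyteDB source.
struct BackfillPartition {
  std::vector<std::string> to_backfill;       // ids present in the index map
  std::vector<std::string> failed_index_ids;  // ids missing from the index map
};

// Splits the requested index ids by membership in the tablet's index map.
// Only the missing ids are reported as failed; the others stay in the batch.
BackfillPartition PartitionRequestedIndexes(
    const std::vector<std::string>& requested,
    const std::set<std::string>& index_map_ids) {
  BackfillPartition result;
  for (const auto& id : requested) {
    if (index_map_ids.count(id) > 0) {
      result.to_backfill.push_back(id);
    } else {
      // Assumed behavior: a dropped index no longer appears in the map.
      result.failed_index_ids.push_back(id);
    }
  }
  return result;
}
```

With a batch of three requested ids where one has already been dropped, the sketch reports that single id in `failed_index_ids` and leaves the other two in `to_backfill`, rather than failing the whole batch.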
…ring index backfill
Summary: Original commit: 0c7dd15 / D30865
Jira: DB-9124, DB-9003
Test Plan: ybd --cxx-test cassandra_cpp_driver-test --gtest_filter CppCassandraDriverTest.DeleteIndexWhileBackfilling
Reviewers: rthallam, jason, arybochkin
Reviewed By: jason
Subscribers: bogdan, ybase
Tags: #jenkins-ready
Differential Revision: https://phorge.dev.yugabyte.com/D31590
…ring index backfill
Summary: Original commit: 0c7dd15 / D30865
Jira: DB-9124, DB-9003
Test Plan: ybd --cxx-test cassandra_cpp_driver-test --gtest_filter CppCassandraDriverTest.DeleteIndexWhileBackfilling
Reviewers: rthallam, jason, arybochkin
Reviewed By: jason
Subscribers: ybase, bogdan
Tags: #jenkins-ready
Differential Revision: https://phorge.dev.yugabyte.com/D31591
…ring index backfill
Summary: Original commit: 0c7dd15 / D30865
Jira: DB-9124, DB-9003
Additional changes for porting to 2.14: Also pulling in the required changes to support `TEST_block_do_backfill` as required for testing.
Test Plan: ybd --cxx-test cassandra_cpp_driver-test --gtest_filter CppCassandraDriverTest.DeleteIndexWhileBackfilling
Reviewers: rthallam, jason, arybochkin
Reviewed By: jason
Subscribers: ybase, bogdan
Tags: #jenkins-ready
Differential Revision: https://phorge.dev.yugabyte.com/D31593
Jira Link: DB-9003
Description
Tried on version: 2.21.0.0-b227
Note: This issue started occurring from a build between 2.21.0.0-b216 and 2.21.0.0-b227.
While a heavy workload is running against a table, frequently creating and dropping indexes on that table can cause DROP INDEX to time out.
The following error was also seen in the master error log:
Note: The full error has been added in a comment.
E1117 04:24:12.662847 80635 backfill_index.cc:986] Backfill Index Table(s) { test_indexes_034c74byvaluev4_idx4 } failed to backfill the index: [e718ede41a574e94bb855c1e0a321325] due to Invalid argument (yb/tserver/tablet_service.cc:735): Tablet has a different schema 595 vs 591. Requested index is not ready to backfill. IndexMap: 0x000035913cdb1e58 -> [{955035f2fa10421c9d9d379303bd95c7, table_id: "955035f2fa10421c9d9d379303bd95c7" version: 0 is_local: false columns { column_id: 0 indexed_column_id: 2 column_name: "C$_v2" colexpr { column_id: 2 } } . . . . columns { column_id: 1 indexed_column_id: 0 column_name: "C$_k" colexpr { column_id: 0 } } hash_column_count: 1 range_column_count: 1 is_unique: false indexed_table_id: "68214fdb2bd2484095f782959aca7482" indexed_hash_column_ids: 0 use_mangled_column_name: true index_permissions: INDEX_PERM_READ_WRITE_AND_DELETE backfill_error_message: "" num_rows_processed_by_backfill_job: 1461853}]
Test Details:
All logs are available under Attachments: http://stress.dev.yugabyte.com/stress_test/a6b1a8b9-0e5d-4e3f-b49d-bcada682ca87
Issue Type
kind/bug
Warning: Please confirm that this issue does not contain any sensitive information