[BACKPORT 2.18][#20178, #20041] Docdb: Handle deleted index better during index backfill

Summary:
Original commit: 0c7dd15 / D30865
Our stress tests commonly delete an index before it has finished
backfilling. The error message logged in that case is confusing and
not very helpful.

This change simplifies the returned error code and message.

Previously:

```
E1117 04:24:12.662847 80635 backfill_index.cc:986] Backfill Index Table(s) { test_indexes_034c74byvaluev4_idx4 } failed to backfill the index: [e718ede41a574e94bb855c1e0a321325] due to Invalid argument (yb/tserver/tablet_service.cc:735): Tablet has a different schema 595 vs 591. Requested index is not ready to backfill. IndexMap: 0x000035913cdb1e58 -> [{955035f2fa10421c9d9d379303bd95c7, table_id: "955035f2fa10421c9d9d379303bd95c7" version: 0 is_local: false columns { column_id: 0 indexed_column_id: 2 column_name: "C$_v2" colexpr { column_id: 2 } }
.
.
.
.
columns { column_id: 1 indexed_column_id: 0 column_name: "C$_k" colexpr { column_id: 0 } } hash_column_count: 1 range_column_count: 1 is_unique: false indexed_table_id: "68214fdb2bd2484095f782959aca7482" indexed_hash_column_ids: 0 use_mangled_column_name: true index_permissions: INDEX_PERM_READ_WRITE_AND_DELETE backfill_error_message: "" num_rows_processed_by_backfill_job: 1461853}]
}}
```

After the change:

```
[m-1] W1213 17:37:46.736081 1898082304 backfill_index.cc:1582] TS 0127865cf9154dd8aace4611aaefe502: backfill failed for tablet 6b2b0b9c6bd942feb971ed47ddbf310e (table test_table [id=9eef1eda4e9d4d44bab7b2142a4abff9]) no further retry: Invalid argument (yb/tserver/tablet_service.cc:716): Index 95f1a0ca84eb4bd9b922f9edc41d5525 not found in index_map. Current schema is 19 response was error { code: OPERATION_NOT_SUPPORTED status { code: INVALID_ARGUMENT message: "Index 95f1a0ca84eb4bd9b922f9edc41d5525 not found in index_map. Current schema is 19" source_file: "../../src/yb/tserver/tablet_service.cc" source_line: 716 errors: "\000" } } failed_index_ids: "95f1a0ca84eb4bd9b922f9edc41d5525"
[m-1] I1213 17:37:46.736519 1898082304 backfill_index.cc:1331] Failed to backfill the tablet 0x00000001248da800 -> 6b2b0b9c6bd942feb971ed47ddbf310e (table test_table [id=9eef1eda4e9d4d44bab7b2142a4abff9]): Invalid argument (yb/tserver/tablet_service.cc:716): Index 95f1a0ca84eb4bd9b922f9edc41d5525 not found in index_map. Current schema is 19
```

Additionally, prior to this revision we did not appear to populate `failed_indexes` with the id of the missing index, which caused the whole batch of indexes to be marked as "failed". We now ensure that `failed_indexes` is populated correctly, so that only the index that is not found in the IndexMap is marked as failed and the remaining indexes backfill successfully.
Jira: DB-9124, DB-9003
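
As a rough illustration of the per-index handling described above, here is a minimal, self-contained C++ sketch. It is not the actual tablet server code: `BackfillCheck`, `CheckRequestedIndexes`, and the use of a `std::set` as the index map are hypothetical stand-ins, and only the error text mirrors the message shown in the log above.

```cpp
#include <cstdint>
#include <set>
#include <string>
#include <vector>

// Hypothetical result type; the real response is a protobuf message.
struct BackfillCheck {
  bool ok = true;
  std::string error_message;
  std::vector<std::string> failed_index_ids;
};

// Checks that every requested index id is present in the (hypothetical)
// index map. On the first missing id, records that id as failed and
// returns an error that names it, so a caller can fail just that index
// instead of the whole batch.
BackfillCheck CheckRequestedIndexes(const std::set<std::string>& index_map,
                                    const std::vector<std::string>& requested,
                                    uint32_t our_schema_version) {
  BackfillCheck result;
  for (const auto& index_id : requested) {
    if (index_map.count(index_id) == 0) {
      result.ok = false;
      result.failed_index_ids.push_back(index_id);
      result.error_message = "Index " + index_id +
                             " not found in index_map. Current schema is " +
                             std::to_string(our_schema_version);
      return result;
    }
  }
  return result;
}
```

In this sketch, a caller that sees a non-empty `failed_index_ids` would mark only those ids as failed; how the remaining indexes are retried is outside what the sketch models.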

Test Plan: ybd --cxx-test cassandra_cpp_driver-test --gtest_filter CppCassandraDriverTest.DeleteIndexWhileBackfilling

Reviewers: rthallam, jason, arybochkin

Reviewed By: jason

Subscribers: ybase, bogdan

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D31591
amitanandaiyer committed Jan 16, 2024
1 parent 5ffafda commit ac2af53
Showing 2 changed files with 14 additions and 5 deletions.
2 changes: 2 additions & 0 deletions src/yb/integration-tests/cassandra_cpp_driver-test.cc
@@ -2007,6 +2007,7 @@ TEST_F_EX(
ASSERT_OK(table.CreateTable(&session_, "test.test_table", {"k", "v"}, {"(k)"}, true));

LOG(INFO) << "Creating two indexes that will backfill together";
+ ASSERT_OK(cluster_->SetFlagOnMasters("TEST_block_do_backfill", "true"));
// Create 2 indexes that backfill together. One of them will be deleted while the backfill
// is happening. The deleted index should be successfully deleted, and the other index will
// be successfully backfilled.
@@ -2029,6 +2030,7 @@ TEST_F_EX(
ASSERT_OK(session_.ExecuteQuery("drop index test_table_index_by_v1"));

// Wait for the backfill to actually run to completion/failure.
+ ASSERT_OK(cluster_->SetFlagOnMasters("TEST_block_do_backfill", "false"));
SleepFor(MonoDelta::FromSeconds(10));
res = client_->WaitUntilIndexPermissionsAtLeast(
table_name, index_table_name1, IndexPermissions::INDEX_PERM_NOT_USED, 50ms /* max_wait */);
17 changes: 12 additions & 5 deletions src/yb/tserver/tablet_service.cc
@@ -638,6 +638,8 @@ void TabletServiceAdminImpl::BackfillIndex(
return;
}

+ const uint32_t our_schema_version = tablet.peer->tablet_metadata()->schema_version();
+ const uint32_t their_schema_version = req->schema_version();
bool all_at_backfill = true;
bool all_past_backfill = true;
bool is_pg_table = tablet.tablet->table_type() == TableType::PGSQL_TABLE_TYPE;
@@ -667,9 +669,16 @@ void TabletServiceAdminImpl::BackfillIndex(
all_past_backfill &=
idx_info_pb.index_permissions() > IndexPermissions::INDEX_PERM_DO_BACKFILL;
} else {
- LOG(WARNING) << "index " << idx.table_id() << " not found in tablet metadata";
- all_at_backfill = false;
- all_past_backfill = false;
+ const auto& index_table_id = idx.table_id();
+ LOG(INFO) << "index " << index_table_id << " not found in tablet metadata";
+ *resp->add_failed_index_ids() = index_table_id;
+ SetupErrorAndRespond(
+ resp->mutable_error(),
+ STATUS_SUBSTITUTE(
+ InvalidArgument, "Index $0 not found in index_map. Current schema is $1",
+ index_table_id, our_schema_version),
+ TabletServerErrorPB::OPERATION_NOT_SUPPORTED, &context);
+ return;
}
}

@@ -686,8 +695,6 @@ void TabletServiceAdminImpl::BackfillIndex(
return;
}

- uint32_t our_schema_version = tablet.peer->tablet_metadata()->schema_version();
- uint32_t their_schema_version = req->schema_version();
DCHECK_NE(our_schema_version, their_schema_version);
SetupErrorAndRespond(
resp->mutable_error(),