-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
YSQL: insertsWhileCreatingIndex sometimes fails with mismatching row counts #7376
Comments
Attached logs of the most recent (local) failure with some logs added: org.yb.pgsql.TestIndexBackfill#insertsWhileCreatingIndex_2021-09-02T21_10_58.zip
|
Thanks to detective, I found the commit that made the test flaky: ccb580a. Revert that commit with simple merge conflict resolution, and test does not fail with this error message. Before reverting, test fails almost always on centos-clang11-fastdebug. Before reverting, I looked into the logs to see why a row goes missing. I put logs before and after the INSERT, and I saw, for that missing row, the "before" log happen before indisready permission is granted and the "after" log happen after SafeTime is chosen. Then, the insert probably happened at a time after the SafeTime but didn't write to the index since it was initiated before indisready=true, meaning it didn't have permissions to write to the index. This should be investigated deeper. It looks very similar to the situation theorized in #6238 (comment). After revert, the test still fails sometimes, but they all have different error message:
For those failed tests, they seem to have the same issue of an INSERT lasting before indisready=true and after SafeTime is chosen. So the root cause may be unrelated to that commit ccb580a and rather closer to issue #6238 as I mention earlier. |
Here is the revert merge resolution:
|
Ran
This was a run after having reverted commit ccb580a. The test fails with "Resource unavailable: null" on row 457, and I don't know what the connection, but it seems to only fail like this when it would have otherwise failed with the mismatching row counts issue. So I lean towards that commit not being related to the underlying issue. The apparent issue is that the INSERT process thinks indisready=false after the CREATE INDEX process moves on to do getsafetime+backfill (referred to as the backfill phase). The backfill phase should not be initiated until all nodes have updated indisready to true. I believe that this is in the domain of issue #6238. Two possible variants are
For users, some easy ways to better avoid this for now are
|
The first issue is #6238: CREATE INDEX does not reliably wait for index permissions to be communicated to all nodes before moving on to the next permission. It currently makes an effort using a 1s sleep. However, even if that issue is fixed, there is a second issue of handling long-running processes that end up having an outdated view of permissions. During an INSERT involving index, How about catalog version mismatch? First, the breaking catalog version can be more than one away from the catalog version (I previously had thoughts that assumed it won't be). Even if at-most-one-away were enforced, the backfill step doesn't involve a postgres system table change, so we wouldn't be able to use catalog version mismatch to cause error on a node that performs INSERT with catalog version at indislive=t,indisready=f during backfill: the catalog version during backfill would be the same as that when enabling indisready (indislive=t,indisready=t), one catalog version away (assuming no other DDLs happened on the table). How about backfill safe time transaction fence? If a postgres process for INSERT is very slow and has only checked the index is not ready for inserts but hasn't started a transaction for write, it wouldn't be killed by the safe time acquisition. Then, when the write finally happens, it may get a write time after the safe time and occur without index write permissions. cc: @mbautin |
Digging deeper, I found that clang11 + fastdebug + centos7 tends to take 0.8-1 second to load the pg_attribute cache, but gcc + fastdebug + centos only takes like 0.2 seconds. This path is taken: A 1 second delay makes it easy for the INSERT postgres thread to fall behind. I used the recently added diff --git i/src/postgres/src/backend/executor/execIndexing.c w/src/postgres/src/backend/executor/execIndexing.c
index 3be77506c1..12154312f6 100644
--- i/src/postgres/src/backend/executor/execIndexing.c
+++ w/src/postgres/src/backend/executor/execIndexing.c
@@ -197,6 +197,7 @@ ExecOpenIndices(ResultRelInfo *resultRelInfo, bool speculative)
IndexInfo *ii;
indexDesc = index_open(indexOid, RowExclusiveLock);
+ elog(LOG, "opened index %u", indexOid);
/* extract index key information from the index's pg_index info */
ii = BuildIndexInfo(indexDesc);
diff --git i/src/yb/yql/pggate/pg_txn_manager.cc w/src/yb/yql/pggate/pg_txn_manager.cc
index 268ef543e3..68c948789d 100644
--- i/src/yb/yql/pggate/pg_txn_manager.cc
+++ w/src/yb/yql/pggate/pg_txn_manager.cc
@@ -332,6 +332,7 @@ Status PgTxnManager::BeginWriteTransactionIfNecessary(bool read_only_op,
} else {
txn_ = std::make_shared<YBTransaction>(GetOrCreateTransactionManager());
}
+ LOG(WARNING) << "txn taken";
txn_->SetPriority(GetPriority(NeedsPessimisticLocking(needs_pessimistic_locking)));
When running
29836 is INSERT; 29835 is CREATE INDEX. Notice that the INSERT thread keeps thinking catalog version is the outdated 3 (until the end of the statement). It opens the index with this outdated catalog, so it thinks it doesn't have indisready=true, so it doesn't send a write to the index. Finally, the transaction appears to be taken after the safe time, so the backfill transaction fence doesn't catch this. There's also the heartbeat problem. If heartbeat interval is raised, the issue becomes much easier to hit: diff --git i/java/yb-pgsql/src/test/java/org/yb/pgsql/TestIndexBackfill.java w/java/yb-pgsql/src/test/java/org/yb/pgsql/TestIndexBackfill.java
index f29ae6986f..a3acb61721 100644
--- i/java/yb-pgsql/src/test/java/org/yb/pgsql/TestIndexBackfill.java
+++ w/java/yb-pgsql/src/test/java/org/yb/pgsql/TestIndexBackfill.java
@@ -56,6 +56,7 @@ public class TestIndexBackfill extends BasePgSQLTest {
protected Map<String, String> getTServerFlags() {
Map<String, String> flagMap = super.getTServerFlags();
flagMap.put("ysql_disable_index_backfill", "false");
+ flagMap.put("heartbeat_interval_ms", "3000");
return flagMap;
}
|
One way to improve the correctness issue is to mark the index permission updates breaking catalog changes. That way, we eliminate the problem of postgres using an outdated catalog (with respect to tserver), but tserver may still lag behind master due to heartbeats. In testing, heartbeats should be reliable since master and tserver are on the same node. So, assuming the heartbeat lag issue is nonexistent, there should be no correctness issue. However, the test will still fail due to catalog version mismatch that fails also on retry. On clang11 + fastdebug + centos7, I got 3/10 fail all due to a retried catalog version failing. They all were due to the updates indislive=true and indisready=true. The first causes cache refresh, but like I mention previously, this refresh takes about a second, so during the refresh, the indisready=true catalog change is pushed, so the retry hits another catalog version mismatch. Since there's only one retry, the INSERT fails, and the test fails. Ideally, we want to avoid the catalog version mismatches (in general), otherwise, this isn't really an online schema change. That's why breaking version was used in the first place. Possibly, we can keep breaking version and focus on making sure the permission changes don't occur until we are certain any other clients/tservers have the new version. |
So an idea to make sure permission changes each wait until they are fully propagated goes as follows: postgres asks master to ask all tservers to ask all local postgres backends whether they have the new catalog (e.g. catalog version >= x). It gets tricky when tservers are unresponsive and/or there's network partitioning. We can err on the side of caution and fail fast when some tserver is unresponsive. Depending on the setup, it is possible to still break correctness. I detail the repro below. On the current code (commit f49d18a), I successfully got the following to happen:
(I will detail it later.) Theoretically, the following can happen if
The danger with this second example is that it could happen even if we implement the permission changes waiting: indislive will have both tservers alive, indisready onwards will have only ts1 alive and ts2 marked by master as dead, so master won't even attempt to check on ts2. However, it is difficult to hit since this assumes the time between index state flag updates is larger than the time for a tserver to get marked dead by master. The example bumped up One idea to fix this is for postgres to act up if its parent tserver is not successfully communicating heartbeats with master. In the examples, it was cut off from master for relatively a long time, but it was still able to do writes to other tservers. But I have a feeling that this doesn't squarely solve the problem. Another idea is to have tservers use Here are the details for the first example I got working: bin/yb-ctl start \
--master_flags enable_load_balancing=false,placement_cloud=aws,placement_region=us-east-1,placement_zone=us-east-1a,transaction_table_num_tablets=1 \
--tserver_flags placement_cloud=aws,placement_region=us-east-1,placement_zone=us-east-1a,TEST_slowdown_backfill_by_ms=15000,ysql_pg_conf_csv=yb_index_state_flags_update_delay=45000
bin/yb-ctl add_node \
--tserver_flags placement_cloud=aws,placement_region=us-east-1,placement_zone=us-east-1b,TEST_slowdown_backfill_by_ms=15000,ysql_pg_conf_csv=yb_index_state_flags_update_delay=45000 p1: CREATE TABLESPACE us_east_1a_zone_tablespace
WITH (replica_placement='{"num_replicas": 1, "placement_blocks": [
{"cloud":"aws","region":"us-east-1","zone":"us-east-1a","min_num_replicas":1}]}');
CREATE TABLE abc (i int, j int, PRIMARY KEY (i asc)) TABLESPACE us_east_1a_zone_tablespace;
CREATE INDEX ON abc (j asc) TABLESPACE us_east_1a_zone_tablespace; p2: INSERT INTO abc VALUES (1, 2);
INSERT INTO abc VALUES (3, 4);
INSERT INTO abc VALUES (5, 6);
INSERT INTO abc VALUES (7, 8);
INSERT INTO abc VALUES (9, 10); cutting connection: # cut connection
sudo iptables -I INPUT -p tcp --dport 7100 -s 127.0.0.2 -d 127.0.0.1 -i lo -j DROP
# restore connection
sudo iptables -D INPUT -p tcp --dport 7100 -s 127.0.0.2 -d 127.0.0.1 -i lo -j DROP proving the inconsistency:
|
Summary: Java test TestIndexBackfill#insertsWhileCreatingIndex fails for two reasons: - some transient failure messages are not caught/allowed - index state changes may not properly wait Fix the first cause: add more messages that are allowed. This should reduce the test flakiness but not fully fix it. Leave issue #7376 open. Test Plan: Jenkins: test regex: .*TestIndexBackfill#insertsWhileCreatingIndex Ensure the following only fails with "Mismatch at row" errors (which are for the index state change issue): ./yb_build.sh fastdebug --gcc11 \ --java-test 'org.yb.pgsql.TestIndexBackfill#insertsWhileCreatingIndex' \ -n 1000 --tp 1 Reviewers: alex Reviewed By: alex Subscribers: yql Differential Revision: https://phabricator.dev.yugabyte.com/D20329
Summary: Java test TestIndexBackfill#insertsWhileCreatingIndex fails for two reasons: - some transient failure messages are not caught/allowed - index state changes may not properly wait Fix the first cause: add more messages that are allowed. This should reduce the test flakiness but not fully fix it. Leave issue #7376 open. Original commit: 90fa310 / D20329 Test Plan: Jenkins: test regex: .*TestIndexBackfill#insertsWhileCreatingIndex Ensure the following only fails with "Mismatch at row" errors (which are for the index state change issue): ./yb_build.sh fastdebug --gcc11 \ --java-test 'org.yb.pgsql.TestIndexBackfill#insertsWhileCreatingIndex' \ -n 1000 --tp 1 Reviewers: alex Reviewed By: alex Differential Revision: https://phabricator.dev.yugabyte.com/D20580
Summary: Java test TestIndexBackfill#insertsWhileCreatingIndex fails for two reasons: - some transient failure messages are not caught/allowed - index state changes may not properly wait Fix the first cause: add more messages that are allowed. This should reduce the test flakiness but not fully fix it. Leave issue #7376 open. Original commit: 90fa310 / D20329 Test Plan: Jenkins: test regex: .*TestIndexBackfill#insertsWhileCreatingIndex Ensure the following only fails with "Mismatch at row" errors (which are for the index state change issue): ./yb_build.sh fastdebug --gcc11 \ --java-test 'org.yb.pgsql.TestIndexBackfill#insertsWhileCreatingIndex' \ -n 1000 --tp 1 Reviewers: alex Reviewed By: alex Differential Revision: https://phabricator.dev.yugabyte.com/D20637
Summary: Java test TestIndexBackfill#insertsWhileCreatingIndex fails for two reasons: - some transient failure messages are not caught/allowed - index state changes may not properly wait Fix the first cause: add more messages that are allowed. This should reduce the test flakiness but not fully fix it. Leave issue #7376 open. Original commit: 90fa310 / D20329 Test Plan: Jenkins: test regex: .*TestIndexBackfill#insertsWhileCreatingIndex Ensure the following only fails with "Mismatch at row" errors (which are for the index state change issue): ./yb_build.sh fastdebug --gcc11 \ --java-test 'org.yb.pgsql.TestIndexBackfill#insertsWhileCreatingIndex' \ -n 1000 --tp 1 Reviewers: alex Reviewed By: alex Differential Revision: https://phabricator.dev.yugabyte.com/D20636
Summary: Java test TestIndexBackfill#insertsWhileCreatingIndex fails for two reasons: - some transient failure messages are not caught/allowed - index state changes may not properly wait Fix the first cause: add more messages that are allowed. This should reduce the test flakiness but not fully fix it. Leave issue #7376 open. Original commit: 90fa310 / D20329 Test Plan: Jenkins: test regex: .*TestIndexBackfill#insertsWhileCreatingIndex Ensure the following only fails with "Mismatch at row" errors (which are for the index state change issue): ./yb_build.sh fastdebug --gcc11 \ --java-test 'org.yb.pgsql.TestIndexBackfill#insertsWhileCreatingIndex' \ -n 1000 --tp 1 Reviewers: alex Reviewed By: alex Differential Revision: https://phabricator.dev.yugabyte.com/D20635
Summary: Online schema changes need a way to ensure all clients (e.g. postgres backends) are up to date before moving on to the next schema. One way to accomplish that is by having the schema changer ask all the backends whether they have the desired schema, or more specifically, the desired catalog version. Add a way to get this information out through a new pgstat function yb_pg_stat_get_backend_catalog_version that returns the given backend's catalog version or null if it isn't holding onto a fixed catalog snapshot. Also add this as a column to pg_stat_activity. Note that this information is readable to all users. A followup can use that information to solve issue #7376. Test Plan: ./yb_build.sh --java-test TestPgRegressPgStat Reviewers: rskannan, myang Reviewed By: myang Subscribers: alex, rskannan, smishra, yql Differential Revision: https://phabricator.dev.yugabyte.com/D15281
Summary: As of commit a1729c3 titled [#7376] ysql: fix state waits for online CREATE INDEX CREATE INDEX may wait indefinitely on backends to catch up. It is easy to not know what is taking the CREATE INDEX so long. In a situation where the CREATE INDEX DDL contends with DMLs, some options are A. have CREATE INDEX wait indefinitely until the old DMLs finish B. have CREATE INDEX die after timeout C. have CREATE INDEX wound the old DMLs Change the default behavior from A to B, where timeout is five minutes (hopefully, that is enough time for transactions to finish). Let the timeout be configurable using per-session GUC variable yb_wait_for_backends_catalog_version_timeout and corresponding PG flag ysql_yb_wait_for_backends_catalog_version_timeout. To get the old A behavior, set the value to zero or negative, which is interpreted as no timeout. The benefit of timing out the CREATE INDEX is that it gives a chance to return an error message with details about what is going on. Close: #17134 Jira Issue(s): DB-6416 Test Plan: ./yb_build.sh fastdebug --gcc11 \ --cxx-test pgwrapper_pg_backends-test \ --gtest_filter PgBackendsTest.PgTimeout Reviewers: hbhanawat, sshaikh, myang Reviewed By: myang Subscribers: yql Differential Revision: https://phabricator.dev.yugabyte.com/D24932
Summary: Commit a1729c3 titled [#7376] ysql: fix state waits for online CREATE INDEX fixes CREATE INDEX state waits by adding WaitForBackendsCatalogVersion RPCs that fan out to tservers. It was not made rolling upgrade safe because DDLs are normally not supported during rolling upgrade. However, it is nice to still make a best effort to support the case, so handle it by ignoring the version check when the following two cases happen: - tservers are on old version: then they don't recognize the RPC. - YSQL upgrade did not run: if the src->dst src version is too old, it may not have the column "catalog_version" in pg_stat_activity. One minor difference is that, previously, it had a hard-coded 1s sleep. With the ignore logic, it may delay for less than that, but this likely won't be the case in a real-world setting where RPC latencies are decently high. Also add a missing space in an existing log message. Jira: DB-7155 Test Plan: 1. Do release build: ./yb_build.sh release. 1. Get an old release: curl "https://downloads.yugabyte.com/releases/2.16.4.0/yugabyte-2.16.4.0-b32-linux-x86_64.tar.gz" \ | tar xzv -C /tmp 1. Run the following script passing in that dir: ./script.sh /tmp/yugabyte-2.16.4.0 where #!/usr/bin/env bash # script.sh set -euxo pipefail if [ $# -ne 1 ]; then echo "Missing argument path to old release" >&2 exit 1 fi OLD_RELEASE_PATH=$1 "$OLD_RELEASE_PATH"/bin/yb-ctl destroy "$OLD_RELEASE_PATH"/bin/yb-ctl create --rf 3 "$OLD_RELEASE_PATH"/bin/ysqlsh -c 'CREATE TABLE t (i int)' YB_ADMIN_CMD_PREFIX=( build/latest/bin/yb-admin "--master_addresses=127.0.0.1,127.0.0.2,127.0.0.3" ) function test_case() { master_idx=$1 num_tservers_not_migrated=$2 num_tservers_old_version=$3 # Use fresh master for each test case in order to get a fresh set of logs. master_uuid=$("${YB_ADMIN_CMD_PREFIX[@]}" list_all_masters \ | sed "$((master_idx + 1))q;d" \ | awk '{print$1}') "${YB_ADMIN_CMD_PREFIX[@]}" master_leader_stepdown "$master_uuid" "${YB_ADMIN_CMD_PREFIX[@]}" list_all_masters \ | sed "$((master_idx + 1))q;d" \ | grep -q LEADER sleep 5 bin/ysqlsh -c 'CREATE INDEX ON t (i)' # 2 backends catalog version requests per CREATE INDEX, hence x2. log_file=~/yugabyte-data/node-"$master_idx"/disk-1/yb-data/master/logs/yb-master.INFO test "$(grep -c "but ignoring for backwards compatibility" "$log_file")" \ -eq $((num_tservers_not_migrated * 2)) test "$(grep -c "is on an older version" "$log_file")" \ -eq $((num_tservers_old_version * 2)) } bin/yb-ctl restart_node 1 --master bin/yb-ctl restart_node 2 --master bin/yb-ctl restart_node 3 --master # Test case: not all tservers upgraded. # - 1 tserver: up-to-date version, but YSQL upgrade did not happen. # - 2 tservers: old version. bin/yb-ctl restart_node 1 test_case 1 1 2 # Test case: all tservers upgraded, but ysql upgrade not run. bin/yb-ctl restart_node 2 bin/yb-ctl restart_node 3 test_case 2 3 0 # Test case: everything upgraded. "${YB_ADMIN_CMD_PREFIX[@]}" upgrade_ysql test_case 3 0 0 Close: #18124 Reviewers: skedia Reviewed By: skedia Subscribers: ybase, bogdan Differential Revision: https://phorge.dev.yugabyte.com/D26727
Summary: Online index backfill involves a step where it gets a safe time used for backfill and aborts any transactions that started before that time. The code comments for why that is done seem to be targeted for YCQL when it is not guaranteed that these transactions are aware of the write permission for the index. In YSQL, especially after commit a1729c3 titled [#7376] ysql: fix state waits for online CREATE INDEX it is guaranteed that any transactions that started before the safe time do not have an outdated view of the index permissions. Avoid aborting those transactions to make online create index more truly online. In the process of making the test for this, add - PGConn::Reset, a wrapper around PQreset - PgIndexBackfillTest::WaitForIndexStateFlags - PgIndexBackfillTest::WaitForIndexScan Jira: DB-7826 Test Plan: ./yb_build.sh fastdebug --gcc11 \ --cxx-test pgwrapper_pg_index_backfill-test Close: #18999 Reviewers: dsrinivasan, amitanand, hsunder Reviewed By: amitanand Subscribers: ybase, hsunder, yql, bogdan Differential Revision: https://phorge.dev.yugabyte.com/D28090
Jira Link: DB-1795
ybd asan --java-test 'org.yb.pgsql.TestIndexBackfill#insertsWhileCreatingIndex' :
The text was updated successfully, but these errors were encountered: