
release-23.1: merge v23.1.9 tag into release branch #110246

Merged · 111 commits · Sep 8, 2023

Conversation

@dt (Member) commented Sep 8, 2023

v23.1.9 was tagged from a sub-branch off of release-23.1; this merges it back into the release branch so it appears in the lineage of subsequent commits.

Epic: none.
Release justification: noop.

nvanbenschoten and others added 30 commits July 27, 2023 18:05
Fixes cockroachdb#107352.

We allow rollbacks to be sent on a txn at any time. Before this change,
a rollback that encountered a COMMITTED txn would loudly log an error.
This commit removes that logging in such cases, retaining it
only for unexpected cases.

Release note: None
As observed in cockroachdb#108348, a test failed because
it timed out reading a row.  Yet, this flake
could not be reproduced in over 50k runs.
Bump timeout period to make this flake even
less likely.

Fixes cockroachdb#108348

Release note: None
The lease preferences roachtest could occasionally fail if the liveness
leaseholder was on a stopped node. We should address this issue; for
now, pin the liveness lease to a live node to prevent flakes.

Informs: cockroachdb#108512
Resolves: cockroachdb#108425
Release note: None
This commit removes the datadriven.RunTest call from
TestUnavailableZip, as this unit test is meant to test the
ability of debug zip to output while nodes are unavailable,
and not meant to assert the entire output is deterministic.

The commit also refactors the test cases to test output for
a cluster with a partially unavailable node and a fully
unavailable node. In the first case, we block replica
requests from the node to table data, simulating SQL
unavailability. In the second case, we block all replica
requests from the node, extending the unavailability to
liveness and status endpoints. To ensure the test doesn't
take too long, we reduce the debug zip timeout from 0.5s
to 0.25s.

Fixes cockroachdb#106961.

Release note: None
….9-rc-107199

release-23.1.9-rc: cli: simplify output assertion in TestUnavailableZip
`TestReplicateQueueLeasePreferencePurgatoryError` waits for every lease
to be on one store, before changing the lease preference, and asserting
the leaseholders end up in purgatory.

The test could occasionally time out before every lease was on the
initial store. The test forced replicate queue processing to speed this
up, but only on one store, even though all stores may need to transfer
leases.

Force replication scan and process on every store, instead of just the
initially preferred node. This speeds up this stage of the test.

Fixes: cockroachdb#108624
Release note: None
Previously, the TestSQLStatsRegions test failed for secondary tenants
because of the splits and lease transfers (for lease preferences):
we had to wait for the test table's ranges to split for each regional
row, and for each range's lease to move to the row's primary region.
With this change, we check that at least one leaseholder is present
in every region; if not, we force-enqueue a split for all ranges of
the test table.

Release note: None
…108634

release-23.1.9-rc: sql: fix TestSQLStatsRegions test
protobuf.js sets the default value of Long types to 0,
instead of Long.fromInt(0).

This is a workaround until we have a more comprehensive solution.

See https://github.com/cockroachdb/cockroach/blob/165960a9e0588e7046da44a4a4a45b78b9d4541f/pkg/ui/workspaces/cluster-ui/src/util/fixLong.ts#L13

Resolves cockroachdb#107905
Release note (bug fix): Fixes errors on the Sessions page UI
when a session's memory usage is zero bytes.
…onumber-fix

ui: fix occurrence of x.toNumber is not a function
logtags.AddTag calls context.WithValue, which creates a new
context. Further, here we were re-assigning this new context to the
variable holding the original context passed to the Resume function. As
a result, I believe that we were ending up with a constantly growing
chain of contexts.

We believe this is the cause of slowly growing CPU usage.

Release note (bug fix): Fix bug that could cause CPU usage to increase
over time.
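As an illustration of the pattern described above, here is a minimal Go sketch (the `resumeLoop` function is hypothetical, not the actual Resume implementation; it assumes the `cockroachdb/logtags` package's `AddTag` and `FromContext` helpers): re-assigning the tagged context to the long-lived loop variable wraps a new context on every iteration, while scoping the tagged context per iteration does not.

```go
package main

import (
	"context"
	"fmt"

	"github.com/cockroachdb/logtags"
)

// resumeLoop is a hypothetical stand-in for a long-running Resume loop.
func resumeLoop(ctx context.Context) {
	for i := 0; i < 3; i++ {
		// Buggy shape: ctx = logtags.AddTag(ctx, "attempt", i)
		// would wrap the previous ctx each time, growing the chain forever.

		// Fixed shape: derive a per-iteration context and let it go out of scope.
		iterCtx := logtags.AddTag(ctx, "attempt", i)
		fmt.Println(logtags.FromContext(iterCtx).String())
	}
}

func main() {
	resumeLoop(logtags.AddTag(context.Background(), "job", "example"))
}
```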
Previously, the cross-joiner wouldn't advance its internal state when
its left input projected no columns. This would result in an infinite
loop as the right rows were repeatedly emitted. This patch advances
the state when there are no left columns as if values from the left
side were emitted.

Fixes cockroachdb#105882

Release note (bug fix): Fixed a bug introduced in v22.1 that could cause
a join to infinite-loop in rare cases when (1) the join filter is not an
equality and (2) no columns from the left input are returned.
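A toy Go sketch of the failure mode (purely illustrative; this is not the colexec cross-joiner code): if advancing the left-row cursor is tied to copying left columns, a zero-column left input never advances and the right rows are emitted forever.

```go
package main

import "fmt"

// crossJoinState is a toy model of the emit loop's internal state.
type crossJoinState struct {
	leftIdx, leftRows, leftCols, rightRows int
}

// nextBatch emits the right rows for the current left row and advances state.
func (s *crossJoinState) nextBatch() (emitted int, done bool) {
	if s.leftIdx >= s.leftRows {
		return 0, true
	}
	if s.leftCols > 0 {
		// ... copy the left columns for row s.leftIdx into the output ...
	}
	emitted = s.rightRows
	// The fix: advance unconditionally, as if left values had been emitted.
	// The buggy version advanced only when left columns were copied, so with
	// leftCols == 0 it re-emitted the right rows indefinitely.
	s.leftIdx++
	return emitted, false
}

func main() {
	s := crossJoinState{leftRows: 2, leftCols: 0, rightRows: 3}
	total := 0
	for {
		n, done := s.nextBatch()
		if done {
			break
		}
		total += n
	}
	fmt.Println("emitted", total, "rows") // emitted 6 rows
}
```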
Replicas were enqueued into the replicate queue upon the store
receiving a span config update which could affect the replica. The
replicate queue's shouldQueue check is relatively more expensive than
that of other queues.
Introduce the cluster setting
kv.enqueue_in_replicate_queue_on_span_config_update.enabled, which when
set to true, enables queuing up replicas on span config updates; when
set to false, disables queuing replicas on span config updates.
By default, this setting is set to false.

Resolves: cockroachdb#108724

Release note (ops change): Introduce the
kv.enqueue_in_replicate_queue_on_span_config_update.enabled cluster
setting. When set to true, stores in the cluster will enqueue replicas
for replication changes, upon receiving config updates which could
affect the replica. This setting is off by default. Enabling this
setting speeds up how quickly config-triggered replication changes
begin, but adds additional CPU overhead. The overhead scales with the
number of leaseholders.
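For reference, a hedged sketch of how such a boolean cluster setting is typically registered in the codebase (the package name, variable name, setting class, and description below are assumptions for illustration; they are not taken from this PR):

```go
package kvserver

import "github.com/cockroachdb/cockroach/pkg/settings"

// enqueueOnSpanConfigUpdateEnabled is an assumed name for the sketch; the
// default of false matches the behavior described in the release note.
var enqueueOnSpanConfigUpdateEnabled = settings.RegisterBoolSetting(
	settings.SystemOnly, // assumed class
	"kv.enqueue_in_replicate_queue_on_span_config_update.enabled",
	"when true, stores enqueue replicas for replication changes upon "+
		"receiving span config updates that could affect the replica",
	false,
)
```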
…rc-105931

release-23.1.9-rc: colexec: don't infinite loop cross-join with zero-column left input
Five years ago, in cockroachdb#26881, we changed import to retry on worker
failures, which made imports much more resilient to transient failures
like nodes going down. As part of this work we created
`TestImportWorkerFailure` which shuts down one node during an import,
and checks that the import succeeded. Unfortunately, this test was
checked in skipped because, though imports were much more resilient to
node failures, they were not completely resilient in every possible
scenario, making the test flaky.

Two months ago, in cockroachdb#105712, we unskipped this test and discovered that
in some cases the import statement succeeded but only imported a partial
dataset. This non-atomicity seems like a bigger issue than whether the
import is able to succeed in every possible transient failure scenario,
and is tracked separately in cockroachdb#108547.

This PR changes `TestImportWorkerFailure` to remove successful import as
a necessary condition for test success. Instead, the test now only
checks whether the import was atomic; that is, whether a successful
import imported all data or a failed import imported none. This is more
in line with what we can guarantee about imports today.

Fixes: cockroachdb#102839

Release note: None
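A hedged sketch of the atomicity condition described above (the table name, row count, and helper shape are illustrative; this is not the actual test code):

```go
package importer

import (
	gosql "database/sql"
	"testing"
)

// checkImportAtomic asserts all-or-nothing behavior: a successful IMPORT must
// have loaded every row, and a failed IMPORT must have left none behind.
func checkImportAtomic(t *testing.T, db *gosql.DB, importStmt string, expectedRows int) {
	_, importErr := db.Exec(importStmt)

	var got int
	if err := db.QueryRow(`SELECT count(*) FROM t`).Scan(&got); err != nil {
		t.Fatal(err)
	}
	switch {
	case importErr == nil && got != expectedRows:
		t.Fatalf("import succeeded but loaded %d of %d rows", got, expectedRows)
	case importErr != nil && got != 0:
		t.Fatalf("import failed (%v) but left %d rows behind", importErr, got)
	}
}
```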
An issue in roachtest (cockroachdb#108530) prevents clean test
termination when calling Wait() on a test monitor
that did not have at least one task started.

This caused the `cdc/kafka-oauth` test to hang.
Add a '30s' duration to the tpcc task to work around
this problem.

Fixes cockroachdb#108507

Release note: None
…108626

release-23.1.9-rc: importer: only check import *atomicity* in TestImportWorkerFailure
Previously, the `CheckKeyCountIncludingTombstoned` call in
TestDropTableWhileUpgradingFormat flaked with a key count
mismatch error (expected: 100, actual: 0). This was odd
because this key counting failure was directly after rows
were committed. To address this, this patch adds a retry
for the `CheckKeyCountIncludingTombstoned` logic in case
this is due to a race condition.

Epic: none
Fixes: cockroachdb#108340

Release note (sql change): deflake TestDropTableWhileUpgradingFormat
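The retry shape described above might look roughly like this Go sketch (it assumes the repo's `testutils.SucceedsSoon` helper; the wrapper function and count callback are hypothetical):

```go
package sqltests

import (
	"testing"

	"github.com/cockroachdb/cockroach/pkg/testutils"
	"github.com/cockroachdb/errors"
)

// retryKeyCount re-runs the key count until it matches the expectation,
// absorbing transient races instead of failing on the first mismatch.
func retryKeyCount(t *testing.T, countKeys func() int, want int) {
	testutils.SucceedsSoon(t, func() error {
		if got := countKeys(); got != want {
			return errors.Errorf("expected %d keys, found %d", want, got)
		}
		return nil
	})
}
```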
…1.9-rc-backport-kafka-oauth

release 23.1.9 rc: roachtest: Ensure tpcc workloads runs for a bit
…eached

This commit introduces the cluster setting `sql.stats.limit_table_size.enabled`,
which controls whether or not we enforce the row limit set by `sql.stats.persisted_rows.max`
in the system.{statement,transaction} statistics tables.
Previously, when the sql stats tables reached the row limit set by
`sql.stats.persisted_rows.max`, we disabled flushing any more stats to the tables.
This restriction involved issuing a query during the flush process to check
the number of rows currently in the tables. This query can often contend with
the compaction job queries, so we should allow a method to bypass this check
to alleviate the contention.

Furthermore, in some cases we'd like to allow the system tables to grow in order
to avoid gaps in sql stats observability. In these cases we can attempt to
modify the compaction settings to allow the job to catch up to the target number
of rows.

Fixes: cockroachdb#108771

Release note (sql change):
This commit introduces the cluster setting `sql.stats.limit_table_size.enabled`,
which controls whether or not we enforce the row limit set by `sql.stats.persisted_rows.max`
in the system.{statement,transaction} statistics tables.
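A hedged example of toggling the setting from a Go client, assuming the semantics described above (disabling it skips the row-limit check during flush); the connection string is a placeholder:

```go
package main

import (
	"database/sql"
	"log"

	_ "github.com/lib/pq"
)

func main() {
	// Placeholder DSN; point this at your own cluster.
	db, err := sql.Open("postgres",
		"postgresql://root@localhost:26257/defaultdb?sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Stop enforcing the row limit so stats flushes skip the contended
	// row-count check described in the commit message.
	if _, err := db.Exec(
		`SET CLUSTER SETTING sql.stats.limit_table_size.enabled = false`,
	); err != nil {
		log.Fatal(err)
	}
}
```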
This change adds option env vars which can be passed to `tests.Chaos`
that are applied to nodes that are restarted.

Release note: None
Epic: None
Previously, this roachtest could fail due to slow retries causing the
changefeed to make no progress. See cockroachdb#108896 (comment)
for more info.

This test sets `COCKROACH_CHANGEFEED_TESTING_FAST_RETRY` to avoid slow
retries, but this environment variable is unset when nodes restart. This
change ensures it is preserved.

Release note: None
Epic: None
Informs: cockroachdb#108896
…108835

release-23.1.9-rc: persistedsqlstats: add cluster setting to enable flush on row limit r…
Both of these tests are being worked on upstream to resolve the
flakiness. Since CockroachDB itself is not the cause of the flakes, we
ignore them so that these tests don't create unnecessary issues.

Release note: None
…ort-release-23.1.9-rc-108939

release-23.1.9-rc: roachtest: ignore upstream flakes in activerecord and sequelize
CREATE USER and ALTER USER create schema change jobs to bump
the descriptor version on the users and role_options tables in order
to refresh the authentication cache.

Adding these users in a single transaction means we only need to wait
for 3 such jobs rather than 3000 jobs.

This is particularly useful since we occasionally observe massive
slowdowns in processing these jobs. While we should get to the bottom of
such slowdowns, let's not slow down CI.

Supersedes cockroachdb#81276

Epic: none

Release note: None
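A minimal Go sketch of the single-transaction approach (the function and quoting are illustrative; user names are assumed to be simple identifiers):

```go
package sqltests

import (
	gosql "database/sql"
	"fmt"
)

// createUsersInOneTxn issues every CREATE USER inside one transaction, so the
// descriptor versions on system.users and system.role_options are bumped once
// rather than once per user, and only a handful of schema change jobs run.
func createUsersInOneTxn(db *gosql.DB, users []string) error {
	tx, err := db.Begin()
	if err != nil {
		return err
	}
	for _, u := range users {
		// Assumes simple identifiers; real code should validate/quote names.
		if _, err := tx.Exec(fmt.Sprintf("CREATE USER %q", u)); err != nil {
			_ = tx.Rollback()
			return err
		}
	}
	return tx.Commit()
}
```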
In recent CI runs, we've been seeing TestFullClusterBackup fail with:

    full_cluster_backup_restore_test.go:143: error executing 'COMMIT':
    pq: restart transaction: TransactionRetryWithProtoRefreshError:
    TransactionAbortedError(ABORT_REASON_ABORTED_RECORD_FOUND): "sql
    txn" meta={id=ec1de683 key=/Table/4/1/"maxroach0"/0
    iso=Serializable pri=0.03058324 epo=0 ts=1686254742.907452755,1
    min=1686254688.329813751,0 seq=8023} lock=true stat=ABORTED
    rts=1686254742.907452755,1 wto=false gul=1686254688.829813751,0
    int=2006

This looks to be the result of a recent change to issue all of the
CREATE USER statements in a single transaction. Since explicit
transactions are only automatically retried in specific circumstances,
we now have to deal with the retry error.

Here, we introduce a small helper that retries anything that looks
like a restart error.

Note that I haven't been able to reproduce the above locally as the
test is very heavyweight and difficult to stress.

Epic: none

Release note: None
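The "small helper" described above could be sketched like this (the substring check is illustrative; the real helper might inspect SQLSTATE 40001 instead):

```go
package sqltests

import "strings"

// runWithRetriableTxnErrors retries fn whenever its error looks like a
// transaction restart, and returns any other error unchanged.
func runWithRetriableTxnErrors(fn func() error) error {
	for {
		err := fn()
		if err == nil {
			return nil
		}
		if !strings.Contains(err.Error(), "restart transaction") {
			return err
		}
		// Retryable serialization failure: run fn again.
	}
}
```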
In CI we've seen this test fail with:

    backup_test.go:670: error scanning '&{<nil> 0xc019c5b580}': pq:
    restart transaction: TransactionRetryWithProtoRefreshError:
    TransactionRetryError: retry txn (RETRY_SERIALIZABLE - failed
    preemptive refresh): "sql txn" meta={id=77054b51 key=/Table/121/1
    pri=0.03587869 epo=0 ts=1689916873.568949604,1
    min=1689916873.436580640,0 seq=1000} lock=true stat=PENDING
    rts=1689916873.436580640,0 wto=false gul=1689916873.936580640,0

The `RETURNING` clauses on these two `UPDATE` statements prevent
automatic transaction retries.

Here we wrap the queries in an explicit transaction with a retry loop,
which should prevent the test failure, assuming that the update
isn't so contended that it never completes.

I am unable to stress this enough locally to reproduce this error.

Probably Fixes cockroachdb#107330

Epic: none

Release note: None
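One common shape for such a retry loop, sketched with the cockroach-go client helper (this assumes `crdb.ExecuteTx` from `github.com/cockroachdb/cockroach-go/v2/crdb`; the UPDATE statement is a stand-in, not the test's actual query):

```go
package sqltests

import (
	"context"
	gosql "database/sql"

	"github.com/cockroachdb/cockroach-go/v2/crdb"
)

// updateWithRetry wraps a contended UPDATE ... RETURNING in an explicit
// transaction that the helper retries automatically on serialization errors.
func updateWithRetry(ctx context.Context, db *gosql.DB) (int64, error) {
	var k int64
	err := crdb.ExecuteTx(ctx, db, nil /* txopts */, func(tx *gosql.Tx) error {
		// Illustrative statement standing in for the test's UPDATEs.
		return tx.QueryRowContext(ctx,
			`UPDATE t SET v = v + 1 WHERE k = 1 RETURNING k`).Scan(&k)
	})
	return k, err
}
```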
rafiss and others added 11 commits August 31, 2023 10:57
…09517

release-23.1.9-rc: sql: avoid stampede on synthetic privilege cache in vtable population
…ort-release-23.1.9-rc-109713

release-23.1.9-rc: backupccl: avoid creating spans that start or end on a .Next() key
Restore may use unsafe keys as split points, which may cause unsafe splits
between column families, which may cause SQL to fail when reading the row or,
worse, return wrong results.

This commit avoids splitting on keys that might be unsafe.

See the issue for more info.

Epic: none
Informs: cockroachdb#109483

Release note: None.
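For context, a hedged sketch of the kind of guard involved (the `chooseSplitKey` helper is hypothetical; it assumes the repo's `keys.EnsureSafeSplitKey`, which truncates a key to its row prefix, though this commit message does not say exactly how the check is implemented):

```go
package backupsketch

import (
	"github.com/cockroachdb/cockroach/pkg/keys"
	"github.com/cockroachdb/cockroach/pkg/roachpb"
)

// chooseSplitKey only returns a split point if the candidate can be
// normalized to a safe row boundary; otherwise the split is skipped, since
// splitting between column families could break SQL reads of that row.
func chooseSplitKey(candidate roachpb.Key) (roachpb.Key, bool) {
	safe, err := keys.EnsureSafeSplitKey(candidate)
	if err != nil || len(safe) == 0 {
		return nil, false
	}
	return safe, true
}
```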
…ort-release-23.1.9-rc-109378

release-23.1.9-rc: backupccl: avoid splitting if the split point might be unsafe
…blathers/backport-release-23.1.9-rc-109378

Revert "release-23.1.9-rc: backupccl: avoid splitting if the split point might be unsafe"
Node 16 is no longer supported, so installing it fails. Now we
update to a supported version.

Release note: None
This is required to make a test pass.

Release note: None
…ort-release-23.1.9-rc-109896

release-23.1.9-rc: roachtest: update version of nodejs used by ORM tests
…ort-release-23.1.9-rc-109750

release-23.1.9-rc: backupccl: during restore, do not .Next() any keys in makeSimpleImportSpans
@dt requested a review from rail September 8, 2023 12:57
@blathers-crl (bot) commented Sep 8, 2023

Thanks for opening a backport.

Please check the backport criteria before merging:

  • Patches should only be created for serious issues or test-only changes.
  • Patches should not break backwards-compatibility.
  • Patches should change as little code as possible.
  • Patches should not change on-disk formats or node communication protocols.
  • Patches should not add new functionality.
  • Patches must not add, edit, or otherwise modify cluster versions; or add version gates.
If some of the basic criteria cannot be satisfied, ensure that the exceptional criteria are satisfied within.
  • There is a high priority need for the functionality that cannot wait until the next release and is difficult to address in another way.
  • The new functionality is additive-only and only runs for clusters which have specifically “opted in” to it (e.g. by a cluster setting).
  • New code is protected by a conditional check that is trivial to verify and ensures that it only runs for opt-in clusters.
  • The PM and TL on the team that owns the changed code have signed off that the change obeys the above rules.

Add a brief release justification to the body of your PR to justify this backport.

Some other things to consider:

  • What did we do to ensure that a user that doesn’t know & care about this backport, has no idea that it happened?
  • Will this work in a cluster of mixed patch versions? Did we test that?
  • If a user upgrades a patch version, uses this feature, and then downgrades, what happens?

@blathers-crl (bot) added the backport label (PRs that are backports to older release branches) Sep 8, 2023
@cockroach-teamcity (Member) commented:

This change is Reviewable

@dt requested a review from celiala September 8, 2023 12:57
@dt removed the backport label (PRs that are backports to older release branches) Sep 8, 2023
@celiala (Collaborator) left a comment

thank you!
