Skip to content

sql: only skip import worker failure under race,stress,deadlock #105712

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jun 28, 2023

Conversation

cucaroach
Copy link
Contributor

@cucaroach cucaroach commented Jun 28, 2023

Better to have the test running some of the time to catch regressions.

Informs: #102839
Epic: None
Release note: None

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@cucaroach cucaroach changed the title sql: skip import worker failure under race,stress,deadlock sql: only skip import worker failure under race,stress,deadlock Jun 28, 2023
Better to have the test running some of the time to catch regressions.

Informs: 102839
Epic: None
Release note: None
@cucaroach cucaroach marked this pull request as ready for review June 28, 2023 13:45
@cucaroach cucaroach requested a review from a team as a code owner June 28, 2023 13:45
@cucaroach cucaroach requested a review from msirek June 28, 2023 13:45
@cucaroach
Copy link
Contributor Author

bors r+
TFTR!

@craig
Copy link
Contributor

craig bot commented Jun 28, 2023

Build succeeded:

@craig craig bot merged commit 3c54e12 into cockroachdb:master Jun 28, 2023
@cucaroach
Copy link
Contributor Author

blathers backport 23.1

@blathers-crl
Copy link

blathers-crl bot commented Jul 17, 2023

Encountered an error creating backports. Some common things that can go wrong:

  1. The backport branch might have already existed.
  2. There was a merge conflict.
  3. The backport branch contained merge commits.

You might need to create your backport manually using the backport tool.


error creating merge commit from 1f66e36 to blathers/backport-release-23.1-105712: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []

you may need to manually resolve merge conflicts with the backport tool.

Backport to branch 23.1 failed. See errors above.


🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

michae2 added a commit to michae2/cockroach that referenced this pull request Aug 11, 2023
Five years ago, in cockroachdb#26881, we changed import to retry on worker
failures, which made imports much more resilient to transient failures
like nodes going down. As part of this work we created
`TestImportWorkerFailure` which shuts down one node during an import,
and checks that the import succeeded. Unfortunately, this test was
checked-in skipped, because though imports were much more resilient to
node failures, they were not completely resilient in every possible
scenario, making the test flakey.

Two months ago, in cockroachdb#105712, we unskipped this test and discovered that
in some cases the import statement succeeded but only imported a partial
dataset. This non-atomicity seems like a bigger issue than whether the
import is able to succeed in every possible transient failure scenario.

This PR changes `TestImportWorkerFailure` to remove successful import as
a necessary condition for test success. Instead, the test now only
checks whether the import was atomic; that is, whether a successful
import imported all data or a failed import imported none. This is more
in line with what we can guarantee about imports today.

This PR also completely unskips `TestImportWorkerFailure` so that we can
test the atomicity of imports more thoroughly.

Fixes: cockroachdb#102839

Release note: None
michae2 added a commit to michae2/cockroach that referenced this pull request Aug 14, 2023
Five years ago, in cockroachdb#26881, we changed import to retry on worker
failures, which made imports much more resilient to transient failures
like nodes going down. As part of this work we created
`TestImportWorkerFailure` which shuts down one node during an import,
and checks that the import succeeded. Unfortunately, this test was
checked-in skipped, because though imports were much more resilient to
node failures, they were not completely resilient in every possible
scenario, making the test flakey.

Two months ago, in cockroachdb#105712, we unskipped this test and discovered that
in some cases the import statement succeeded but only imported a partial
dataset. This non-atomicity seems like a bigger issue than whether the
import is able to succeed in every possible transient failure scenario,
and is tracked separately in cockroachdb#108547.

This PR changes `TestImportWorkerFailure` to remove successful import as
a necessary condition for test success. Instead, the test now only
checks whether the import was atomic; that is, whether a successful
import imported all data or a failed import imported none. This is more
in line with what we can guarantee about imports today.

This PR also unskips `TestImportWorkerFailure` under stress so that we
can test the atomicity of imports more thoroughly.

Fixes: cockroachdb#102839

Release note: None
michae2 added a commit to michae2/cockroach that referenced this pull request Aug 14, 2023
Five years ago, in cockroachdb#26881, we changed import to retry on worker
failures, which made imports much more resilient to transient failures
like nodes going down. As part of this work we created
`TestImportWorkerFailure` which shuts down one node during an import,
and checks that the import succeeded. Unfortunately, this test was
checked-in skipped, because though imports were much more resilient to
node failures, they were not completely resilient in every possible
scenario, making the test flakey.

Two months ago, in cockroachdb#105712, we unskipped this test and discovered that
in some cases the import statement succeeded but only imported a partial
dataset. This non-atomicity seems like a bigger issue than whether the
import is able to succeed in every possible transient failure scenario,
and is tracked separately in cockroachdb#108547.

This PR changes `TestImportWorkerFailure` to remove successful import as
a necessary condition for test success. Instead, the test now only
checks whether the import was atomic; that is, whether a successful
import imported all data or a failed import imported none. This is more
in line with what we can guarantee about imports today.

Fixes: cockroachdb#102839

Release note: None
craig bot pushed a commit that referenced this pull request Aug 14, 2023
108210: cli: add limit statment_statistics to debug zip r=j82w a=j82w

This adds statement_statistics to the debug zip. It is limited to the transaction fingerprint ids in the the transaction_contention_events table. This is because the statement_statistics table maps the fingerprint to the query text. It also adds the top 100 statements by cpu usage.

closes: #108180

Release note (cli change): Added limited statement_statistics to the debug zip.

108382: ccl/sqlproxyccl: serve a dirty cache whenever the watcher fails r=JeffSwenson a=jaylim-crl

Previously, we will invalidate all tenant metadata entries whenever the
watcher fails. This can cause issues when the directory server fails
(e.g. Kubernetes API server is down). It is possible that existing SQL pods
are still up, but we're invalidating the entire directory cache. We should
allow incoming requests with existing SQL pods to connect to those pods.

This commit addresses the issue by serving a stale cache whenever the watcher
fails and not invalidating the cache.

Release note: None

Epic: CC-25053

108626: importer: only check import *atomicity* in TestImportWorkerFailure r=dt,yuzefovich,cucaroach a=michae2

Five years ago, in #26881, we changed import to retry on worker failures, which made imports much more resilient to transient failures like nodes going down. As part of this work we created `TestImportWorkerFailure` which shuts down one node during an import, and checks that the import succeeded. Unfortunately, this test was checked-in skipped, because though imports were much more resilient to node failures, they were not completely resilient in every possible scenario, making the test flakey.

Two months ago, in #105712, we unskipped this test and discovered that in some cases the import statement succeeded but only imported a partial dataset. This non-atomicity seems like a bigger issue than whether the import is able to succeed in every possible transient failure scenario, and is tracked separately in #108547.

This PR changes `TestImportWorkerFailure` to remove successful import as a necessary condition for test success. Instead, the test now only checks whether the import was atomic; that is, whether a successful import imported all data or a failed import imported none. This is more in line with what we can guarantee about imports today.

Fixes: #102839

Release note: None

Co-authored-by: j82w <jwilley@cockroachlabs.com>
Co-authored-by: Jay <jay@cockroachlabs.com>
Co-authored-by: Michael Erickson <michae2@cockroachlabs.com>
michae2 added a commit to michae2/cockroach that referenced this pull request Aug 16, 2023
Five years ago, in cockroachdb#26881, we changed import to retry on worker
failures, which made imports much more resilient to transient failures
like nodes going down. As part of this work we created
`TestImportWorkerFailure` which shuts down one node during an import,
and checks that the import succeeded. Unfortunately, this test was
checked-in skipped, because though imports were much more resilient to
node failures, they were not completely resilient in every possible
scenario, making the test flakey.

Two months ago, in cockroachdb#105712, we unskipped this test and discovered that
in some cases the import statement succeeded but only imported a partial
dataset. This non-atomicity seems like a bigger issue than whether the
import is able to succeed in every possible transient failure scenario,
and is tracked separately in cockroachdb#108547.

This PR changes `TestImportWorkerFailure` to remove successful import as
a necessary condition for test success. Instead, the test now only
checks whether the import was atomic; that is, whether a successful
import imported all data or a failed import imported none. This is more
in line with what we can guarantee about imports today.

Fixes: cockroachdb#102839

Release note: None
michae2 added a commit that referenced this pull request Aug 16, 2023
Five years ago, in #26881, we changed import to retry on worker
failures, which made imports much more resilient to transient failures
like nodes going down. As part of this work we created
`TestImportWorkerFailure` which shuts down one node during an import,
and checks that the import succeeded. Unfortunately, this test was
checked-in skipped, because though imports were much more resilient to
node failures, they were not completely resilient in every possible
scenario, making the test flakey.

Two months ago, in #105712, we unskipped this test and discovered that
in some cases the import statement succeeded but only imported a partial
dataset. This non-atomicity seems like a bigger issue than whether the
import is able to succeed in every possible transient failure scenario,
and is tracked separately in #108547.

This PR changes `TestImportWorkerFailure` to remove successful import as
a necessary condition for test success. Instead, the test now only
checks whether the import was atomic; that is, whether a successful
import imported all data or a failed import imported none. This is more
in line with what we can guarantee about imports today.

Fixes: #102839

Release note: None
yuzefovich pushed a commit to yuzefovich/cockroach that referenced this pull request Nov 8, 2023
Five years ago, in cockroachdb#26881, we changed import to retry on worker
failures, which made imports much more resilient to transient failures
like nodes going down. As part of this work we created
`TestImportWorkerFailure` which shuts down one node during an import,
and checks that the import succeeded. Unfortunately, this test was
checked-in skipped, because though imports were much more resilient to
node failures, they were not completely resilient in every possible
scenario, making the test flakey.

Two months ago, in cockroachdb#105712, we unskipped this test and discovered that
in some cases the import statement succeeded but only imported a partial
dataset. This non-atomicity seems like a bigger issue than whether the
import is able to succeed in every possible transient failure scenario,
and is tracked separately in cockroachdb#108547.

This PR changes `TestImportWorkerFailure` to remove successful import as
a necessary condition for test success. Instead, the test now only
checks whether the import was atomic; that is, whether a successful
import imported all data or a failed import imported none. This is more
in line with what we can guarantee about imports today.

Fixes: cockroachdb#102839

Release note: None
yuzefovich pushed a commit to yuzefovich/cockroach that referenced this pull request Nov 8, 2023
Five years ago, in cockroachdb#26881, we changed import to retry on worker
failures, which made imports much more resilient to transient failures
like nodes going down. As part of this work we created
`TestImportWorkerFailure` which shuts down one node during an import,
and checks that the import succeeded. Unfortunately, this test was
checked-in skipped, because though imports were much more resilient to
node failures, they were not completely resilient in every possible
scenario, making the test flakey.

Two months ago, in cockroachdb#105712, we unskipped this test and discovered that
in some cases the import statement succeeded but only imported a partial
dataset. This non-atomicity seems like a bigger issue than whether the
import is able to succeed in every possible transient failure scenario,
and is tracked separately in cockroachdb#108547.

This PR changes `TestImportWorkerFailure` to remove successful import as
a necessary condition for test success. Instead, the test now only
checks whether the import was atomic; that is, whether a successful
import imported all data or a failed import imported none. This is more
in line with what we can guarantee about imports today.

Fixes: cockroachdb#102839

Release note: None
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants