Skip to content

sql: add information about retries to statement trace #67941

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jul 27, 2021

Conversation

xinhaoz
Copy link
Member

@xinhaoz xinhaoz commented Jul 22, 2021

Currently, statement diagnostic bundles reflect the last,
successful run of the query. While we already track the number
of times a transaciton has been automatically retried, we do
not record errors causing the retry. This addition allows
us to access the last error causing an automatic retry
during statement execution, adding it to the statement
trace and diagnostic bundle.

Resolves: #65146

Release note (sql change): Retry information has been added
to the statement trace under the 'exec stmt' operation.

The trace message is in the format:
"executing after retries, last retry reason: "

This message will appear in any operations that show the
statement trace, which is included in operations such as
SHOW TRACE FOR SESSION and is also exported in the statement
diagnostics bundle.


Sample message in the trace after forcing a retry:

"sql/conn_executor_exec.go:690 [n1,client=127.0.0.1:54187,hostnossl,user=root] executing after 4 retries, last retry reason: crdb_internal.force_retry(): TransactionRetryWithProtoRefreshError: forced by crdb_internal.force_retry()"

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@xinhaoz xinhaoz force-pushed the diag-bundle-retries branch 3 times, most recently from 8232314 to ff23009 Compare July 23, 2021 15:57
@xinhaoz xinhaoz marked this pull request as ready for review July 23, 2021 17:12
@xinhaoz xinhaoz requested review from Azhng, a team, matthewtodd and THardy98 July 23, 2021 17:12
Copy link
Contributor

@maryliag maryliag left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great job on this!

A few things to add to your PR/commit message:

  • the issue this is related to with Resolves: #issuenumber
  • an example of a result, showing the output of your changes
  • you're making a change that the user will notice, so it should be a release note for it, you can add a message mentioning which new information in being added and where

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @Azhng, @matthewtodd, and @THardy98)

@xinhaoz xinhaoz force-pushed the diag-bundle-retries branch 2 times, most recently from 4e7167f to f70692e Compare July 26, 2021 17:02
Copy link
Contributor

@Azhng Azhng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @matthewtodd, @THardy98, and @xinhaoz)


pkg/sql/txn_restart_test.go, line 1598 at r2 (raw file):

	params.Knobs.SQLExecutor = &sql.ExecutorTestingKnobs{
		BeforeRestart: func(ctx context.Context, reason error) {
			if !testutils.IsError(reason, fmt.Sprintf("injected err %d", retryCount)) {

nit: IIUC, if somehow the txn is not restarted at all for some reason, this test will pass. Can we add a check to ensure that the callback is actually executed?


pkg/sql/txn_restart_test.go, line 1622 at r2 (raw file):

	defer cleanupFilter()

	sqlDB.SetMaxOpenConns(1)

Why do we need to set max open connection here?


pkg/sql/txn_restart_test.go, line 1624 at r2 (raw file):

	sqlDB.SetMaxOpenConns(1)

	if _, err := sqlDB.Exec(`

nit: you can wrap sqlDB into a sqlutils.MakeSQLRunner so it will fail the test if there's any error encountered.

@xinhaoz xinhaoz force-pushed the diag-bundle-retries branch 2 times, most recently from aba75a2 to 9bb6b3d Compare July 27, 2021 14:37
@xinhaoz
Copy link
Member Author

xinhaoz commented Jul 27, 2021

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @matthewtodd, @THardy98, and @xinhaoz)

pkg/sql/txn_restart_test.go, line 1598 at r2 (raw file):

	params.Knobs.SQLExecutor = &sql.ExecutorTestingKnobs{
		BeforeRestart: func(ctx context.Context, reason error) {
			if !testutils.IsError(reason, fmt.Sprintf("injected err %d", retryCount)) {

nit: IIUC, if somehow the txn is not restarted at all for some reason, this test will pass. Can we add a check to ensure that the callback is actually executed?

pkg/sql/txn_restart_test.go, line 1622 at r2 (raw file):

	defer cleanupFilter()

	sqlDB.SetMaxOpenConns(1)

Why do we need to set max open connection here?

pkg/sql/txn_restart_test.go, line 1624 at r2 (raw file):

	sqlDB.SetMaxOpenConns(1)

	if _, err := sqlDB.Exec(`

nit: you can wrap sqlDB into a sqlutils.MakeSQLRunner so it will fail the test if there's any error encountered.

Added restarted check and SQL runner. The SetMaxOpenConns snuck in on accident, removed it as it shouldn't be necessary for this test.

@xinhaoz xinhaoz requested review from Azhng and maryliag July 27, 2021 14:37
Copy link
Contributor

@Azhng Azhng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice. The code looks good.

@yuzefovich do you mind take a look as well?

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @Azhng, @maryliag, @matthewtodd, @THardy98, and @xinhaoz)


pkg/sql/conn_executor.go, line 631 at r3 (raw file):

}

// GetLocalIndexStatistics returns a idxusage.LocalIndexUsageStats.

This seems to be included by accident ?


pkg/sql/txn_restart_test.go, line 1631 at r3 (raw file):

`)

	if retryCount != numRetries {

super nit: you can use require.Equal() method here to reduce some verbosity

Copy link
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, the code looks good to me too.

Reviewed 1 of 1 files at r2, 3 of 3 files at r3.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @Azhng, @maryliag, @matthewtodd, @THardy98, and @xinhaoz)


pkg/sql/conn_executor.go, line 1210 at r3 (raw file):

		// the current transaction is being automatically retried. This is used in
		// statement traces to give more information in statement diagnostic bundles.
		autoRetryReason error

super nit: might be nicer to put this right below autoRetryCounter.

Currently, statement diagnostic bundles reflect the last,
successful run of the query. While we already track the number
of times a transaciton has been automatically retried, we do
not record errors causing the retry. This addition allows
us to access the last error causing an automatic retry
during statement execution, adding it to the statement
trace and diagnostic bundle.

Resolves: cockroachdb#65146

Release note (sql change): Retry information has been added
to the statement trace under the 'exec stmt' operation.

The trace message is in the format:
"executing after <int> retries, last retry reason: <err>"

This message will appear in any operations that show the
statement trace, which is included in operations such as
SHOW TRACE FOR SESSION and is also exported in the statement
diagnostics bundle.
@xinhaoz xinhaoz force-pushed the diag-bundle-retries branch from 9bb6b3d to 3f03ba8 Compare July 27, 2021 18:23
@xinhaoz xinhaoz requested a review from Azhng July 27, 2021 18:26
Copy link
Contributor

@Azhng Azhng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: 🔥 🔥 🔥

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @maryliag, @matthewtodd, @THardy98, @xinhaoz, and @yuzefovich)

@xinhaoz
Copy link
Member Author

xinhaoz commented Jul 27, 2021

bors r+

@craig
Copy link
Contributor

craig bot commented Jul 27, 2021

Build succeeded:

@craig craig bot merged commit d788c60 into cockroachdb:master Jul 27, 2021
@xinhaoz xinhaoz deleted the diag-bundle-retries branch July 28, 2021 13:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

sql: add information about retries in statement bundles
5 participants