Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: prevent use of leaf txn with volatile routines #113254

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

DrewKimball
Copy link
Collaborator

@DrewKimball DrewKimball commented Oct 28, 2023

This patch disallows leaf transactions during execution for two cases:

  1. The plan contains a volatile routine.
  2. The plan contains a routine with a PLpgSQL exception handler.

This is necessary because volatile routines use transaction stepping, and exception handlers use internal savepoints. Both of these cases are incompatible with concurrent leaf transactions, and may cause internal errors is not handled. Note that (1) implies (2) currently, but (1) may be relaxed in the future, so we explicitly prevent both to be defensive.

Fixes #112188

Release note (bug fix): Fixed a rare bug that could result in an internal error when executing a volatile UDF in a query along with a lookup join.

@DrewKimball DrewKimball added backport-23.1.x Flags PRs that need to be backported to 23.1 backport-23.2.x Flags PRs that need to be backported to 23.2. labels Oct 28, 2023
@DrewKimball DrewKimball requested a review from a team as a code owner October 28, 2023 08:40
@DrewKimball DrewKimball requested review from yuzefovich and removed request for a team October 28, 2023 08:40
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@DrewKimball DrewKimball requested review from mgartner and a team October 28, 2023 08:40
@DrewKimball
Copy link
Collaborator Author

Note that when a UDF is called like this:

SELECT f();

Concurrency already ends up being disabled because it gets planned in a values planNode, which disables the streamer.

Copy link
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice find! I have some comments.

Reviewed 8 of 8 files at r1, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @DrewKimball and @mgartner)


-- commits line 14 at r1:
Do you think this could also be a fix to #111097 which we couldn't repro?


pkg/sql/distsql_running.go line 709 at r1 (raw file):

	// then we are ignorant of the details of the execution plan, so we choose
	// to be on the safe side and mark 'noMutations' as 'false'.
	noMutations := planCtx.planner != nil && !planCtx.planner.curPlan.flags.IsSet(planFlagContainsMutation)

nit: I think we can remove noMutations now because it's already included into canUseLeafTxn.


pkg/sql/distsql_running.go line 745 at r1 (raw file):

			// processors to figure out whether any of them have concurrency.
			//
			// However, the concurrency requires the usage of LeafTxns which is

nit: this paragraph of the comment should be moved to be right above us looking planFlagContainsMutation.


pkg/sql/distsql_running.go line 769 at r1 (raw file):

			containsNonDefaultLocking := planCtx.planner != nil && planCtx.planner.curPlan.flags.IsSet(planFlagContainsNonDefaultLocking)

			// We also currently disable the usage of the Streamer API whenever

nit: move this comment to be right above the just moved for loop.


pkg/sql/plan.go line 637 at r1 (raw file):

	// concurrency. Note that planFlagUseRootTxn is not the only condition that
	// forces use of the root transaction.
	planFlagMustUseRootTxn

This new flag should also be checked in NewPlanningCtxWithOracle.


pkg/sql/opt/exec/execbuilder/relational.go line 3169 at r1 (raw file):

  // Volatile routines step the transaction, and routines with an exception
  // block use internal savepoints.
	if udf.Def.Volatility == volatility.Volatile || udf.Def.ExceptionBlock != nil {

nit: the indentation seems a bit off.


pkg/sql/opt/props/func_dep.go line 667 at r1 (raw file):

// ComputeClosureNoCopy is similar to ComputeClosure, but avoids allocations
// when it is safe to mutate the given ColSet.
func (f *FuncDepSet) ComputeClosureNoCopy(cols opt.ColSet) opt.ColSet {

nit: I don't see the new function being used.


pkg/sql/logictest/testdata/logic_test/udf line 767 at r1 (raw file):

# has a volatile UDF.
query T
EXPLAIN ANALYZE

nit: we try to keep EXPLAIN ANALYZE stmts in execbuilder tests. Alternatively, consider not printing out the whole thing, but doing a filter message LIKE 'lookup join' which doesn't have ((streamer)) in it.

Copy link
Collaborator

@mgartner mgartner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: once you address Yahor's comments.

Reviewed 8 of 8 files at r1, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @DrewKimball and @yuzefovich)


pkg/sql/logictest/testdata/logic_test/udf line 767 at r1 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

nit: we try to keep EXPLAIN ANALYZE stmts in execbuilder tests. Alternatively, consider not printing out the whole thing, but doing a filter message LIKE 'lookup join' which doesn't have ((streamer)) in it.

+1 to moving this to an execbuilder test

@DrewKimball
Copy link
Collaborator Author

TODO: add assertion that we are using a root txn if stepping.

Copy link
Collaborator Author

@DrewKimball DrewKimball left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added test-only assertions that we are using a RootTxn when we step the txn as well as when we handle an exception.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @mgartner and @yuzefovich)


-- commits line 14 at r1:

Previously, yuzefovich (Yahor Yuzefovich) wrote…

Do you think this could also be a fix to #111097 which we couldn't repro?

Hm, I don't think so - all the routines in that file seem to contain mutations, so I'd expect planFlagContainsMutation to be set.


pkg/sql/distsql_running.go line 709 at r1 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

nit: I think we can remove noMutations now because it's already included into canUseLeafTxn.

Done.


pkg/sql/distsql_running.go line 713 at r1 (raw file):

	// canUseLeafTxn indicates whether the plan contains an expression that
	// cannot tolerate a concurrent leaf transaction.
	canUseLeafTxn := true

@yuzefovich I've moved all the "can use LeafTxn" logic up to here, and added a test-only assertion that localState.MustUseLeafTxn() is false if !canUseLeafTxn.


pkg/sql/distsql_running.go line 745 at r1 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

nit: this paragraph of the comment should be moved to be right above us looking planFlagContainsMutation.

Done.


pkg/sql/distsql_running.go line 769 at r1 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

nit: move this comment to be right above the just moved for loop.

Done.


pkg/sql/plan.go line 637 at r1 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

This new flag should also be checked in NewPlanningCtxWithOracle.

Done.


pkg/sql/opt/exec/execbuilder/relational.go line 3169 at r1 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

nit: the indentation seems a bit off.

Done.


pkg/sql/opt/props/func_dep.go line 667 at r1 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

nit: I don't see the new function being used.

Oops, removed that change.


pkg/sql/logictest/testdata/logic_test/udf line 767 at r1 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

+1 to moving this to an execbuilder test

Done.

@DrewKimball
Copy link
Collaborator Author

Hm I'm seeing internal error: MustUseLeafTxn() returned true when canUseLeafTxn is false failures in CI. I'll look into it.

This patch disallows leaf transactions during execution for two cases:
1. The plan contains a volatile routine.
2. The plan contains a routine with a PLpgSQL exception handler.

This is necessary because volatile routines use transaction stepping, and
exception handlers use internal savepoints. Both of these cases are
incompatible with concurrent leaf transactions, and may cause internal
errors is not handled. Note that (1) implies (2) currently, but (1) may
be relaxed in the future, so we explicitly prevent both to be defensive.

Fixes cockroachdb#112188

Release note (bug fix): Fixed a rare bug that could result in an internal
error when executing a volatile UDF in a query along with a lookup join.
@DrewKimball
Copy link
Collaborator Author

Hm I'm seeing internal error: MustUseLeafTxn() returned true when canUseLeafTxn is false failures in CI. I'll look into it.

It looks like this is related to things like index back-filling, which don't set IsLocal = true. I'm not sure exactly what we want to do there, but for now I've added a planner != nil check before firing that assertion.

Copy link
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, but the build is still red.

Reviewed 12 of 13 files at r2, 1 of 1 files at r3, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @DrewKimball and @mgartner)


pkg/sql/distsql_running.go line 713 at r1 (raw file):

Previously, DrewKimball (Drew Kimball) wrote…

@yuzefovich I've moved all the "can use LeafTxn" logic up to here, and added a test-only assertion that localState.MustUseLeafTxn() is false if !canUseLeafTxn.

nit: we already have a related assertion at the bottom of Run method - consider moving it there.


pkg/sql/opt/exec/execbuilder/testdata/udf line 389 at r3 (raw file):

# has a volatile UDF.
query T
EXPLAIN ANALYZE

nit: since we're only interested in one line (whether we have lookup join or lookup join (streamer)), we could only count the number of rows matching the necessary filter to avoid printing out the whole thing. (Just saw that I left a similar comment last time - up to you.)

Copy link
Collaborator

@mgartner mgartner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 12 of 13 files at r2, 1 of 1 files at r3, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @DrewKimball)


-- commits line 11 at r3:
nit: fix the typo "... may cause internal errors is not handled".


pkg/sql/routine.go line 273 at r3 (raw file):

			if buildutil.CrdbTestBuild {
				if txn.Type() != kv.RootTxn {
					return errors.AssertionFailedf("routine: cannot step LeafTxn")

Should we return this error in non-test builds too, since it will likely fail later on in execution?


pkg/sql/opt/exec/execbuilder/relational.go line 3374 at r3 (raw file):

	// block use internal savepoints.
	if udf.Def.Volatility == volatility.Volatile || udf.Def.ExceptionBlock != nil {
		b.flags.Set(exec.PlanFlagMustUseRootTxn)

Below we always pass enableStepping=true, since procedures have no user-defined volatility, and I believe statements within a procedure can always observe side effects of previously evaluated statements. So I think we should unconditionally set this plan flag here.


pkg/sql/opt/exec/execbuilder/scalar.go line 957 at r3 (raw file):

	// Volatile routines step the transaction, and routines with an exception
	// block use internal savepoints.

nit: move this below next to enableStepping := .... Or maybe a better option is to set the flag in the if udf.Def.ExceptionBlock { ... } block below and also after enableStepping := ... in an if enableStepping { ... } block.


pkg/sql/opt/exec/execbuilder/scalar.go line 980 at r3 (raw file):

	// see mutations made by the invoking statement and by previously executed
	// statements.
	enableStepping := udf.Def.Volatility == volatility.Volatile

nit: move this below the if udf.Def.ExceptionBlock != nil { ... } block since it's not used for that code path.

@mgartner
Copy link
Collaborator

mgartner commented Mar 5, 2024

pkg/sql/opt/exec/execbuilder/scalar.go line 962 at r3 (raw file):

	}

	if udf.Def.BlockState != nil {

nit: It might be easier to see that we don't need to set the flag in this case (because initRoutineExceptionHandler is a no-op with a nil udf.Def.ExceptionBlock) if we add udf.Def.ExceptionBlock != nil to this condition (and we could make initRoutineExceptionHandler assume that the exception block is non-nil).

@mgartner
Copy link
Collaborator

mgartner commented May 2, 2024

@DrewKimball is this something we need to pick back up and get into 24.1?

@DrewKimball
Copy link
Collaborator Author

@DrewKimball is this something we need to pick back up and get into 24.1?

We should backport this, but it won't be a regression in 24.1, so not urgent.

@mgartner
Copy link
Collaborator

@DrewKimball How important do you think this is? Happy to work together to get it across the finish line.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-23.1.x Flags PRs that need to be backported to 23.1 backport-23.2.x Flags PRs that need to be backported to 23.2.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

sql: v23.1.10: GetLeafTxnInputState() called on leaf txn in routine
4 participants