Skip to content

Conversation

@andy-kimball
Copy link
Contributor

This PR contains a number of improvements and fixes to the functional
dependency library:

  • Normalize constants and equivalencies so there's less redundancy
  • Allow MakeOuter dependencies that cross join boundaries
  • Make ProjectCols actually remove un-projected columns
  • Fix FDs generated for apply joins
  • Add Verify method to FuncDepSet

Together, these changes clean up the FD logical props so that they're
more compact and more efficient.

@andy-kimball andy-kimball requested a review from a team as a code owner June 28, 2018 07:34
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@RaduBerinde
Copy link
Member

Some minor comments for r1 (posting separately so it's easier to keep track of which commit to update).


Review status: :shipit: complete! 0 of 0 LGTMs obtained


pkg/sql/opt/props/func_dep.go, line 737 at r1 (raw file):

		f.deps = deps
	} else {
		f.deps[0].to = f.deps[0].to.Union(cols)

I wondered if we should also set cols to the union to make the logic below do more. I don't think that does anything because we would have already performed simplifications with existing constant columns, but it's a bit subtle. Not sure if it warrants a comment or not though.


pkg/sql/opt/props/func_dep.go, line 748 at r1 (raw file):

		if !fd.equiv {
			if fd.strict {
				// Constants in the from side of a strict dependency are no-ops,

[nit] This is a bit misleading. There are two cases:

  • if the from side includes constant columns and non-constant columns, the constant columns are no-ops and can be removed;
  • if the from side includes only constant columns, the to side is also constant and would now be represented in the first FD; so this FD is redundant. (maybe we could assert that to is included in f.deps[0].to)

pkg/sql/opt/props/func_dep_test.go, line 472 at r1 (raw file):

	verifyFD(t, &bcden, "")

	// Project single row.

single column


pkg/sql/opt/props/func_dep_test.go, line 473 at r1 (raw file):

	// Project single row.
	//   SELECT d FROM abcde, mnpq WHERE a=m AND 1=1 AND n=2

a=1?


Comments from Reviewable

@RaduBerinde
Copy link
Member

r2 LGTM


Review status: :shipit: complete! 0 of 0 LGTMs obtained


pkg/sql/opt/props/func_dep.go, line 721 at r2 (raw file):

			// All determinant columns are equivalent to one another.
			if fd.equiv {
				// Ensure that each equivalent columns directly maps to all other

[nit] columns


Comments from Reviewable

@RaduBerinde
Copy link
Member

r3,r4 LGTM
t5 LGTM (with minor comments)


Review status: :shipit: complete! 0 of 0 LGTMs obtained


pkg/sql/opt/props/func_dep.go, line 986 at r5 (raw file):

			//   ----------------------
			//   1  4  1     NULL  2
			//   2  5  NULL  NULL  NULL

Could you add a 2 4 2 x y line as an example of when (a)-->(c) doesn't hold?


pkg/sql/opt/props/func_dep_test.go, line 597 at r5 (raw file):

	//   SELECT * FROM abcde RIGHT OUTER JOIN mnpq ON a=m
	nullExtendedCols = util.MakeFastIntSet(1, 2, 3, 4, 5)
	loj = makeAbcdeFD(t)

[nit] should be roj (same for the two cases above and the one below)


Comments from Reviewable

@RaduBerinde
Copy link
Member

question on r6


Review status: :shipit: complete! 0 of 0 LGTMs obtained


pkg/sql/opt/props/func_dep.go, line 810 at r6 (raw file):

				fd.to = f.ComputeClosure(fd.to)
			}
		}

Can't we also replace any removed columns in from with equivalent columns that are not removed (if possible)?


Comments from Reviewable

@RaduBerinde
Copy link
Member

r7 LGTM
question on r8
r9 LGTM


Review status: :shipit: complete! 0 of 0 LGTMs obtained


pkg/sql/opt/norm/testdata/rules/decorrelate, line 643 at r8 (raw file):

 │    │    │    ├── columns: x:6(int!null) y:7(int) v:9(int) notnull:12(bool)
 │    │    │    ├── outer: (2)
 │    │    │    ├── fd: (6)-->(7,12), ()~~>(12)

Why is notnull a constant column here? And how come we have a const column in a to, I thought we always simplify that now?


pkg/sql/opt/props/func_dep.go, line 1071 at r9 (raw file):

//   7. Closure of key should include all known columns in the FD set.
//   8. If FD set has no key then key columns should be empty.
//

Should we check that there are no const cols in any from or to?


Comments from Reviewable

@andy-kimball
Copy link
Contributor Author

Review status: :shipit: complete! 0 of 0 LGTMs obtained


pkg/sql/opt/norm/testdata/rules/decorrelate, line 643 at r8 (raw file):

Previously, RaduBerinde wrote…

Why is notnull a constant column here? And how come we have a const column in a to, I thought we always simplify that now?

We always simplify constants when directly adding them. But there are transitive cases where a previously lax dependency becomes a strict dependency where we might end up with transitive constants that aren't rolled into the constant FD. That's OK, as it would be expensive to always keep that up-to-date. I usually wait until ProjectCols is called to update the constant FD dependants. Before that point it might not contain the full constant closure.

It's a known NP-Complete problem to find the "minimal cover" of a given FD set, so I make do with some reasonable simplification heuristics instead. Keeping the constant set always perfectly up-to-date would require us to recompute its closure on almost every FD set operation. So far I haven't seen enough justification for doing that. On the other hand, keeping equivalence dependency closures up-to-date is pretty easy, since only AddEquivalence and ProjectCols have an effect on their closures. So I do it in that case, but only make a best-effort in the constant case.

All that said, you've found a bug in the FD rules. In the general case, it's not safe to do that (in this special case it's correct, b/c (6) is a key of the row-supplying side). I went ahead and removed the code that is adding a constant to other FD's when we make make that constant lax as part of MakeOuter. I added this rule as an afterthought, and did not take proper care in proving its correctness to myself.


pkg/sql/opt/props/func_dep.go, line 737 at r1 (raw file):

Previously, RaduBerinde wrote…

I wondered if we should also set cols to the union to make the logic below do more. I don't think that does anything because we would have already performed simplifications with existing constant columns, but it's a bit subtle. Not sure if it warrants a comment or not though.

Actually, this line can just be simplified to f.deps[0].to = cols because we've already computed the closure above, which would include the to columns. I think the Union is a holdover from earlier versions. I've removed it.


pkg/sql/opt/props/func_dep.go, line 748 at r1 (raw file):

Previously, RaduBerinde wrote…

[nit] This is a bit misleading. There are two cases:

  • if the from side includes constant columns and non-constant columns, the constant columns are no-ops and can be removed;
  • if the from side includes only constant columns, the to side is also constant and would now be represented in the first FD; so this FD is redundant. (maybe we could assert that to is included in f.deps[0].to)

I've updated the comment to be more clear.


pkg/sql/opt/props/func_dep.go, line 721 at r2 (raw file):

Previously, RaduBerinde wrote…

[nit] columns

Done.


pkg/sql/opt/props/func_dep.go, line 986 at r5 (raw file):

Previously, RaduBerinde wrote…

Could you add a 2 4 2 x y line as an example of when (a)-->(c) doesn't hold?

I updated the example - for some reason I thought this case was covered, but it's not.


pkg/sql/opt/props/func_dep.go, line 810 at r6 (raw file):

Previously, RaduBerinde wrote…

Can't we also replace any removed columns in from with equivalent columns that are not removed (if possible)?

Yes, good idea. I added a commit to the end of this PR to do that.


pkg/sql/opt/props/func_dep.go, line 1071 at r9 (raw file):

Previously, RaduBerinde wrote…

Should we check that there are no const cols in any from or to?

For reasons I explain in another comment, it's only best effort to maintain the complete closure of the constant FD (and to then remove constants from other locations in the FD). If we see important benefits to doing it, we can reconsider.


pkg/sql/opt/props/func_dep_test.go, line 472 at r1 (raw file):

Previously, RaduBerinde wrote…

single column

Done.


pkg/sql/opt/props/func_dep_test.go, line 473 at r1 (raw file):

Previously, RaduBerinde wrote…

a=1?

Done.


pkg/sql/opt/props/func_dep_test.go, line 597 at r5 (raw file):

Previously, RaduBerinde wrote…

[nit] should be roj (same for the two cases above and the one below)

Done.


Comments from Reviewable

@RaduBerinde
Copy link
Member

:lgtm:


Reviewed 1 of 36 files at r10, 1 of 3 files at r16, 31 of 31 files at r17, 1 of 3 files at r18, 2 of 2 files at r19.
Review status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale)


Comments from Reviewable

@rytaft
Copy link
Collaborator

rytaft commented Jun 29, 2018

:lgtm: Nice changes!


Reviewed 36 of 36 files at r10, 2 of 2 files at r11, 2 of 2 files at r12, 2 of 2 files at r13, 2 of 2 files at r14, 3 of 3 files at r15, 3 of 3 files at r16, 31 of 31 files at r17, 3 of 3 files at r18, 2 of 2 files at r19.
Review status: :shipit: complete! 1 of 0 LGTMs obtained (and 1 stale)


pkg/sql/opt/props/func_dep.go, line 934 at r13 (raw file):

		fd := &fdset.deps[i]
		if fd.from.Empty() {
			f.addDependency(fd.from, fd.to, fd.strict, fd.equiv)

The comment above implies you don't need this call here. Maybe add a comment here or update the comment above.


pkg/sql/opt/props/func_dep.go, line 986 at r14 (raw file):

			//
			//   CREATE TABLE ab (a INT, b INT, PRIMARY KEY(a, b))
			//   SELECT * FROM ab LEFT OUTER JOIN cde ON a=c AND b=1

It's a bit odd that you only included the CREATE TABLE statement for ab -- why not cde?


pkg/sql/opt/props/func_dep.go, line 1037 at r14 (raw file):

				}
			} else {
				// Rule 2, described above (where determinant is on row-supplying side).

Isn't this actually rule #1?


pkg/sql/opt/props/func_dep.go, line 771 at r15 (raw file):

// by replacing any un-projected dependants by their closures, and then removing
// all FDs containing un-projected columns. While this algorithm may cause some
// loss of information in edge cases, it does a good job of preserving the most

Can you give an example of an edge case?


pkg/sql/opt/props/func_dep.go, line 828 at r15 (raw file):

	//
	if !constCols.Empty() {
		f.AddConstants(constCols)

Why is this necessary? Shouldn't constants have already been removed from all determinants by previous normalization?


pkg/sql/opt/props/func_dep.go, line 864 at r15 (raw file):

// While this requires O(N**2) time, it's useful when the two FD sets may
// overlap one another and substantial simplifications are possible (as with
// LookupJoin).

Maybe you mean IndexJoin (since these are different now)?

Also, can you add a comment to explain what the cols parameter is?


pkg/sql/opt/props/func_dep.go, line 1064 at r18 (raw file):

//   3. Lax equivalencies should be reduced to lax dependencies.
//   4. Equivalence determinant should be exactly one column.
//   5. The determinants of an equivalence is always its closure.

determinants -> dependants


pkg/sql/opt/props/func_dep.go, line 728 at r19 (raw file):

			}

			if fd.to.Intersects(cols) {

Why not just call removeToCols here? This check is redundant.


pkg/sql/opt/props/func_dep.go, line 827 at r19 (raw file):

		// Track all columns that have equivalent alternates.
		if fd.equiv {

You could add another check that fd.to.Intersects(cols) to ensure that makeEquivMap is only called when it will be useful


Comments from Reviewable

Allow equivalent columns in outer join cases like:

  SELECT * FROM ab LEFT OUTER JOIN cde ON a=c

Add rule to downgrade any dependencies that cross join boundaries,
by either dropping them entirely, or making them lax. For example,
the (c)==(a) dependency becomes just (c)~~>(a), and (a)==(c) is
dropped entirely.

Release note: None
Currently, ProjectCols sometimes marks un-projected columns as
"removed" rather than actually removing them. This commit gets rid
of the removed set and instead discards or modifies FDs that include
un-projected columns.

Release note: None
This commit fixes a bug in which incorrect constant dependencies are
sometimes generated for apply joins. For example:

  SELECT *
  FROM abcde
  INNER JOIN LATERAL (SELECT * FROM mnpq WHERE m=a LIMIT 1)
  ON True

A constant dependency for the columns of mnpq is generated, since all
columns in a relation having zero or one rows are constants. However,
since the value of the inner relation can change based on the value of
the outer relation, this constant dependency cannot be propagated to
the join's logical properties.

Release note: None
This commit updates all the unit tests to reflect the FD set changes
made in previous commits.

Release note: None
Add some verification checks to FuncDepSet and run them in
testrace builds.

Release note: None
@andy-kimball
Copy link
Contributor Author

bors r+

@andy-kimball
Copy link
Contributor Author

bors r-

@craig
Copy link
Contributor

craig bot commented Jul 2, 2018

Canceled

In the FD ProjectCols method, before removing a column that is not
part of a projection, try to find an equivalent column that can be
substituted for it. This enables us to preserve that FD rather than
discarding it.

Release note: None
@andy-kimball
Copy link
Contributor Author

bors r+

craig bot pushed a commit that referenced this pull request Jul 2, 2018
27046: opt: Improve FuncDepSet r=andy-kimball a=andy-kimball

This PR contains a number of improvements and fixes to the functional
dependency library:

- Normalize constants and equivalencies so there's less redundancy
- Allow MakeOuter dependencies that cross join boundaries
- Make ProjectCols actually remove un-projected columns
- Fix FDs generated for apply joins
- Add Verify method to FuncDepSet

Together, these changes clean up the FD logical props so that they're
more compact and more efficient.

Co-authored-by: Andrew Kimball <andyk@cockroachlabs.com>
@craig
Copy link
Contributor

craig bot commented Jul 3, 2018

Build succeeded

@craig craig bot merged commit 52dff41 into cockroachdb:master Jul 3, 2018
@andy-kimball andy-kimball deleted the normfd branch July 22, 2018 03:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants