Remove filters from LookupJoins when they're provably not required. #1888

nicktobey · 2023-07-21T22:53:08Z

During joins, we still evaluate every filter for every candidate row. But based on the join implementation, some of those filters must necessarily be true, so we don't need to evaluate them.

In most joins the cost of this is small. It is most noticeable in LookupJoins where a point lookup is constructed from several columns; there, filter evaluation can dominate the runtime.

One potential drawback of this change is that it might make query plans harder to read, because some filters are no longer explicitly listed in the plan and are instead implied by the Join node's children. One option to prevent this would be to add metadata to joins listing the filters they assume to hold, or to mark filters in the execution plan as "skipped." However, I think that described execution plans should match the actual behavior of the execution as closely as possible, so this may not actually be a concern.

Some notes about the individual commits in this PR:

3b54dbe: This commit fixes an existing bug which caused lookups to return extra rows when the filter was a null-safe-equals check and the key was non-null. This bug previously caused no issues because the extra rows would not match the filter and would get dropped. Now that we're skipping filters we know to be extraneous, this bug would manifest if not fixed.
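As a rough illustration of the semantics involved (not the actual go-mysql-server code), the sketch below models SQL `=` and the null-safe `<=>` over nullable integers, and shows which rows a correct lookup should return for a non-null key. The names `eq`, `nullSafeEq`, `lookupNullSafe`, and `ptr` are hypothetical helpers invented for this example.

```go
package main

import "fmt"

// ptr is a hypothetical helper that returns a pointer to v.
// A nil *int64 stands in for SQL NULL in this sketch.
func ptr(v int64) *int64 { return &v }

// eq models SQL `a = b`. known is false when the result is NULL,
// which happens whenever either operand is NULL.
func eq(a, b *int64) (result bool, known bool) {
	if a == nil || b == nil {
		return false, false
	}
	return *a == *b, true
}

// nullSafeEq models SQL `a <=> b`: it never returns NULL,
// and NULL <=> NULL is true.
func nullSafeEq(a, b *int64) bool {
	if a == nil || b == nil {
		return a == nil && b == nil
	}
	return *a == *b
}

// lookupNullSafe returns the positions of the rows whose value matches
// key under `<=>`. With a non-null key, only rows equal to the key should
// come back, not every non-null row.
func lookupNullSafe(rows []*int64, key *int64) []int {
	var out []int
	for i, r := range rows {
		if nullSafeEq(r, key) {
			out = append(out, i)
		}
	}
	return out
}

func main() {
	rows := []*int64{ptr(1), nil, ptr(2), ptr(1)}
	fmt.Println(lookupNullSafe(rows, ptr(1))) // [0 3]
	fmt.Println(lookupNullSafe(rows, nil))    // [1]
	_, known := eq(nil, ptr(1))
	fmt.Println(known) // false
}
```

Once the join no longer re-checks the filter, the lookup itself has to be this precise, which is why the fix had to land before the filter removal.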

48e8777: This commit changes the costing function of lookup joins. In the event that the lookup expressions can't be proved to uniquely identify a row, we attempt to estimate what percentage of the rows will be returned by the lookup. It's a very rough estimate, and serves mostly to guide which index to use if the table has multiple indexes: the more filters the index is able to make redundant, the better. The previous costing implementation had a special case for indexes where every column in the index was filtered on, and assigned that index a score somewhere between a 3-key lookup and a 4-key lookup. I imagine the thought process when this was implemented was that a lookup using every column in an index would tend to return very few rows compared to a lookup using an index prefix. Of course this is data dependent, and I'm not convinced it's generally true.
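The idea that an index covering more filters should score better can be sketched as follows. This is only an illustration under invented assumptions: the `candidate` type, the fixed per-column selectivity, and both function names are hypothetical and do not reflect the formula used in go-mysql-server.

```go
package main

import "fmt"

// candidate is a hypothetical description of one index that could serve a lookup.
type candidate struct {
	name        string
	matchedCols int  // number of filters the index lookup makes redundant
	unique      bool // true if the lookup is proven to identify at most one row
}

// estimatedFraction is an assumed, very rough estimate of the fraction of
// rows a lookup returns. Each matched column is assumed to keep perColumn
// of the remaining rows, with no special case for "every column is used".
func estimatedFraction(c candidate, perColumn float64) float64 {
	if c.unique {
		return 0 // treated as a point lookup in this sketch
	}
	frac := 1.0
	for i := 0; i < c.matchedCols; i++ {
		frac *= perColumn
	}
	return frac
}

// pickIndex returns the candidate with the smallest estimated fraction.
// Ties keep the earlier candidate.
func pickIndex(cands []candidate, perColumn float64) candidate {
	best := cands[0]
	for _, c := range cands[1:] {
		if estimatedFraction(c, perColumn) < estimatedFraction(best, perColumn) {
			best = c
		}
	}
	return best
}

func main() {
	cands := []candidate{
		{name: "idx_a", matchedCols: 1},
		{name: "idx_a_b_c", matchedCols: 3},
		{name: "idx_a_b", matchedCols: 2},
	}
	fmt.Println(pickIndex(cands, 0.1).name) // idx_a_b_c
}
```

Under this assumption, an index that happens to have all of its columns filtered gets no bonus on its own; only the number of matched columns (or proven uniqueness) matters.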

2b384a3: This commit updates tests that check for specific query plans. Most of these updates are just eliminating filters. A small number of updates are a result of changing the lookup costing, resulting in different join types. I haven't looked at these closely yet. This may be okay, it may not be.

max-hoffman · 2023-07-21T23:04:33Z

#benchmark

nicktobey · 2023-07-21T23:52:49Z

I created #1889 to benchmark just the costing change on its own.

jycor · 2023-07-24T19:13:31Z

sql/analyzer/indexed_joins.go

+				found := false
+				for _, matchedFilter := range matchedFilters {
+					if filter == matchedFilter {
+						found = true


I think you can just do

filters = append(filters, filter)
break

and get rid of the found flag

Alternatively, could make matchedFilters a map, and avoid a nested for loop. Doubtful this will really make a difference in runtime.

The purpose behind the loop is to compute the set difference: we want to fill filters with every filter from rel.Filter that isn't also in matchedFilters. Obnoxiously there's no way to do this in the standard library that I could find.

Something with a map would be asymptotically faster, but a nested loop is likely faster for all real use cases. (You'd need a lot of filters in your join before the overhead of maintaining a map becomes worth it.)
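To make the two options discussed here concrete, the following sketch computes the same set difference both ways. The `filter` type is a hypothetical stand-in (a plain string) for the real filter expressions, and both function names are invented for this example.

```go
package main

import "fmt"

// filter is a hypothetical stand-in for a join filter expression.
type filter string

// differenceNested returns every element of all that is not in matched,
// using a nested loop. It is O(len(all) * len(matched)) but has no setup
// cost, which tends to win when both slices are short.
func differenceNested(all, matched []filter) []filter {
	var out []filter
	for _, f := range all {
		found := false
		for _, m := range matched {
			if f == m {
				found = true
				break
			}
		}
		if !found {
			out = append(out, f)
		}
	}
	return out
}

// differenceMap computes the same result with a set built from matched.
// It is O(len(all) + len(matched)) but pays for allocating and hashing.
func differenceMap(all, matched []filter) []filter {
	seen := make(map[filter]struct{}, len(matched))
	for _, m := range matched {
		seen[m] = struct{}{}
	}
	var out []filter
	for _, f := range all {
		if _, ok := seen[f]; !ok {
			out = append(out, f)
		}
	}
	return out
}

func main() {
	all := []filter{"a.x = b.x", "a.y = b.y", "a.z > 5"}
	matched := []filter{"a.x = b.x", "a.y = b.y"}
	fmt.Println(differenceNested(all, matched)) // [a.z > 5]
	fmt.Println(differenceMap(all, matched))    // [a.z > 5]
}
```

Both versions preserve the order of the remaining filters.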

nicktobey · 2023-07-25T17:45:01Z

Turns out pretty much all of the performance wins came from #1889. Removing the filters from joins is nice for simplifying plans, but impacts performance only negligibly, and this is potentially a risky change.

nicktobey · 2023-07-25T18:43:19Z

So it looks like the described plans include the index being used for the lookup, but not what the key expressions on the index are. This is important information that is needed to fully understand the plan, and if we remove the filters there's no way to infer that information.

Adding that information wouldn't be too hard, but it's not trivial either and getting this in isn't a priority at the moment.

max-hoffman

lgtm, one small comment

max-hoffman · 2023-09-11T19:57:31Z

sql/analyzer/indexed_joins.go

+			for _, filter := range rel.Filter {
+				found := false
+				for _, matchedFilter := range matchedFilters {
+					if filter == matchedFilter {


you might be able to just match with e.ExprId()

Good catch! That's not only more efficient, but that allows us to detect matches through table aliases.

Switching to use e.ExprId() appears to have broken some correctness tests. I'm going to merge the previous version.

max-hoffman · 2023-10-13T21:06:58Z

@nicktobey I'm inclined to close this unless you feel strongly

timsehn · 2023-10-17T21:36:09Z

I have a three month rule. You have until Oct 21.

nicktobey added 4 commits July 21, 2023 12:15

Fix bug: A lookup with a non-null key for a NullSafeEq inadvertantly …

3b54dbe

…returns all non-null rows.

Remove filters from LookupJoin if we can prove that candidate rows wi…

50ce792

…ll always match.

Recost LookUp joins: We assumed that an index which used every column…

48e8777

… in the lookup was a strong candidate, but that's not necessarily true.

Update Query Plan Tests.

2b384a3

nicktobey requested review from jycor and max-hoffman July 21, 2023 22:53

nicktobey mentioned this pull request Jul 21, 2023

Recost LookupJoins to remove special casing for indexes where every column is used in a filter. #1889

Merged

jycor reviewed Jul 24, 2023

nicktobey added 3 commits September 11, 2023 11:08

Merge branch 'main' of github.com:dolthub/go-mysql-server into nickto…

55b6b05

…bey/remove-filters

Update query plans after merge.

d4fba66

Add list of key expressions to IndexedTableAccess's debug string.

e26df6c

max-hoffman approved these changes Sep 11, 2023

nicktobey added 6 commits September 11, 2023 13:27

Remove noisy do-nothing filters from described plans.

c0030d7

Match scalar expr ids instead of scalars.

6fa77f6

Match scalar expr ids instead of scalars.

52fdad0

Revert "Match scalar expr ids instead of scalars."

f7f0426

This reverts commit 52fdad0.

Revert "Match scalar expr ids instead of scalars."

8063b37

This reverts commit 6fa77f6.

Merge branch 'main' into nicktobey/remove-filters

1c849f6

nicktobey force-pushed the nicktobey/remove-filters branch from f937703 to 1c849f6 Compare September 29, 2023 16:56

nicktobey added 2 commits October 17, 2023 16:54

Merge remote-tracking branch 'origin' into nicktobey/remove-filters

15654ac

Update query plans to include keys for indexed table accesses and rem…

d2a5654

…ove redundant filters.

nicktobey merged commit 8c0e16c into main Oct 18, 2023

nicktobey deleted the nicktobey/remove-filters branch October 18, 2023 00:51

BrewTestBot mentioned this pull request Oct 19, 2023

dolt 1.21.0 Homebrew/homebrew-core#151779

Merged
