Skip to content

Conversation

@PavelSafronov
Copy link
Contributor

Copy link
Contributor

@fulghum fulghum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this direction. Good job coming up with this suggestion. It seems like a very simple change to avoid processing the filter expression and reading data from the table, and setting the schema avoids more complex (and error prone) updates to the plan nodes to remove or change projectors above the filter node. Seems like a great approach to me, but would be good to sanity check that @max-hoffman doesn't see some better approach we don't see yet.

It's probably worth adding some more test cases with some more exotic queries to make sure this approach holds up, but I know you were mostly looking for feedback on the overall direction right now.

},
{
// https://github.com/dolthub/dolt/issues/5522
Query: "select * from mytable where exists (select * from othertable where 1 = 0)",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • scopes: does the panic only happen for a subquery expression? Does it also happen for two or more nested subquery expressions? What about selecting from a subquery alias with a false filter? alternating subquery -> subquery expression and vice versa can be subject to different rule orderings
  • different relational expression types: group by, window, CTE, recursive CTE, table functions, indexedTableAccess
  • what if a HAVING expression is false
  • what if we project x from othertable, or 1, or a synthetic set of columns like x+1
  • what if there are nested empty function calls

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR updated with more tests. Responses to your questions below.

@fulghum and @max-hoffman , thanks for looking at this! In the process of adding more tests, I even found an unrelated CTE bug: dolthub/dolt#5549 🥳

scopes: does the panic only happen for a subquery expression?

Looks like this happens only with subqueries. A false filter on a join or a table does not produce this issue:

-- false filter on top of a table: WORKS
select a from ab where false;

-- false filter on top of a join: WORKS
select a, u from (ab cross join uv) where false;

Does it also happen for two or more nested subquery expressions?

Multiple subqueries and nested subqueries also trigger this:

-- multiple subqueries: PANIC
select count(*) from ab where
exists (select * from xy where false)
or exists (select * from uv where false);

-- nested subqueries: PANIC
select count(*) from ab where
exists (select * from xy where
    exists (select * from uv where false));

What about selecting from a subquery alias with a false filter?

This seems to work without issue:

-- false filter with a subquery: WORKS
select *
from (select * from uv) a1
where a1.u = 1 AND false;

alternating subquery -> subquery expression and vice versa can be subject to different rule orderings

Do you mean testing various nested subqueries, and specifically testing the order of the subqueries?

different relational expression types: group by, window, CTE, recursive CTE, table functions, indexedTableAccess

Added cases as tests where these are the filter's relations: query, group by, window, CTE, recursive CTE, table function.
Can you give an example of indexedTableAccess?

what if a HAVING expression is false

Do you mean something like this?
select x from xy as t1 group by t1.x having exists (select * from ab where 1 = 0);
That query panics in current dolt and works with this PR.

what if we project x from othertable, or 1, or a synthetic set of columns like x+1

Can you give an example of this?

what if there are nested empty function calls

Can you give an example of this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about selecting from a subquery alias with a false filter?

This is what I had in mind:

select *
from (select * from uv where false) a1
where a1.u = 1;

alternating subquery -> subquery expression and vice versa can be subject to different rule orderings

This is more what I had in mind:

select * from
(select * from xy
  where x = 1 and
    x in (select * from (select 1 where false)))

The false expression is accessed by SUBQUERY_ALIAS->SUBQUERY_EXPR->SUBQUERY_ALIAS. You might imagine similar patterns like SUBQUERY_ALIAS->SUBQUERY_ALIAS->SUBQUERY_EXPR, SUBQUERY_EXPR->SUBQUERY_ALIAS->SUBQUERY_ALIAS, ...etc.

Can you give an example of indexedTableAccess?

It might not be possible, but something like this might create a Proj(SubqueryAlias(Filter(false, IndexedTableAccess(xy, x=2)):

select * from (select * from xy where false) s where s.x = 2

select x from xy as t1 group by t1.x having exists (select * from ab where 1 = 0);

That's a good one, another one might just be

select * from xy where x in (select 1 having false)

what if we project x from othertable, or 1, or a synthetic set of columns like x+1

Adding projections can introduce different schemas than the base table

select * from xy where exists (select a, true, a*7  from ab where false)

what if there are nested empty function calls

Can't exactly think of what I meant right now. Maybe 1=(2+(1-1)) as the expression. Maybe below and variations of several scopes being empty where the layered schemas might be weird, and/or not exists and not in expressions.

select * from xy
where not exists (
  select * from ab
  where false and exists (
    select 1 where false
  )
)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added these test cases. Thanks Max!

Copy link
Contributor

@max-hoffman max-hoffman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm agree with jason, more tests to make sure that the new schema doesn't have surprising effects for a variety of cases would be nice

Copy link
Contributor

@max-hoffman max-hoffman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm!

@PavelSafronov PavelSafronov merged commit f78c957 into main Mar 15, 2023
@Hydrocharged Hydrocharged deleted the pavel/dolt-5522 branch March 29, 2023 01:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

dolt panics when EXISTS subquery has a false condition

3 participants