-
Notifications
You must be signed in to change notification settings - Fork 501
Nested ORDER BY and LIMIT crashes the DB #1423
Comments
Addressing some things based on my own observations, I agree with the refactor of To my knowledge, the only property initially envisoned was a Sort Property, so at the moment I don't think that any new properties would not be able to be coupled to a particular node, or pushed down otherwise. Our current rules for A bunch of property-related ideas are listed on #1421, if ok with you, I'd like to add a link to this issue in that checklist. |
@thepinetree Feel free to link this issue. I guess I don't fully understand the difference between when we create a noisepage/src/traffic_cop/traffic_cop_util.cpp Lines 36 to 63 in 520e0da
noisepage/src/optimizer/child_property_deriver.cpp Lines 92 to 103 in 520e0da
For the outer query a |
Ignoring the following method (because I don't really understand it), noisepage/src/optimizer/child_property_deriver.cpp Lines 92 to 103 in 520e0da
ORDER BY clause noisepage/src/traffic_cop/traffic_cop_util.cpp Lines 36 to 63 in 520e0da
If we want to support a |
Currently if there's an ORDER BY and a LIMIT in a nested query, then we get a segfault. The issue is that ORDER BY is converted into a PropertySort, and properties are not pushed down into nested queries. Additionally LIMIT nodes look at the sort expressions and order that is stored in the node itself instead of looking at the required properties during ChildPropertyDeriver::Visit. This creates a disconnect between what the LIMIT nodes in nested queries expects the required properties to be and what they actually are. Relying on the actual properties instead of what's stored in the LIMIT node ensures that the output properties and required properties stay in synce. The result of this change is that nested queries will ignore ORDER BYs unless it is accompanied by a LIMIT. This is due to the fact that the PropertySort will not be pushed down into nested queries, but LIMITs store their own sort information and generate an OrderBy node in the PlanGenerator. Fixes cmu-db#1423
jkosh44@fefcc60 Fixes this issue. Though it's probably still worth considering how to propagate properties down into nested queries. The change should wait until after #1422 is merged or be integrated into #1422. It's possible depending on the final state of #1422 that this change is no longer necessary (@thepinetree tagging for visibility). |
Bug Report
Summary
According to the SQL standard
ORDER BY
is not allowed in nested queries (I can't actually find the SQL Standard to confirm this, but I found a handful of sites that have said this. The best source I can find is this MariaDB post: https://mariadb.com/kb/en/why-is-order-by-in-a-from-subquery-ignored/). Different systems handle this in different ways. From some brief research, these seem to be the possible options when we encounter anORDER BY
clause in a nested query:ORDER BY
s aren't allowed in nested queriesORDER BY
ORDER BY
ORDER BY
if it's accompanied by aLIMIT
clause.ORDER BY
in a nested query would not change the result of the query unless it's accompanied by aLIMIT
.Our system will ignore an
ORDER BY
clause unless it is in the outermost query:noisepage/src/traffic_cop/traffic_cop_util.cpp
Lines 36 to 63 in 520e0da
or unless it is accompanied by a
LIMIT
clause:noisepage/src/optimizer/child_property_deriver.cpp
Lines 92 to 103 in 520e0da
When we actually encounter both an
ORDER BY
and aLIMIT
in a nested query then the database crashes. Specifically with a segfault on this line:noisepage/src/include/optimizer/group_expression.h
Line 111 in 520e0da
lowest_cost_table_
does not containrequirements
.Environment
OS: Ubuntu (LTS) 20.04
Compiler: GCC 7.0+
CMake Profile:
Debug
Jenkins/CI: N/A
Steps to Reproduce
ORDER BY
and aLIMIT
Expected Behavior
Actual Behavior
Discussion
ORDER BY Effects
I just briefly wanted to present some examples of when an
ORDER BY
could influence the result IF it was executedFor all of the following queries assume the following two tables have been created and filled with random data
Here the
ORDER BY
makes no difference. The result of the nested query is part of anIN
predicate and therefore treated as an unordered list, so it doesn't matter how it's ordered.Here the
ORDER BY
does make a difference. We only include the top two values ofa
in the right side of theIN
predicate.This is probably a stupid query, but the ORDER BY does make a difference. The result of this query should be the same as
SELECT a FROM foo ORDER BY a LIMIT 2;
, however if we ignore theORDER BY
this is not necessarily the case. I actually tested this on postgres and it seems like theORDER BY
is executed.Properties
ORDER BY
clauses are implemented with properties in the optimizer. That means that while there is anOrderByPlanNode
for physical plans (https://github.com/cmu-db/noisepage/blob/master/src/include/planner/plannodes/limit_plan_node.h), there is no LogicalOrderBy operator in the logical operator trees. Instead we have aPropertySort
property (https://github.com/cmu-db/noisepage/blob/master/src/include/optimizer/properties.h).Currently
PropertySort
is the only property in the optimizer and the implementation for deriving this property (linked above in this issue) is very specific toORDER BY
and very tightly coupled to theLogicalLimit
operator. Understanding this will probably be important to fixing this issue. Also I have the following concerns with this approach (other than this bug that it's causing):PropertySort
was so tightly coupled withLIMIT
. I think a comment or some documentation would go a long way here.ORDER BY
is. That means that the approach we use in derivingPropertySort
, where we save the information in some other node, MIGHT not be possible. I don't really know enough about properties though to know if this is true.LIMIT
into a property. This may break our current derivation ofPropertySort
.I don't know enough about the optimizer to know if these are valid concerns, but thought they warranted some discussion. They probably don't need to be resolved for this issue but are somewhat related so I thought I'd bring them up.
The text was updated successfully, but these errors were encountered: