-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make sure that the data types are supported in hashjoin before genera… #2702
Conversation
…ting hashjoin logical plan. If data types are not supported in hashjoin, try cross join with filters.
Codecov Report
@@ Coverage Diff @@
## master #2702 +/- ##
==========================================
- Coverage 84.66% 84.64% -0.02%
==========================================
Files 270 270
Lines 46919 47037 +118
==========================================
+ Hits 39724 39815 +91
- Misses 7195 7222 +27
Continue to review full report at Codecov.
|
Some(filter_expr) => { | ||
filters = Some(Expr::BinaryExpr { | ||
left: Box::new(expr), | ||
op: Operator::And, | ||
right: Box::new(filter_expr), | ||
}); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some(filter_expr) => { | |
filters = Some(Expr::BinaryExpr { | |
left: Box::new(expr), | |
op: Operator::And, | |
right: Box::new(filter_expr), | |
}); | |
} | |
Some(filter_expr) => filters = Some(and(expr, filter_expr)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you! I'll fix this.
let expr = Expr::BinaryExpr { | ||
left: Box::new(Expr::Column(l.clone())), | ||
op: Operator::Eq, | ||
right: Box::new(Expr::Column(r.clone())), | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let expr = Expr::BinaryExpr { | |
left: Box::new(Expr::Column(l.clone())), | |
op: Operator::Eq, | |
right: Box::new(Expr::Column(r.clone())), | |
}; | |
let expr = binary_expr( | |
Expr::Column(l.clone()), | |
Operator::Eq, | |
Expr::Column(r.clone()), | |
); | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you! I'll fix this.
@@ -19,6 +19,7 @@ | |||
|
|||
use crate::expr_rewriter::{normalize_col, normalize_cols, rewrite_sort_cols_by_aggs}; | |||
use crate::utils::{columnize_expr, exprlist_to_fields, from_plan}; | |||
use crate::Operator; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left some suggestions below that require these additional imports
use crate::Operator; | |
use crate::{and, binary_expr, Operator}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you! I'll fix this.
/// If more data types are supported in hash join, add those data types here | ||
/// to generate join logical plan. | ||
pub fn can_hash(data_type: &DataType) -> bool { | ||
match data_type { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
decimal should be supported here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Data types here come from function equal_rows, decimal is not supported in equal_rows so that hash join currently does not support joining on columns of decimal data type. That's why decimal is not here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it may help to add a comment here (or in equal_rows
) mentioning they need to remain in sync
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it may help to add a comment here (or in
equal_rows
) mentioning they need to remain in sync
I'll add this comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @AssHero
I believe this PR does what the description says -- and it adds nice test coverage so I am approving it.
However, I am concerned that trying to run a query using a CrossJoin
is likely to be catastrophically bad so this approach will not work for anything except very small inputs.
// join on hash unsupported data type (Date32), use cross join instead hash join | ||
let sql = "select * from foo t1 join foo t2 on t1.c4 = t2.c4"; | ||
let msg = format!("Creating logical plan for '{}'", sql); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I think CrossJoin is almost never what the user would want: as once the tables get beyond any trivial size the query will effectively never finish or will run out of memory. An error is clearer.
From the issue description #2145 (comment) I think @pjmore's idea to cast unsupported types to a supported type is a good one -- the arrow cast
kernels are quite efficient for things like Date32
-> Int32
(no copies) as the representations are the same
@pjmore what do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I think CrossJoin is almost never what the user would want: as once the tables get beyond any trivial size the query will effectively never finish or will run out of memory. An error is clearer.
From the issue description #2145 (comment) I think @pjmore's idea to cast unsupported types to a supported type is a good one -- the arrow
cast
kernels are quite efficient for things likeDate32
->Int32
(no copies) as the representations are the same@pjmore what do you think?
I agree. supporting more data types in hash join is the better way to solve this issue, and i'm already working on it.
And this pr only wants to make hash unsupported join running in cross join instead of error/panic, we can support more data types in hash join continuously.
/// If more data types are supported in hash join, add those data types here | ||
/// to generate join logical plan. | ||
pub fn can_hash(data_type: &DataType) -> bool { | ||
match data_type { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it may help to add a comment here (or in equal_rows
) mentioning they need to remain in sync
Thank you again @AssHero |
Which issue does this PR close?
Closes #2145
Rationale for this change
Before generating the hash join logical plan, make sure the data types in equal conditions are supported. Otherwise, try
cross join instead.
What changes are included in this PR?