-
Notifications
You must be signed in to change notification settings - Fork 1.5k
feat: Implement LeftMark join to fix subquery correctness issue #13134
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
e3eb975
3c695d9
5c37f4f
0b4ae53
53d32fd
e8c5a8f
00bf619
27ca940
0dbc370
34aea98
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -20,6 +20,7 @@ | |
use std::any::Any; | ||
use std::cmp::Ordering; | ||
use std::collections::{HashMap, HashSet}; | ||
use std::iter::once; | ||
use std::sync::Arc; | ||
|
||
use crate::dml::CopyTo; | ||
|
@@ -1326,6 +1327,25 @@ pub fn change_redundant_column(fields: &Fields) -> Vec<Field> { | |
}) | ||
.collect() | ||
} | ||
|
||
fn mark_field(schema: &DFSchema) -> (Option<TableReference>, Arc<Field>) { | ||
let mut table_references = schema | ||
.iter() | ||
.filter_map(|(qualifier, _)| qualifier) | ||
.collect::<Vec<_>>(); | ||
table_references.dedup(); | ||
let table_reference = if table_references.len() == 1 { | ||
table_references.pop().cloned() | ||
} else { | ||
None | ||
}; | ||
|
||
( | ||
table_reference, | ||
Arc::new(Field::new("mark", DataType::Boolean, false)), | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Could this field name potentially conflict with user-defined column names? If so, it might be necessary to add a special prefix, similar to CSE_PREFIX. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The way the join is used from decorrelate subqueries it will never conflict as that uses a subquery alias (that is prefixed __) and the new code will then use that for the mark column as well. But if someone uses LeftMark in a query without an alias it would be able to conflict. But I don't think just adding One option could be to make the output column name be part of of the |
||
) | ||
} | ||
|
||
/// Creates a schema for a join operation. | ||
/// The fields from the left side are first | ||
pub fn build_join_schema( | ||
|
@@ -1392,6 +1412,10 @@ pub fn build_join_schema( | |
.map(|(q, f)| (q.cloned(), Arc::clone(f))) | ||
.collect() | ||
} | ||
JoinType::LeftMark => left_fields | ||
.map(|(q, f)| (q.cloned(), Arc::clone(f))) | ||
.chain(once(mark_field(right))) | ||
.collect(), | ||
JoinType::RightSemi | JoinType::RightAnti => { | ||
// Only use the right side for the schema | ||
right_fields | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there such a thing as right mark join? If so, can we add a issue/TODO for it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is. I started adding both in a single PR but realised that is was more changes then I expected, so I did LeftMark first to keep the PR smaller. But created an issue for RightMark here: #13138