-
Notifications
You must be signed in to change notification settings - Fork 3
POC: Merge extension #1022
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
POC: Merge extension #1022
Conversation
// table accordingly. | ||
pub struct MergeIntoSink { | ||
pub input: Arc<LogicalPlan>, | ||
pub target: DataFusionTable, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jonathanc-n The target could also be an Arc<dyn TableProvoder>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can also pass the target as LogicalPlan
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently just looking at it, will look at it again when it is more complete
} | ||
|
||
fn fmt_for_explain(&self, f: &mut std::fmt::Formatter) -> std::fmt::Result { | ||
write!(f, "MergeIntoSink") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we can specify the catalog, schema, and table here
|
||
fn dyn_hash(&self, state: &mut dyn Hasher) { | ||
"MergeIntoSink".dyn_hash(state); | ||
self.input.dyn_hash(state); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should hash target as well, if two sinks point at different tables can lead to errors
This PR provides a rough structure of how the writing phase of the MERGE INTO statement looks like. There are still some parts missing. But I think the general structure with a LogicalPlan "MergeIntoSink" and a PhysicalPlan "MergeIntoSinkExec" should work.
The idea is to adopt the
merge_query
function to prepare a logical plan that emits a RecordBatchStream of the desired form. As a last step of themerge_query
function this logical plan is then passed as an input to the "MergeIntoSink" LogicalPlan. The Extension plan will then during execution convert the "MergeIntoSink" into the "MergeIntoSinkExec" which will perform the desired operations.