Skip to content

POC: Merge extension #1022

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 16 commits into
base: main
Choose a base branch
from
Draft

POC: Merge extension #1022

wants to merge 16 commits into from

Conversation

JanKaul
Copy link
Contributor

@JanKaul JanKaul commented Jun 5, 2025

This PR provides a rough structure of how the writing phase of the MERGE INTO statement looks like. There are still some parts missing. But I think the general structure with a LogicalPlan "MergeIntoSink" and a PhysicalPlan "MergeIntoSinkExec" should work.

The idea is to adopt the merge_query function to prepare a logical plan that emits a RecordBatchStream of the desired form. As a last step of the merge_query function this logical plan is then passed as an input to the "MergeIntoSink" LogicalPlan. The Extension plan will then during execution convert the "MergeIntoSink" into the "MergeIntoSinkExec" which will perform the desired operations.

// table accordingly.
pub struct MergeIntoSink {
pub input: Arc<LogicalPlan>,
pub target: DataFusionTable,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jonathanc-n The target could also be an Arc<dyn TableProvoder>

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can also pass the target as LogicalPlan

Copy link

@jonathanc-n jonathanc-n left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently just looking at it, will look at it again when it is more complete

}

fn fmt_for_explain(&self, f: &mut std::fmt::Formatter) -> std::fmt::Result {
write!(f, "MergeIntoSink")

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can specify the catalog, schema, and table here


fn dyn_hash(&self, state: &mut dyn Hasher) {
"MergeIntoSink".dyn_hash(state);
self.input.dyn_hash(state);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should hash target as well, if two sinks point at different tables can lead to errors

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants