@theory theory commented Oct 30, 2025

Add a new pattern for "prepared inserts". It works like this:

  • Call PrepareInsert with an INSERT query with optional columns and ending in VALUES. No values should be included in the string.
  • It returns a PreparedInsert object that has two methods:
    • Block() returns a Block pre-configured with columns as declared in the INSERT statement
    • Execute() inserts data from the block then clears it.
  • When the PreparedInsert object goes out of scope, it signals the server that it is done sending data.

This allows one to send smaller batches of blocks, thereby using less memory, but still in a single ClickHouse INSERT operation.
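A rough sketch of the batching pattern described above, for illustration only. It does not use the clickhouse-cpp API: `MockBlock`, `MockServer`, `MockPreparedInsert`, and `StreamRows` are hypothetical stand-ins, and the "server" only counts what it receives. The sketch assumes the behavior from the later commit message, where `Finish()` or the destructor sends any remaining rows before signaling completion.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a block; NOT the clickhouse-cpp Block class.
struct MockBlock {
    std::vector<std::string> columns;            // column names from the INSERT
    std::vector<std::vector<std::string>> rows;  // buffered rows

    void AppendRow(std::vector<std::string> row) {
        assert(row.size() == columns.size());
        rows.push_back(std::move(row));
    }
    void Clear() { rows.clear(); }
};

// Hypothetical stand-in for the server; it only records what it was sent.
struct MockServer {
    std::size_t batches = 0;
    std::size_t rows = 0;
    bool finished = false;
};

// Hypothetical analogue of PreparedInsert: one INSERT, many small batches.
class MockPreparedInsert {
public:
    MockPreparedInsert(MockServer& server, std::vector<std::string> columns)
        : server_(&server) {
        block_.columns = std::move(columns);
    }
    // Non-copyable so the end-of-data signal is sent exactly once.
    MockPreparedInsert(const MockPreparedInsert&) = delete;
    MockPreparedInsert& operator=(const MockPreparedInsert&) = delete;

    ~MockPreparedInsert() { Finish(); }

    MockBlock& GetBlock() { return block_; }

    // Send the buffered rows as one batch, then clear the block for reuse.
    void Execute() {
        assert(!done_);
        if (block_.rows.empty()) return;
        server_->batches += 1;
        server_->rows += block_.rows.size();
        block_.Clear();
    }

    // Flush remaining rows and signal end of data. Safe to call repeatedly.
    void Finish() {
        if (done_) return;
        Execute();
        server_->finished = true;
        done_ = true;
    }

private:
    MockServer* server_;
    MockBlock block_;
    bool done_ = false;
};

// Streams `total` rows in batches of at most `batch` rows within one insert.
MockServer StreamRows(std::size_t total, std::size_t batch) {
    MockServer server;
    {
        MockPreparedInsert insert(server, std::vector<std::string>{"id", "name"});
        for (std::size_t i = 0; i < total; ++i) {
            insert.GetBlock().AppendRow({std::to_string(i), "row"});
            if (insert.GetBlock().rows.size() == batch) insert.Execute();
        }
    }  // destructor flushes the remainder and signals completion
    return server;
}
```

At most `batch` rows are held in memory at any time, while the mock server still sees a single insert that ends with one completion signal.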

This is expected to be useful for the Postgres foreign data wrapper (FDW) insert API, where a single statement can insert multiple rows but the API hands them over one at a time. It will also support the FDW COPY API, which can submit huge batches of data to insert.

Comment on lines +1191 to +1178
if (chtype->GetCode() == Type::LowCardinality) {
    chtype = col->As<ColumnLowCardinality>()->GetNestedType();
}

I'm honestly not sure this is the right thing to do. Might one need Type::LowCardinality?


void FinishInsert();

void SendData(const Block& block);

I had to move this to public so that PreparedInsert can call it. It's not declared in the header file, though, so it shouldn't matter.

public:
    Block * GetBlock();
    void Execute();
    // XXX This shouldn't be public.

I couldn't figure out how to make this private. Suggestions appreciated.


It would be nice if declaring it public only in the .cpp file worked, but I think I could also use an Impl class, like Client does, to hide such things.
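For reference, a minimal sketch of the Impl (pimpl) approach mentioned above. The names `Inserter`, `Impl`, `SendData`, and `Sent` are hypothetical and are not taken from this PR or from clickhouse-cpp. The two halves are shown in one file here, but the first would go in the public header and the second in the .cpp file.

```cpp
#include <cstddef>
#include <memory>

// --- Would live in the public header (hypothetical names) ---
class Inserter {
public:
    Inserter();
    ~Inserter();              // defined below, where Impl is a complete type
    void Execute();
    std::size_t Sent() const;

private:
    class Impl;               // forward declaration only; layout stays hidden
    std::unique_ptr<Impl> impl_;
};

// --- Would live in the .cpp file ---
class Inserter::Impl {
public:
    // Internal helper: public inside Impl, but users of Inserter cannot
    // name or call it because Impl is a private nested type.
    void SendData() { ++sent_; }
    std::size_t sent_ = 0;
};

Inserter::Inserter() : impl_(std::make_unique<Impl>()) {}
Inserter::~Inserter() = default;
void Inserter::Execute() { impl_->SendData(); }
std::size_t Inserter::Sent() const { return impl_->sent_; }
```

The destructor has to be defined after `Impl` is complete, because `std::unique_ptr` needs the full type in order to delete it. A `friend` declaration is another option when only one other class needs access to a private member.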

@theory theory force-pushed the insert-block branch 4 times, most recently from ade33f4 to 51d8216 Compare October 31, 2025 19:03
Add a new pattern for "prepared inserts". It works like this:

*   Call `PrepareInsert` with an `INSERT` query with optional columns
    and ending in `VALUES`. No values should be included in the string.
*   It returns a `PreparedInsert` object that has two methods:
    *   `Block()` returns a `Block` pre-configured with columns as
        declared in the `INSERT` statement
    *   `Execute()` inserts data from the block then clears it.
*   Call `Finish()` or just let the `PreparedInsert` object go out of
    scope to send any remaining rows and to signal the server that it's
    done.

This allows one to send smaller batches of blocks, thereby using less
memory, but still in a single ClickHouse `INSERT` operation.

This is expected to be useful for the Postgres foreign data wrapper
(FDW) insert API, where a single statement can insert multiple rows but
the API hands them over one at a time. It will also support the FDW
COPY API, which can submit huge batches of data to insert.