Make JDBC write parallelism configurable #16280

Merged
wendigo merged 1 commit into trinodb:master from serafin/jdbc-write-parallelism on Jul 5, 2023

Conversation

wendigo
Contributor

@wendigo wendigo commented Feb 27, 2023

Added an option query.max-writer-nodes-count to QueryManagerConfig and
a session option that limits the number of tasks that take part in
writing.

Description

Additional context and related issues

Release notes

( ) This is not user-visible or docs only and no release notes are required.
( ) Release notes are required, please propose a release note for me.
( ) Release notes are required, with the following suggested text:

# Section
* Fix some things. ({issue}`issuenumber`)

@cla-bot cla-bot bot added the cla-signed label Feb 27, 2023
@wendigo
Contributor Author

wendigo commented Feb 27, 2023

This is on top of #16238.

@wendigo wendigo force-pushed the serafin/jdbc-write-parallelism branch from 112df4a to bb789cd Compare February 27, 2023 13:22
Member

@kokosing kokosing left a comment

Nice :)

@wendigo wendigo force-pushed the serafin/jdbc-write-parallelism branch from bb789cd to 1791466 Compare March 18, 2023 06:02
@wendigo wendigo changed the title [WiP] Make JDBC write parallelism configurable Make JDBC write parallelism configurable Mar 18, 2023
@wendigo
Contributor Author

wendigo commented Mar 18, 2023

Rebased and addressed review.

@hashhar
Member

hashhar commented Mar 18, 2023

Note that maybe the default should be Optional.empty, where it's bounded only by cluster size as before this change; otherwise some scenarios may see a regression due to a smaller number of writers being involved.
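
The two defaults under discussion differ only in how a missing limit is treated. A minimal sketch, assuming at most one writer task per worker node; `WriterCountDefaults` and `effectiveWriters` are hypothetical names for illustration, not Trino code:

```java
import java.util.OptionalInt;

// Hypothetical helper for illustration only; not part of Trino.
final class WriterCountDefaults
{
    private WriterCountDefaults() {}

    // Assumption: at most one writer task runs per worker node,
    // so the cluster size is always an upper bound.
    static int effectiveWriters(OptionalInt configuredLimit, int clusterSize)
    {
        if (clusterSize < 1) {
            throw new IllegalArgumentException("clusterSize must be at least 1");
        }
        if (configuredLimit.isPresent() && configuredLimit.getAsInt() < 1) {
            throw new IllegalArgumentException("configuredLimit must be at least 1");
        }
        // An empty limit means "bounded only by cluster size";
        // a fixed limit caps the writer count below that bound.
        return configuredLimit.isPresent()
                ? Math.min(configuredLimit.getAsInt(), clusterSize)
                : clusterSize;
    }

    public static void main(String[] args)
    {
        System.out.println(effectiveWriters(OptionalInt.empty(), 20)); // 20
        System.out.println(effectiveWriters(OptionalInt.of(8), 20));   // 8
        System.out.println(effectiveWriters(OptionalInt.of(8), 4));    // 4
    }
}
```

Under these assumptions, a 20-node cluster drops from 20 writers to 8 with a fixed default of 8, while a 4-node cluster is unaffected.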

@wendigo
Contributor Author

wendigo commented Mar 20, 2023

@hashhar I'd prefer a sane default, rather than previous behavior.

@wendigo wendigo force-pushed the serafin/jdbc-write-parallelism branch 3 times, most recently from 22bdf8d to 587d00c Compare March 21, 2023 20:24
@wendigo wendigo force-pushed the serafin/jdbc-write-parallelism branch from 587d00c to 9669ffe Compare June 26, 2023 12:21
@wendigo wendigo requested review from kokosing and hashhar June 26, 2023 12:21
Contributor

@ssheikin ssheikin left a comment

lgtm % comment

@@ -22,8 +22,10 @@
public class JdbcWriteConfig
{
    public static final int MAX_ALLOWED_WRITE_BATCH_SIZE = 10_000_000;
    static final int DEFAULT_WRITE_PARALELLISM = 8;
Contributor

By default, QueryManagerConfig defines maxWriterTasksCount = 100, so this is a drastic reduction in the number of writers. Is it intentional?
I see that it's only for JDBC, so it's probably acceptable, and 8 is even better than 100.
Should it be documented?
@jhlodin

Contributor Author

100 means unbounded. For JDBC, the higher the number of writers, the more connections are opened/acquired/held per write, which can result in high lock contention on the database side.
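
A rough back-of-the-envelope sketch of this effect, assuming each writer task holds one database connection for the duration of the write; `JdbcWriteConnectionEstimate` and `connectionsHeld` are hypothetical names, not Trino code:

```java
// Hypothetical estimate for illustration only; not part of Trino.
final class JdbcWriteConnectionEstimate
{
    private JdbcWriteConnectionEstimate() {}

    // Assumption: every writer task of every concurrent write query
    // holds exactly one connection while the write is in progress.
    static long connectionsHeld(int concurrentWrites, int writersPerWrite)
    {
        if (concurrentWrites < 0) {
            throw new IllegalArgumentException("concurrentWrites must not be negative");
        }
        if (writersPerWrite < 1) {
            throw new IllegalArgumentException("writersPerWrite must be at least 1");
        }
        // Widen to long before multiplying to avoid int overflow.
        return (long) concurrentWrites * writersPerWrite;
    }

    public static void main(String[] args)
    {
        System.out.println(connectionsHeld(5, 100)); // 500
        System.out.println(connectionsHeld(5, 8));   // 40
    }
}
```

With five concurrent INSERT queries, this estimate gives 500 connections at 100 writers per write versus 40 at the new default of 8.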

@wendigo wendigo requested review from kokosing and ssheikin July 3, 2023 09:35
@wendigo wendigo merged commit 8488e30 into trinodb:master Jul 5, 2023
@wendigo wendigo deleted the serafin/jdbc-write-parallelism branch July 5, 2023 14:21
@github-actions github-actions bot added this to the 421 milestone Jul 5, 2023
@colebow
Member

colebow commented Jul 5, 2023

Does this need a release note? @wendigo

@mosabua
Member

mosabua commented Jul 5, 2023

I think yes... and also docs..

getQueryRunner()::execute,
"write_parallelism",
"(a varchar(128), b bigint)")) {
assertUpdate(session, "INSERT INTO " + table.getName() + " (a, b) SELECT clerk, orderkey FROM tpch.sf100.orders LIMIT " + numberOfRows, numberOfRows, plan -> {
Member

very generous


7 participants