Add config option to ignore insert ids #296

iht · 2020-09-04T19:58:01Z

When streaming inserts into BigQuery with insert ids set (the default in this connector), BigQuery deduplicates the insertions, and the applicable quotas (rows per second, bytes per second) are much lower than without deduplication.

Currently, the connector has no option to disable this deduplication. This pull request adds a configuration option to ignore insert ids and insert all rows with a null id. This disables deduplication in BigQuery (at the risk of duplicate insertions) and makes the applicable quotas much higher (millions of rows per second, GBs per second).

The BigQuery documentation mentions this option for the Apache Beam connector. I am working with customers who are missing a similar configuration option in this connector.

With this pull request, you can set the option bigQueryIgnoreInsertId to true to insert without deduplication and with higher quotas.
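As a rough illustration of the idea (not the code in this pull request), the sketch below shows how a boolean flag could switch between rows that carry an insert id and rows with a null id. The `Row` class, the `toRow` helper, and the `topic-partition-offset` id format are hypothetical stand-ins chosen for the example; only the option name bigQueryIgnoreInsertId comes from this pull request.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class IgnoreInsertIdSketch {

    /** Hypothetical stand-in for a streaming-insert row: an optional insert id plus the row content. */
    static final class Row {
        final String insertId;              // null means "no insert id", so no deduplication is requested
        final Map<String, Object> content;

        Row(String insertId, Map<String, Object> content) {
            this.insertId = insertId;
            this.content = content;
        }
    }

    /**
     * Builds a row for a Kafka record. When ignoreInsertId is true (as with
     * bigQueryIgnoreInsertId=true), the id is left null; otherwise an id is
     * derived from the record coordinates. The id format is an assumption.
     */
    static Row toRow(String topic, int partition, long offset,
                     Map<String, Object> content, boolean ignoreInsertId) {
        String insertId = ignoreInsertId ? null : topic + "-" + partition + "-" + offset;
        return new Row(insertId, content);
    }

    public static void main(String[] args) {
        Map<String, Object> content = new LinkedHashMap<>();
        content.put("user", "alice");

        Row deduplicated = toRow("events", 0, 42L, content, false);
        Row bestEffort = toRow("events", 0, 42L, content, true);

        System.out.println("with insert id:    " + deduplicated.insertId);
        System.out.println("without insert id: " + bestEffort.insertId);
    }
}
```

The trade-off is the one described above: rows without an id are not deduplicated, so a retried batch may produce duplicate rows in the table.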

More info:

CLAassistant · 2020-09-04T19:58:18Z

All committers have signed the CLA.

codecov-commenter · 2020-09-04T20:01:38Z

Codecov Report

Merging #296 into master will decrease coverage by 0.24%.
The diff coverage is 64.70%.

@@             Coverage Diff              @@
##             master     #296      +/-   ##
============================================
- Coverage     66.10%   65.86%   -0.25%     
  Complexity      267      267              
============================================
  Files            32       32              
  Lines          1484     1497      +13     
  Branches        152      154       +2     
============================================
+ Hits            981      986       +5     
- Misses          450      456       +6     
- Partials         53       55       +2

Impacted Files	Coverage Δ	Complexity Δ
...wepay/kafka/connect/bigquery/BigQuerySinkTask.java	`56.63% <33.33%> (-0.59%)`	`27.00 <0.00> (ø)`
...ka/connect/bigquery/utils/SinkRecordConverter.java	`61.90% <50.00%> (-4.77%)`	`3.00 <0.00> (ø)`
...nect/bigquery/write/batch/GCSBatchTableWriter.java	`80.64% <66.66%> (-5.57%)`	`3.00 <0.00> (ø)`
...afka/connect/bigquery/write/batch/TableWriter.java	`67.79% <66.66%> (-2.38%)`	`6.00 <0.00> (ø)`
...onnect/bigquery/config/BigQuerySinkTaskConfig.java	`95.65% <100.00%> (+0.26%)`	`14.00 <0.00> (ø)`

C0urante · 2020-09-08T14:08:17Z

@iht I think this is addressed in #277, which has been reviewed but not merged yet.

iht · 2020-09-10T09:48:59Z

I should review the list of pending pull requests before attempting to contribute new changes...

Thanks for the heads up, I will keep an eye on #277 and will close this pull request.

Add config option to ignore insert ids

70b9453

iht closed this Sep 10, 2020
