Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[exporter/clickhouse] Default async_insert to true. Added related config option. #32340

Closed

Conversation

SpencerTorres
Copy link
Member

@SpencerTorres SpencerTorres commented Apr 12, 2024

Description:
Sets async_insert to true by default to enable asynchronous inserts.
Because this value is being given a default, I have added a config option under the same name.
Keep in mind that if async_insert is provided in endpoint or connection_params it will take precedence and ignore this new config option.

This is similar to how the database config option behaves.
The goal is to provide better insert performance by default, since not all users will know to set it in their DSN URL.

This also opens the discussion to whether or not this is a breaking change. Depending on the deployment's telemetry throughput, this could be an unexpected change that leads to TOO_MANY_PARTS errors. I don't expect this to be the case however, but I welcome any discussion about this concern.

Link to tracking Issue:

Testing:
Ran integration tests. Also added an abundance of tests to check the behavior of async_insert when present in endpoint, connection_params, and exporter config.

Documentation:
Added new config option to README + code comments.

Unrelated change, also updated README's SQL samples to use sql instead of clickhouse for the code samples to enable proper syntax highlighting. ClickHouse SQL is compatible with plain SQL.

Copy link
Contributor

@Frapschen Frapschen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

.chloggen/clickhouse-default-async-insert.yaml Outdated Show resolved Hide resolved
.chloggen/clickhouse-default-async-insert.yaml Outdated Show resolved Hide resolved
.chloggen/clickhouse-default-async-insert.yaml Outdated Show resolved Hide resolved
// AsyncInsert if true will enable async inserts. Default is `true`.
// Ignored if async inserts are configured in the `endpoint` or `connection_params`.
// Async inserts may still be overridden server-side.
AsyncInsert *bool `mapstructure:"async_insert"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this a pointer instead of just a value? The only reason I could think of would be for it to be empty, as in your tests, but this is true by default, so it couldn't be empty, as far as I can tell.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default only gets applied if it's not in endpoint/connection_params AND if cfg.AsyncInsert is nil.

If it were a regular bool, Go would initialize the struct with false, and so in the config parsing, I wouldn't be able to determine if it was set to false by the user or by the struct's initialization.

So the code would look like:

// This effectively ignores my config if I set cfg.AsyncInsert to false.
if cfg.AsyncInsert == false {
    queryParams.Set("async_insert", "true")
}

So if I wanted it to remain false in my config, it would never work, I would have to set it in endpoint or connection_params.

This is similar to the database config option, except in that case Go will initialize it to "", so the code checks for that instead of nil.

TL;DR: false is false, true is true, nil is true

Copy link
Member

@crobert-1 crobert-1 Apr 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I think I understand your concern. However, the way configurations are parsed by the collector works in this case. It first calls createDefaultConfig, then parses whatever configuration the user defined, overwriting any default values with given values (and leaving default values as-is if not defined by user). This is done here, for reference.

So even if you have a bool with default true, if you set it to true in createDefaultConfig, it will only be overwritten if the user specifies async_insert: false in their config.

So would it work to use bool for your use case, or is there more reason than that?

I realize this logic is a bit involved and complicated since this variable is coming from potentially three different places, so would it work to add some kind of getter method that will handle precedence of different sources of async_insert and then require all references be to the getter method instead of directly parsing config options?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @SpencerTorres, I was wondering if you got a chance to test this, or look into it more?

exporter/clickhouseexporter/README.md Show resolved Hide resolved
exporter/clickhouseexporter/config_test.go Show resolved Hide resolved
@SpencerTorres
Copy link
Member Author

@crobert-1 I appreciate the review! I have applied all of your suggestions:

  • Updated README note for connection_params overwriting endpoint
  • Added test for connection_params overwriting endpoint and async_insert
  • Updated changelog to your suggested phrase, as well as a sentence that says how to preserve the old behavior

Let me know if anything else stands out. Thanks!

Copy link
Contributor

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@github-actions github-actions bot added Stale and removed Stale labels Apr 30, 2024
Copy link
Contributor

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@github-actions github-actions bot added the Stale label May 15, 2024
Copy link
Contributor

Closed as inactive. Feel free to reopen if this PR is still being worked on.

@github-actions github-actions bot closed this May 29, 2024
@crobert-1
Copy link
Member

Hey @SpencerTorres, feel free to reopen when you're able to work on this again, I just had a question/suggestion around the boolean pointer.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants