Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[exporter/clickhouse] Default async_insert to true. Added related config option. #32340

Closed
33 changes: 33 additions & 0 deletions .chloggen/clickhouse-default-async-insert.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,33 @@
# Use this changelog template to create an entry for release notes.

# One of 'breaking', 'deprecation', 'new_component', 'enhancement', 'bug_fix'
change_type: breaking

# The name of the component, or a single word describing the area of concern, (e.g. filelogreceiver)
component: exporter/clickhouse

# A brief description of the change. Surround your text with quotes ("") if it needs to start with a backtick (`).
note: "Add `async_insert` config option to enable inserting asynchronously by default."

# Mandatory: One or more tracking issues related to the change. You can use the PR number here if no issue exists.
issues: [32340]

# (Optional) One or more lines of additional information to render under the primary note.
# These lines will be padded with 2 spaces and then inserted directly into the document.
# Use pipe (|) for multiline entries.
subtext: |
Adds `async_insert` config option to enable inserting asynchronously by default.
To preserve the previous behavior, set `async_insert` to `false` in your config.
When enabled, the exporter will insert asynchronously, which can improve performance for high-throughput deployments.
The `async_insert` option can be set to `true` or `false` to enable or disable async inserts, respectively. The default value is `true`.
Keep in mind this setting is added since the exporter now sets it to default.
Async insert and its related settings can still be defined in `endpoint` and `connection_params`, which take priority over the new config option.

# If your change doesn't affect end users or the exported elements of any package,
# you should instead start your pull request title with [chore] or use the "Skip Changelog" label.
# Optional: The change log or logs in which this entry should be included.
# e.g. '[user]' or '[user, api]'
# Include 'user' if the change is relevant to end users.
# Include 'api' if there is a change to a library API.
# Default: '[user]'
change_logs: []
34 changes: 18 additions & 16 deletions exporter/clickhouseexporter/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -36,13 +36,13 @@ as [ClickHouse document says:](https://clickhouse.com/docs/en/introduction/perfo
dashboard.
Support time-series graph, table and logs.

2. Analyze logs via powerful clickhouse SQL.
2. Analyze logs via powerful ClickHouse SQL.

### Logs

- Get log severity count time series.

```clickhouse
```sql
SELECT toDateTime(toStartOfInterval(Timestamp, INTERVAL 60 second)) as time, SeverityText, count() as count
FROM otel_logs
WHERE time >= NOW() - INTERVAL 1 HOUR
Expand All @@ -52,7 +52,7 @@ ORDER BY time;

- Find any log.

```clickhouse
```sql
SELECT Timestamp as log_time, Body
FROM otel_logs
WHERE Timestamp >= NOW() - INTERVAL 1 HOUR
Expand All @@ -61,7 +61,7 @@ Limit 100;

- Find log with specific service.

```clickhouse
```sql
SELECT Timestamp as log_time, Body
FROM otel_logs
WHERE ServiceName = 'clickhouse-exporter'
Expand All @@ -71,7 +71,7 @@ Limit 100;

- Find log with specific attribute.

```clickhouse
```sql
SELECT Timestamp as log_time, Body
FROM otel_logs
WHERE LogAttributes['container_name'] = '/example_flog_1'
Expand All @@ -81,7 +81,7 @@ Limit 100;

- Find log with body contain string token.

```clickhouse
```sql
SELECT Timestamp as log_time, Body
FROM otel_logs
WHERE hasToken(Body, 'http')
Expand All @@ -91,7 +91,7 @@ Limit 100;

- Find log with body contain string.

```clickhouse
```sql
SELECT Timestamp as log_time, Body
FROM otel_logs
WHERE Body like '%http%'
Expand All @@ -101,7 +101,7 @@ Limit 100;

- Find log with body regexp match string.

```clickhouse
```sql
SELECT Timestamp as log_time, Body
FROM otel_logs
WHERE match(Body, 'http')
Expand All @@ -111,7 +111,7 @@ Limit 100;

- Find log with body json extract.

```clickhouse
```sql
SELECT Timestamp as log_time, Body
FROM otel_logs
WHERE JSONExtractFloat(Body, 'bytes') > 1000
Expand All @@ -123,7 +123,7 @@ Limit 100;

- Find spans with specific attribute.

```clickhouse
```sql
SELECT Timestamp as log_time,
TraceId,
SpanId,
Expand All @@ -147,7 +147,7 @@ Limit 100;

- Find traces with traceID (using time primary index and TraceID skip index).

```clickhouse
```sql
WITH
'391dae938234560b16bb63f51501cb6f' as trace_id,
(SELECT min(Start) FROM otel_traces_trace_id_ts WHERE TraceId = trace_id) as start,
Expand Down Expand Up @@ -175,7 +175,7 @@ Limit 100;

- Find spans is error.

```clickhouse
```sql
SELECT Timestamp as log_time,
TraceId,
SpanId,
Expand All @@ -199,7 +199,7 @@ Limit 100;

- Find slow spans.

```clickhouse
```sql
SELECT Timestamp as log_time,
TraceId,
SpanId,
Expand Down Expand Up @@ -240,13 +240,13 @@ Prometheus(or someone else uses OpenMetrics protocol), you also need to know the
between Prometheus(OpenMetrics) and OTLP Metrics.

- Find a sum metrics with name
```clickhouse
```sql
select TimeUnix,MetricName,Attributes,Value from otel_metrics_sum
where MetricName='calls_total' limit 100
```

- Find a sum metrics with name, attribute.
```clickhouse
```sql
select TimeUnix,MetricName,Attributes,Value from otel_metrics_sum
where MetricName='calls_total' and Attributes['service_name']='featureflagservice'
limit 100
Expand Down Expand Up @@ -282,7 +282,8 @@ Connection options:
- `ttl_days` (default = 0): **Deprecated: Use 'ttl' instead.** The data time-to-live in days, 0 means no ttl.
- `ttl` (default = 0): The data time-to-live example 30m, 48h. Also, 0 means no ttl.
- `database` (default = otel): The database name.
- `connection_params` (default = {}). Params is the extra connection parameters with map format.
- `connection_params` (default = {}). Params is the extra connection parameters with map format. Query parameters provided in `endpoint` will be individually overwritten if present in this map.
- `async_insert` (default = true): Enables [async inserts](https://clickhouse.com/docs/en/optimize/asynchronous-inserts). Ignored if async inserts are configured in the `endpoint` or `connection_params`. Async inserts may still be overridden server-side.
SpencerTorres marked this conversation as resolved.
Show resolved Hide resolved

ClickHouse tables:

Expand Down Expand Up @@ -338,6 +339,7 @@ exporters:
clickhouse:
endpoint: tcp://127.0.0.1:9000?dial_timeout=10s&compress=lz4
database: otel
async_insert: true
ttl: 72h
logs_table_name: otel_logs
traces_table_name: otel_traces
Expand Down
18 changes: 18 additions & 0 deletions exporter/clickhouseexporter/config.go
Original file line number Diff line number Diff line change
Expand Up @@ -47,6 +47,10 @@ type Config struct {
TableEngine TableEngine `mapstructure:"table_engine"`
// ClusterName if set will append `ON CLUSTER` with the provided name when creating tables.
ClusterName string `mapstructure:"cluster_name"`
// AsyncInsert if true will enable async inserts. Default is `true`.
// Ignored if async inserts are configured in the `endpoint` or `connection_params`.
// Async inserts may still be overridden server-side.
AsyncInsert *bool `mapstructure:"async_insert"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this a pointer instead of just a value? The only reason I could think of would be for it to be empty, as in your tests, but this is true by default, so it couldn't be empty, as far as I can tell.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default only gets applied if it's not in endpoint/connection_params AND if cfg.AsyncInsert is nil.

If it were a regular bool, Go would initialize the struct with false, and so in the config parsing, I wouldn't be able to determine if it was set to false by the user or by the struct's initialization.

So the code would look like:

// This effectively ignores my config if I set cfg.AsyncInsert to false.
if cfg.AsyncInsert == false {
    queryParams.Set("async_insert", "true")
}

So if I wanted it to remain false in my config, it would never work, I would have to set it in endpoint or connection_params.

This is similar to the database config option, except in that case Go will initialize it to "", so the code checks for that instead of nil.

TL;DR: false is false, true is true, nil is true

Copy link
Member

@crobert-1 crobert-1 Apr 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I think I understand your concern. However, the way configurations are parsed by the collector works in this case. It first calls createDefaultConfig, then parses whatever configuration the user defined, overwriting any default values with given values (and leaving default values as-is if not defined by user). This is done here, for reference.

So even if you have a bool with default true, if you set it to true in createDefaultConfig, it will only be overwritten if the user specifies async_insert: false in their config.

So would it work to use bool for your use case, or is there more reason than that?

I realize this logic is a bit involved and complicated since this variable is coming from potentially three different places, so would it work to add some kind of getter method that will handle precedence of different sources of async_insert and then require all references be to the getter method instead of directly parsing config options?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @SpencerTorres, I was wondering if you got a chance to test this, or look into it more?

}

// TableEngine defines the ENGINE string value when creating the table.
Expand All @@ -64,6 +68,11 @@ var (
errConfigTTL = errors.New("both 'ttl_days' and 'ttl' can not be provided. 'ttl_days' is deprecated, use 'ttl' instead")
)

// configBool returns a pointer to a new bool.
func configBool(b bool) *bool {
return &b
}

// Validate the ClickHouse server configuration.
func (cfg *Config) Validate() (err error) {
if cfg.Endpoint == "" {
Expand Down Expand Up @@ -105,6 +114,15 @@ func (cfg *Config) buildDSN(database string) (string, error) {
queryParams.Set("secure", "true")
}

// Default async_insert to true if not specified in DSN. Use config value if specified.
if !queryParams.Has("async_insert") {
if cfg.AsyncInsert != nil {
queryParams.Set("async_insert", fmt.Sprintf("%t", *cfg.AsyncInsert))
} else {
queryParams.Set("async_insert", "true")
}
}

// Override database if specified in config.
if cfg.Database != "" {
dsnURL.Path = cfg.Database
Expand Down
99 changes: 90 additions & 9 deletions exporter/clickhouseexporter/config_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -107,6 +107,7 @@ func TestConfig_buildDSN(t *testing.T) {
Password string
Database string
ConnectionParams map[string]string
AsyncInsert *bool
}
type args struct {
database string
Expand All @@ -133,7 +134,7 @@ func TestConfig_buildDSN(t *testing.T) {
wantChOptions: ChOptions{
Secure: false,
},
want: "clickhouse://127.0.0.1:9000/default",
want: "clickhouse://127.0.0.1:9000/default?async_insert=true",
},
{
name: "Support tcp scheme",
Expand All @@ -144,7 +145,7 @@ func TestConfig_buildDSN(t *testing.T) {
wantChOptions: ChOptions{
Secure: false,
},
want: "tcp://127.0.0.1:9000/default",
want: "tcp://127.0.0.1:9000/default?async_insert=true",
},
{
name: "prefers database name from config over from DSN",
Expand All @@ -160,7 +161,7 @@ func TestConfig_buildDSN(t *testing.T) {
wantChOptions: ChOptions{
Secure: false,
},
want: "clickhouse://foo:bar@127.0.0.1:9000/otel",
want: "clickhouse://foo:bar@127.0.0.1:9000/otel?async_insert=true",
},
{
name: "use database name from DSN if not set in config",
Expand All @@ -176,7 +177,7 @@ func TestConfig_buildDSN(t *testing.T) {
wantChOptions: ChOptions{
Secure: false,
},
want: "clickhouse://foo:bar@127.0.0.1:9000/otel",
want: "clickhouse://foo:bar@127.0.0.1:9000/otel?async_insert=true",
},
{
name: "invalid config",
Expand All @@ -197,7 +198,7 @@ func TestConfig_buildDSN(t *testing.T) {
Secure: true,
},
args: args{},
want: "https://127.0.0.1:9000/default?secure=true",
want: "https://127.0.0.1:9000/default?async_insert=true&secure=true",
},
{
name: "Preserve query parameters",
Expand All @@ -208,7 +209,7 @@ func TestConfig_buildDSN(t *testing.T) {
Secure: true,
},
args: args{},
want: "clickhouse://127.0.0.1:9000/default?foo=bar&secure=true",
want: "clickhouse://127.0.0.1:9000/default?async_insert=true&foo=bar&secure=true",
},
{
name: "Parse clickhouse settings",
Expand All @@ -221,7 +222,7 @@ func TestConfig_buildDSN(t *testing.T) {
Compress: clickhouse.CompressionLZ4,
},
args: args{},
want: "https://127.0.0.1:9000/default?compress=lz4&dial_timeout=30s&secure=true",
want: "https://127.0.0.1:9000/default?async_insert=true&compress=lz4&dial_timeout=30s&secure=true",
},
{
name: "Should respect connection parameters",
Expand All @@ -233,7 +234,7 @@ func TestConfig_buildDSN(t *testing.T) {
Secure: true,
},
args: args{},
want: "clickhouse://127.0.0.1:9000/default?foo=bar&secure=true",
want: "clickhouse://127.0.0.1:9000/default?async_insert=true&foo=bar&secure=true",
},
{
name: "support replace database in DSN to default database",
Expand All @@ -243,7 +244,86 @@ func TestConfig_buildDSN(t *testing.T) {
args: args{
database: defaultDatabase,
},
want: "tcp://127.0.0.1:9000/default",
want: "tcp://127.0.0.1:9000/default?async_insert=true",
},
{
name: "when config option is missing, preserve async_insert false in DSN",
fields: fields{
Endpoint: "tcp://127.0.0.1:9000?async_insert=false",
},
want: "tcp://127.0.0.1:9000/default?async_insert=false",
},
{
name: "when config option is missing, preserve async_insert true in DSN",
fields: fields{
Endpoint: "tcp://127.0.0.1:9000?async_insert=true",
},
want: "tcp://127.0.0.1:9000/default?async_insert=true",
},
{
name: "ignore config option when async_insert is present in connection params as false",
fields: fields{
Endpoint: "tcp://127.0.0.1:9000?async_insert=false",
ConnectionParams: map[string]string{"async_insert": "false"},
AsyncInsert: configBool(true),
},

want: "tcp://127.0.0.1:9000/default?async_insert=false",
},
{
name: "ignore config option when async_insert is present in connection params as true",
fields: fields{
Endpoint: "tcp://127.0.0.1:9000?async_insert=false",
ConnectionParams: map[string]string{"async_insert": "true"},
AsyncInsert: configBool(false),
},

want: "tcp://127.0.0.1:9000/default?async_insert=true",
},
{
name: "ignore config option when async_insert is present in DSN as false",
fields: fields{
Endpoint: "tcp://127.0.0.1:9000?async_insert=false",
AsyncInsert: configBool(true),
},

want: "tcp://127.0.0.1:9000/default?async_insert=false",
},
{
name: "use async_insert true config option when it is not present in DSN",
fields: fields{
Endpoint: "tcp://127.0.0.1:9000",
AsyncInsert: configBool(true),
},

want: "tcp://127.0.0.1:9000/default?async_insert=true",
},
{
name: "use async_insert false config option when it is not present in DSN",
fields: fields{
Endpoint: "tcp://127.0.0.1:9000",
AsyncInsert: configBool(false),
},

want: "tcp://127.0.0.1:9000/default?async_insert=false",
},
{
name: "set async_insert to true when not present in config or DSN",
SpencerTorres marked this conversation as resolved.
Show resolved Hide resolved
fields: fields{
Endpoint: "tcp://127.0.0.1:9000",
},

want: "tcp://127.0.0.1:9000/default?async_insert=true",
},
{
name: "connection_params takes priority over endpoint and async_insert option.",
fields: fields{
Endpoint: "tcp://127.0.0.1:9000?async_insert=false",
ConnectionParams: map[string]string{"async_insert": "true"},
AsyncInsert: configBool(false),
},

want: "tcp://127.0.0.1:9000/default?async_insert=true",
},
}
for _, tt := range tests {
Expand All @@ -254,6 +334,7 @@ func TestConfig_buildDSN(t *testing.T) {
Password: configopaque.String(tt.fields.Password),
Database: tt.fields.Database,
ConnectionParams: tt.fields.ConnectionParams,
AsyncInsert: tt.fields.AsyncInsert,
}
got, err := cfg.buildDSN(tt.args.database)

Expand Down
Loading