
Add spans to clickhouse exporter #10300

Closed
bputt-e opened this issue May 25, 2022 · 16 comments

bputt-e commented May 25, 2022

Is your feature request related to a problem? Please describe.
We're interested in storing spans in ClickHouse instead of using something like Elasticsearch or Loki/Tempo. It reduces architectural complexity and gives us more freedom to implement wrappers around the database for visualizations in tools like Grafana.

Describe the solution you'd like
Add spans as a supported signal in the ClickHouse exporter.

Describe alternatives you've considered
We're utilizing Loki + Tempo; ideally we could replace both with just ClickHouse.

bputt-e (Author) commented May 25, 2022

@hanjm do you have plans to implement this in the future?

hanjm (Member) commented May 25, 2022

@bputt-e I plan to, but I don't have much time recently. Would you like to contribute it? 😄

grafana-clickhouse-datasource supports querying ClickHouse logs in the Explore UI; it would also be useful for searching spans. grafana/clickhouse-datasource#23

bputt-e (Author) commented May 26, 2022

@hanjm possibly, though we're considering the following, which may make more sense as our own code separate from this exporter (we're interested in multitenancy):

  1. Add a column for the clientId. We already have the clientId in the tags but want to move it one level up, which also keeps that info from being exposed to the end user.
  2. Add a column for another field that currently lives in the tags, likewise abstracting it up one level and hiding it from the end user.
  3. A custom TTL per clientId would be nice, since clients have different retention needs. Taking batching into consideration as well, we'd essentially want to batch per clientId to make the TTL easier (see the sketch after this list).
  4. As for 1 & 2, we may use them as part of the table's ORDER BY schema.

I suspect 3 is doable using a Kafka topic per client; trade-offs, I suppose :)
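
For item 3, a minimal sketch of what a column-driven TTL could look like, assuming a hypothetical retentionDays column that ingest fills in per clientId (ClickHouse allows TTL expressions to reference other columns):

CREATE TABLE spans_ttl_sketch
(
    `clientId` LowCardinality(String),
    `retentionDays` UInt16,  -- hypothetical: populated per client at ingest time
    `traceId` String,
    `startTime` DateTime64(9)
)
ENGINE = MergeTree()
PARTITION BY toYYYYMM(startTime)
ORDER BY (clientId, traceId)
-- each row expires on its own schedule, so no per-client tables or topics are needed
TTL toDateTime(startTime) + toIntervalDay(retentionDays);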

bputt-e (Author) commented Jun 23, 2022

@hanjm what are your thoughts on the following table schemas? Ignore the fact that there are no indexes or materialized columns for things like http_status.

We're going to test option 4, as converting everything to strings will have a storage cost; but for comparison operations (things like >, <, >=), I'm expecting people can use conversion functions on the column in the WHERE clause (see the example query after option 4 below).

Option 1

CREATE TABLE spans
(
    `traceId` String,
    `spanId` String,
    `parentSpanId` String,
    `name` String,
    `kind` String,
    `createdTime` DateTime default now(),
    `startTime` DateTime64(9),
    `endTime` DateTime64(9),
    `ResourceAttributes` Nested (
        Key LowCardinality(String),
        Value String
    ),
    `attributes` Nested (
        Key LowCardinality(String),
        Value Nested(
            stringValue Nullable(String),
            boolValue Nullable(Bool),
            intValue Nullable(UInt32),
            doubleValue Nullable(Float64),
            mapValue Map(String, String),
            arrayValue Array(String),
            byteValue Array(String)
        )
    )
)
ENGINE = MergeTree()
PARTITION BY toYYYYMM(startTime)
ORDER BY (traceId);

Option 2

CREATE TABLE spans
(
    `traceId` String,
    `spanId` String,
    `parentSpanId` String,
    `name` String,
    `kind` String,
    `createdTime` DateTime default now(),
    `startTime` DateTime64(9),
    `endTime` DateTime64(9),
    `resourceAttr` Map(LowCardinality(String), String),
    `stringAttr` Map(LowCardinality(String), String),
    `intAttr` Map(LowCardinality(String), UInt32),
    `doubleAttr` Map(LowCardinality(String), Float64),
    `boolAttr` Map(LowCardinality(String), Bool),
    `mapAttr` Map(LowCardinality(String), Map(String, String)),
    `arrayAttr` Map(LowCardinality(String), Array(String)),
    `byteAttr` Map(LowCardinality(String), Array(String))
)
ENGINE = MergeTree()
PARTITION BY toYYYYMM(startTime)
ORDER BY (traceId);

Option 3

CREATE TABLE spans
(
    `traceId` String,
    `spanId` String,
    `parentSpanId` String,
    `name` String,
    `kind` String,
    `createdTime` DateTime default now(),
    `startTime` DateTime64(9),
    `endTime` DateTime64(9),
    `ResourceAttributes` Nested (
        Key LowCardinality(String),
        Value String
    ),
    `StringAttributes` Nested (
        Key LowCardinality(String),
        Value String
    ),
    `IntAttributes` Nested (
        Key LowCardinality(String),
        Value UInt32
    ),
    `DoubleAttributes` Nested (
        Key LowCardinality(String),
        Value Float64
    ),
    `BoolAttributes` Nested (
        Key LowCardinality(String),
        Value Bool
    ),
    `MapAttributes` Nested (
        Key LowCardinality(String),
        Value Map(String, String)
    ),
    `ArrayAttributes` Nested (
        Key LowCardinality(String),
        Value Array(String)
    ),
    `ByteAttributes` Nested (
        Key LowCardinality(String),
        Value Array(String)
    )
)
ENGINE = MergeTree()
PARTITION BY toYYYYMM(startTime)
ORDER BY (traceId);

Option 4

CREATE TABLE spans
(
    `traceId` String,
    `spanId` String,
    `parentSpanId` String,
    `name` String,
    `kind` String,
    `createdTime` DateTime default now(),
    `startTime` DateTime64(9),
    `endTime` DateTime64(9),
    `resourceAttributes` Nested (
        Key LowCardinality(String),
        Value Array(String)
    ),
    `spanAttributes` Nested (
        Key LowCardinality(String),
        Value Array(String)
    )
)
ENGINE = MergeTree()
PARTITION BY toYYYYMM(startTime)
ORDER BY (traceId);
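
For illustration, the WHERE-clause conversion mentioned above could look like this against option 4's schema (http.status_code is an assumed attribute name, not something the schema guarantees):

-- Value is Array(String), so take the first element and convert it before comparing;
-- a missing key yields '' which converts to 0 and drops out of the filter.
SELECT traceId, name
FROM spans
WHERE startTime >= now() - INTERVAL 1 HOUR
  AND toUInt32OrZero(spanAttributes.Value[indexOf(spanAttributes.Key, 'http.status_code')][1]) >= 500;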

hanjm (Member) commented Jun 24, 2022

@bputt-e
In my practice, I think option 2 is preferred. The map keys and values can use a bloom filter secondary index, and the query SQL is stringAttr['key'] = 'value', which is simpler than the Nested-type equivalent stringAttr.Value[indexOf(stringAttr.Key, 'key')] = 'value'.
The ClickHouse Map type was introduced in 2021, and the ClickHouse system log tables have since changed from the Nested type to the Map type. ClickHouse/ClickHouse#18698

We can use only string and number (float) types for map values: the string type for searching tags, the number type for comparison operations (>=, <=). The complex types (map/array) could be stored as JSON strings (https://github.com/open-telemetry/opentelemetry-collector/blob/main/pdata/internal/common.go#L466); ClickHouse also provides useful JSON functions.

    `stringAttr` Map(LowCardinality(String), String),
    `doubleAttr` Map(LowCardinality(String), Float64),
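
A minimal sketch of option 2 with those secondary indexes, plus the two query styles side by side (index names, parameters, and the peer.service attribute are illustrative):

CREATE TABLE spans_map_sketch
(
    `traceId` String,
    `startTime` DateTime64(9),
    `stringAttr` Map(LowCardinality(String), String),
    `doubleAttr` Map(LowCardinality(String), Float64),
    -- bloom filter secondary indexes over the map's keys and values
    INDEX idx_attr_key mapKeys(stringAttr) TYPE bloom_filter(0.01) GRANULARITY 1,
    INDEX idx_attr_value mapValues(stringAttr) TYPE bloom_filter(0.01) GRANULARITY 1
)
ENGINE = MergeTree()
PARTITION BY toYYYYMM(startTime)
ORDER BY (traceId);

-- map lookup (option 2):
SELECT traceId FROM spans_map_sketch WHERE stringAttr['peer.service'] = 'checkout';
-- the equivalent Nested lookup (option 3 style) would be:
--   stringAttr.Value[indexOf(stringAttr.Key, 'peer.service')] = 'checkout'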

@alvinhom

It may be helpful to explore storing attributes using the JSON object type with dynamic subcolumns:

https://clickhouse.com/docs/en/guides/developer/working-with-json/json-semi-structured

That way the data should be stored in the optimal format with the inferred type, and queries can just use dot notation, like select event.tag.db.host from spans, instead of the string-array notation.
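
A minimal sketch of that approach, using the experimental Object('json') type (table and attribute names are illustrative):

SET allow_experimental_object_type = 1;

CREATE TABLE spans_json_sketch
(
    `traceId` String,
    `startTime` DateTime64(9),
    `event` JSON  -- attributes become dynamically inferred subcolumns
)
ENGINE = MergeTree()
PARTITION BY toYYYYMM(startTime)
ORDER BY (traceId);

-- dot notation instead of map/array lookups:
SELECT event.tag.db.host
FROM spans_json_sketch
WHERE event.process.serviceName = 'checkout';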

hanjm (Member) commented Jul 30, 2022

It may be helpful to explore storing attributes using the JSON object type with dynamic subcolumns:

https://clickhouse.com/docs/en/guides/developer/working-with-json/json-semi-structured

That way the data should be stored in the optimal format with the inferred type, and queries can just use dot notation, like select event.tag.db.host from spans, instead of the string-array notation.

It is an exciting experimental feature. Currently its advantage is extracting JSON fields into dedicated columns, stored as separate native columns, which is useful for creating a primary/order key on a specific column; but you cannot create a secondary index on it the way you can with Map, which supports a bloom filter secondary index.
IMO, Object('json') is suitable for the common semantic-convention attributes. If Object('json') gains a full keys-and-values bloom filter index like the Map type in the future, it will also be suitable for custom span attributes.

bputt-e (Author) commented Aug 1, 2022

We're experimenting with Map at the moment to determine its long-term usefulness. We're staying away from JSON for now because it isn't supported for distributed tables, which is what something like this would benefit from.

A potential problem with JSON: if you're dealing with multiple customers and one decides to name their tag xxx and accidentally puts in a numeric value, it gets generated as a new column. How would you handle cases where they later fix the value to be a string, or where another customer comes in whose tag xxx is of another type? We're using a materialized column to set a tenantId based on the auth information passed to otel, so this affords us the ability to use the same table in the DB (a sketch follows below). Maps also afford us the ability to separate string/numeric fields up front.
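
A sketch of the materialized tenantId column, assuming the auth info arrives as a hypothetical tenant.id resource attribute (the actual source isn't specified here):

CREATE TABLE spans_tenant_sketch
(
    `traceId` String,
    `startTime` DateTime64(9),
    `resourceAttr` Map(LowCardinality(String), String),
    -- computed at insert time and never accepted from the client directly
    `tenantId` LowCardinality(String) MATERIALIZED resourceAttr['tenant.id']
)
ENGINE = MergeTree()
PARTITION BY toYYYYMM(startTime)
ORDER BY (tenantId, traceId);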

alvinhom commented Aug 2, 2022

According to the ClickHouse documentation, the type inference will do type coercion:

"This sort of coercion is supported for most types that have variable representation e.g. Int, Float. If necessary, ClickHouse will unify to the higher bit type that allows all current values to be represented. If necessary, converting to a String represents the least precise definition."

I am not sure how putting this into a map is more flexible. If the data type is inconsistent, you would have to coerce to the higher type and do type casting at the query level; you would need the same thing with a map.
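
To illustrate the coercion point with the spans_json_sketch table sketched earlier (values are made up): inserting mixed types for the same key unifies the inferred subcolumn type, falling back to String per the documentation excerpt above.

-- the first insert infers event.code as a number; the second forces unification to String
INSERT INTO spans_json_sketch FORMAT JSONEachRow {"traceId":"t1","startTime":"2022-08-02 00:00:00","event":{"code":200}}
INSERT INTO spans_json_sketch FORMAT JSONEachRow {"traceId":"t2","startTime":"2022-08-02 00:00:00","event":{"code":"abc"}}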

The main advantage of the JSON approach is the simplified queries, plus better storage and performance. For performance: since each key:value attribute is actually stored as a database column, it gets the benefit of ClickHouse being a columnar database. If you are retrieving only a specific key (column), it only has to scan that column's data, whereas with a map it has to scan the entire map. It should perform better than bloom filters.

A recent video by Altinity posted some performance numbers: about 5x to 6x better query performance using the JSON data type versus a non-native type.

https://www.youtube.com/watch?v=FWUKxLS5slc

As for not supporting distributed tables, that is a problem and hopefully will be fixed soon. Not sure of the ETA, though.

alvinhom commented Aug 2, 2022

BTW, for those who understand Mandarin Chinese, there is a video by Bilibili on how they are using ClickHouse at large scale for logs (traces are very similar).

https://www.youtube.com/watch?v=nSfzu8Q0B4I

They are using implicit maps, which basically flatten every key/value pair into a separate column, which is essentially what the JSON data type does. Their slides include performance numbers showing the improvements.

bputt-e (Author) commented Aug 2, 2022

Do you have any concerns about null values, since each column likely won't have a value for every row?

alvinhom commented Aug 3, 2022

It should be easy to just filter out null values in the query. I wrote a Python program to load some of my trace spans from Elasticsearch into ClickHouse using the JSON schema, and queries with attributes are quite natural. A query looks something like this:

SELECT
    event.tags.code.namespace,
    count(*) AS total_count,
    avg(duration) AS avg_duration,
    quantile(0.9)(duration) AS p90_duration
FROM otel_spans
WHERE startTime >= '1659137086' AND startTime <= '1659223486'
  AND event.process.serviceName = 'testdv1b-platform'
  AND event.tags.code.namespace != ''
GROUP BY event.tags.code.namespace;

bputt-e (Author) commented Aug 3, 2022

The null concern was around performance, because the number of columns would increase with the number of distinct attributes. I suppose it might be a non-issue, though; ClickHouse prefers an empty string over null.

alvinhom commented Aug 3, 2022

I don't think null or empty values will cause much storage bloat, since it is a columnar DB and a column can use run-length encoding or other compression to compress empty/null values.

Query performance with separate columns (even with lots of null values) will be much better than with a map.

We are using auto-instrumentation (Java), so I don't expect a lot of different attribute keys. I guess this would be more of a concern if applications do custom instrumentation and introduce arbitrary attributes.

bputt-e (Author) commented Aug 8, 2022

Wondering if this should omit certain attribute types (KvlistValue, ArrayValue, BytesValue), similar to Elastic's exporter:

func ifaceAttributeValue(v pcommon.Value) interface{} {

Attribute listing
https://github.com/open-telemetry/opentelemetry-collector/blob/8db116788e40c6d418fc7838da879a6bb7236f2a/pdata/internal/common.go#L47

@dmitryax (Member)

Resolved by #13442
