This repository has been archived by the owner on Dec 6, 2024. It is now read-only.

Columnar encoding for the OpenTelemetry protocol #171

Merged
merged 142 commits into from
Jun 29, 2023
Changes from 1 commit
Commits
95da4ae
Intro + Motivation sections
lquerel Mar 14, 2021
aec7656
Move 0000-multivariate-timeseries.md in metrics folder
lquerel Mar 14, 2021
a3fbdb7
Add conclusion to the motivation section
lquerel Mar 14, 2021
6b550f4
Update 0000-multivariate-timeseries.md
lquerel Mar 14, 2021
3496ca1
Updated the "Explanation" section
lquerel Mar 21, 2021
56b3357
Added a diagram to present the data model
lquerel Mar 21, 2021
d760154
Added diagram
lquerel Mar 21, 2021
589dc8b
Update 0000-multivariate-timeseries.md
lquerel Mar 21, 2021
a516962
Examples of multivariate time-series
lquerel Mar 21, 2021
ecf45d2
Update 0000-multivariate-timeseries.md
lquerel Mar 21, 2021
58e9a53
Merge branch 'open-telemetry:main' into main
lquerel May 18, 2021
c809b1b
Update OTEP
lquerel May 18, 2021
150d987
Update OTEP
lquerel May 18, 2021
43340ea
Update OTEP
lquerel May 18, 2021
1448c91
Update OTEP
lquerel May 18, 2021
0db08ec
Update OTEP
lquerel May 18, 2021
f3739cf
Update OTEP
lquerel May 18, 2021
2218179
Merge branch 'open-telemetry:main' into main
lquerel Aug 10, 2021
ec813fe
Create OTEP 0156
lquerel Aug 10, 2021
8c34a91
Add columnar encoding benefits
lquerel Aug 10, 2021
e3c38db
Complete explanation section
lquerel Aug 10, 2021
68b4ed6
Create event.proto section
lquerel Aug 10, 2021
e25b3c6
Create internal details
lquerel Aug 10, 2021
07714b1
Update 0156-columnar-encoding.md
lquerel Aug 10, 2021
e82db1a
Add images for OTEP-0156
lquerel Aug 10, 2021
d9a621a
Add corner cases
lquerel Aug 10, 2021
8c7333e
Update first draft for review
lquerel Aug 10, 2021
91f9561
Remove initial proposal focusing only on the multivariate time-series…
lquerel Aug 10, 2021
c58a4ad
Merge remote-tracking branch 'origin/main'
lquerel Aug 10, 2021
52a85f7
Change Open Telemetry to OpenTelemetry
lquerel Aug 11, 2021
0b3f02d
Rephrases few sentences in the motivation section
lquerel Aug 11, 2021
e8c6ef1
Removed trailing-spaces to comply with markdown linter
lquerel Aug 11, 2021
dc74dd3
Fixed typo
lquerel Aug 11, 2021
fffce33
Fixed more markdown issues
lquerel Aug 11, 2021
72768c4
Add mapping OTEL metrics, logs, traces to Apache Arrow Schema
lquerel Aug 11, 2021
8695591
More explanation on the Arrow mapping and the memory layout.
lquerel Aug 11, 2021
9139969
Fixed markdown issues.
lquerel Aug 11, 2021
6499c63
Micro update to trigger the CLA checker again.
lquerel Aug 12, 2021
aa9c513
Update motivation section based on feedback from @jmacd
lquerel Aug 12, 2021
f0c6b90
Add some additional clarifications to @tigrannajaryan's feedback
lquerel Aug 13, 2021
fce05af
Span id and trace id are now nullable fields (as suggested by @jsuere…
lquerel Sep 18, 2021
33ea57d
Replaced label by attribute (as suggested by @jmacd and @jsuereth)
lquerel Sep 18, 2021
3d636d5
Fix markdown issues
lquerel Sep 21, 2021
19653e1
Add field declaration syntax based on @jmacd comment (https://github.…
lquerel Sep 21, 2021
81eb951
Merge branch 'main' into main
carlosalberto Nov 8, 2021
0590cb8
Update OTEP 0156 - Full protocol/mapping spec + Benchmark and more.
lquerel Jun 10, 2022
8449b67
Merge branch 'open-telemetry:main' into main
lquerel Jun 10, 2022
0be8ae3
Merge branch 'main' of https://github.com/lquerel/oteps
lquerel Jun 10, 2022
ab93d23
Fix markdown lint issue
lquerel Jun 11, 2022
f48c952
Fix markdown lint issue
lquerel Jun 11, 2022
0e6b9ff
Fix markdown lint issue
lquerel Jun 11, 2022
577b4bd
Merge branch 'open-telemetry:main' into main
lquerel Dec 27, 2022
5b62001
Rename EventStream into ArrowStream
lquerel Dec 27, 2022
ec0a623
Merge remote-tracking branch 'origin/main'
lquerel Dec 27, 2022
c0d134d
Update protobuf specification and corresponding documentation.
lquerel Dec 27, 2022
4f311f8
Fix markdown lint issues
lquerel Dec 27, 2022
4baab68
Update attribute representation section.
lquerel Dec 27, 2022
6b3d916
Update Metrics Payload section.
lquerel Dec 28, 2022
3cc3458
Update Logs and Traces Payload sections.
lquerel Dec 28, 2022
f24cb9c
Update Implementation Recommendations, Trade-offs and Mitigations, Pr…
lquerel Dec 28, 2022
efa0aad
Add links to the reference implementation.
lquerel Dec 28, 2022
b8cbf55
Update links to the reference implementation.
lquerel Dec 28, 2022
b3b00c0
Remove appendix D.
lquerel Dec 28, 2022
b4a95df
Add comment on data sharing.
lquerel Dec 28, 2022
8271a65
Fix markdown issues.
lquerel Dec 28, 2022
50557bc
Fix markdown issues.
lquerel Dec 28, 2022
e0dae99
Update Arrow Schemas based on Matt Topol feedback (Go Arrow committer).
lquerel Jan 6, 2023
321bbc5
Arrow Sparse vs Dense union.
lquerel Jan 7, 2023
50d4cd3
Use dictionary to represent small protobuf enum.
lquerel Jan 7, 2023
5f1e2d8
Fix navigation issue
lquerel Jan 7, 2023
9e38c47
Add a justification behind the compression field in the protobuf mess…
lquerel Jan 12, 2023
44b2743
Fix markdown-lint issue
lquerel Jan 12, 2023
198f114
Add link to ZSTD dictionary optimization.
lquerel Jan 12, 2023
778cf0a
Fix markdown-lint issue again...
lquerel Jan 12, 2023
e520b03
Add paragraph "unary RPC vs stream RPC"
lquerel Jan 12, 2023
3952c52
Improve Zero-copy argumentation
lquerel Jan 12, 2023
21cd665
Improve the argument around `otlp_arrow_payloads` which is a repeated…
lquerel Jan 13, 2023
ef2c09f
Improve RecordBatch description.
lquerel Jan 13, 2023
ed78a51
Improve paragraph "Dense vs Sparse union"
lquerel Jan 13, 2023
1524d84
Apply Joshua MacDonald's updates (thanks @jmacd)
lquerel Jan 13, 2023
f0dc753
Fix markdown lint issues.
lquerel Jan 13, 2023
8e65c10
Remove delivery_type and dictionaries attributes.
lquerel Jan 14, 2023
3f73d39
Specify compression algo in Validation section
lquerel Jan 17, 2023
c78c78e
Remove compression field from OtlpArrowPayload
lquerel Jan 18, 2023
20b4e30
Fix broken link (img)
lquerel Jan 18, 2023
c5c747c
Explain gains bandwidth+speed
lquerel Jan 21, 2023
87b5352
Fix markdown issue
lquerel Jan 21, 2023
d648aba
Fix markdown issue
lquerel Jan 21, 2023
942c98e
Fix markdown issue
lquerel Jan 21, 2023
d5e6a89
Fix terminology
lquerel Jan 23, 2023
4a10df3
Update charts with last ref. impl.
lquerel Feb 18, 2023
f0dd638
Fix charts with last ref. impl. + update description
lquerel Feb 18, 2023
1f0e9fc
Fix charts with last ref. impl. + update description
lquerel Feb 18, 2023
4454d02
Fix charts with last ref. impl. + update description
lquerel Feb 18, 2023
efeb9e9
Fix markdown lints
lquerel Feb 18, 2023
eade6b5
Merge branch 'main' into main
lquerel Apr 26, 2023
e6f7dc6
rename OTLP Arrow to OTel Arrow
lquerel May 30, 2023
1fde108
update proto file
lquerel May 30, 2023
8a2b713
update multivariate paragraph
lquerel May 30, 2023
686467c
update compression ratio summary
lquerel May 30, 2023
e08d43d
update grpc service/protocol section
lquerel May 30, 2023
110990b
update protobuf
lquerel May 30, 2023
b9e8452
Add ER diagrams for metrics, logs, and traces
lquerel May 30, 2023
6786b8c
Update `Logs Payload` section
lquerel May 30, 2023
df28655
Update `Spans Arrow Mapping` section
lquerel May 30, 2023
e73155d
Update `Metrics Arrow Mapping` section
lquerel May 30, 2023
84dac53
Update `Mapping OTel Entities to Arrow Records` section
lquerel May 30, 2023
1b78ba6
Update 3-columns chart benchmark.
lquerel May 31, 2023
8f82586
Update 3-columns stacked bar benchmark.
lquerel Jun 1, 2023
72fbeba
Update collector diagrams.
lquerel Jun 1, 2023
fb17f4b
Simplify benchmark section.
lquerel Jun 1, 2023
2aa2047
update benchmark section.
lquerel Jun 1, 2023
21bd7cb
fix markdown issues
lquerel Jun 1, 2023
621475f
fix markdown issues
lquerel Jun 1, 2023
4bed578
Merge branch 'main' into main
lquerel Jun 1, 2023
09749cc
Update text/0156-columnar-encoding.md
lquerel Jun 12, 2023
549061c
Update text/0156-columnar-encoding.md
lquerel Jun 12, 2023
64f4a1d
Update text/0156-columnar-encoding.md
lquerel Jun 12, 2023
ea6fd96
Update text/0156-columnar-encoding.md
lquerel Jun 12, 2023
ec77368
Update OTEP based on Tigran's comments.
lquerel Jun 12, 2023
6c2e9b5
Merge branch 'main' of https://github.com/lquerel/oteps
lquerel Jun 12, 2023
ce274de
Fix markdown lint issue
lquerel Jun 12, 2023
df06cea
Update OTEP based on Tigran's comments
lquerel Jun 12, 2023
f83a378
Change batch_id type from string to int64
lquerel Jun 20, 2023
9bdeab2
Rename sub_stream_id to schema_id (see https://github.com/open-teleme…
lquerel Jun 20, 2023
d5e8966
Create a new section Future Possibilities (see https://github.com/ope…
lquerel Jun 20, 2023
fae225a
Fix typo
lquerel Jun 20, 2023
1892d31
Minor change (request -> service)
lquerel Jun 20, 2023
5512264
Update text/0156-columnar-encoding.md
lquerel Jun 26, 2023
560ca64
Update text/0156-columnar-encoding.md
lquerel Jun 26, 2023
9f68a84
Update text/0156-columnar-encoding.md
lquerel Jun 26, 2023
34fd4df
Update text/0156-columnar-encoding.md
lquerel Jun 26, 2023
25bf657
Update text/0156-columnar-encoding.md
lquerel Jun 26, 2023
7fbd269
Update based on @atoulme comments
lquerel Jun 26, 2023
0c3742d
Update text/0156-columnar-encoding.md
lquerel Jun 26, 2023
0ffddbf
Update text/0156-columnar-encoding.md
lquerel Jun 26, 2023
e5c64e9
Update text/0156-columnar-encoding.md
lquerel Jun 26, 2023
0f1faf1
Update text/0156-columnar-encoding.md
lquerel Jun 26, 2023
f26c725
Update text/0156-columnar-encoding.md
lquerel Jun 26, 2023
ad06693
Update text/0156-columnar-encoding.md
lquerel Jun 26, 2023
a924a99
Add delta-delta encoding link
lquerel Jun 26, 2023
5bd1d00
Merge branch 'main' into main
reyang Jun 29, 2023
update multivariate paragraph
lquerel committed May 30, 2023
commit 8a2b7135b9289051908055cc1e9fbfbbf154785a
30 changes: 15 additions & 15 deletions text/0156-columnar-encoding.md
@@ -81,9 +81,9 @@ OpenTelemetry protocol this compatible extension has the following improvements:
telemetry data based on a columnar representation, 2) a stream-oriented gRPC endpoint that is more efficient to
transmit batches of OTLP entities.
* **Provide a more optimal representation for multivariate time-series data**.
-With the current version of the OpenTelemetry protocol, users have to transform multivariate time-series (i.e multiple
-related metrics sharing the same attributes and timestamp) into a collection of univariate time-series resulting in a
-large amount of duplication and additional overhead covering the entire chain from exporters to backends.
+Multivariate time-series are currently not well compressed in the existing protocol (multivariate = related metrics
+sharing the same attributes and timestamp). The OTel Arrow protocol provides a much better compression rate for this
+type of data by leveraging the columnar representation.
* **Provide more advanced and efficient telemetry data processing capabilities**. Increasing data volume, cost
efficiency, and data minimization require additional data processing capabilities such as data projection,
aggregation, and filtering.
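
The multivariate point changed in this hunk can be illustrated with a minimal Python sketch. It is not taken from the OTEP or its reference implementation: the metric names, attribute values, and helper functions are hypothetical. It only counts how many attribute values and timestamps are carried when three related metrics are sent as separate univariate series, compared with one columnar batch in which the shared attributes and timestamps appear once.

```python
# Hypothetical sample: three related metrics observed at the same timestamps
# with the same attributes (a multivariate time series).
ATTRS = {"host": "web-1", "region": "eu-west"}
TIMESTAMPS = [1000, 1010, 1020, 1030]
METRICS = {
    "cpu.user": [0.10, 0.12, 0.11, 0.15],
    "cpu.system": [0.02, 0.03, 0.02, 0.04],
    "cpu.idle": [0.88, 0.85, 0.87, 0.81],
}


def univariate_points(attrs, timestamps, metrics):
    """Row-oriented view: every data point repeats its attributes and timestamp."""
    return [
        {"name": name, "attrs": dict(attrs), "ts": ts, "value": value}
        for name, values in metrics.items()
        for ts, value in zip(timestamps, values)
    ]


def columnar_batch(attrs, timestamps, metrics):
    """Column-oriented view: shared attributes and timestamps are stored once."""
    batch = {"attrs": dict(attrs), "ts": list(timestamps)}
    for name, values in metrics.items():
        batch[name] = list(values)  # one value column per metric
    return batch


def count_shared_values(points):
    """Attribute values plus timestamps carried by the row-oriented view."""
    return sum(len(p["attrs"]) + 1 for p in points)


points = univariate_points(ATTRS, TIMESTAMPS, METRICS)
batch = columnar_batch(ATTRS, TIMESTAMPS, METRICS)

row_shared = count_shared_values(points)             # 12 points x (2 attrs + 1 ts)
col_shared = len(batch["attrs"]) + len(batch["ts"])  # 2 attrs + 4 timestamps
print(row_shared, col_shared)
```

The sketch counts values rather than bytes. Actual sizes depend on the protobuf and Arrow encodings and on the compression applied, which are not modeled here.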
@@ -103,7 +103,7 @@ basis for columnar support in OTLP.

### Validation

-A series of tests were conducted to compare compression ratios between OTLP and a columnar version of OTLP called OTLP
+A series of tests were conducted to compare compression ratios between OTLP and a columnar version of OTLP called OTel
Arrow. The key results are:

* For univariate time series, OTel Arrow is **2 to 3.5 better in terms of bandwidth reduction while having an
@@ -191,7 +191,7 @@ A two-phase integration is proposed to allow incremental benefits.
#### Phase 1

This proposal is designed as a protocol extension compatible with the existing OTLP protocol. As illustrated in the
-following diagram, a new OTLP-Arrow to OTLP receiver will be responsible for translating the protocol extension to the
+following diagram, a new OTel Arrow to OTLP receiver will be responsible for translating the protocol extension to the
existing protocol. Similarly, a new exporter will be responsible for translating the OTLP messages into this new Arrow-based
format.

@@ -327,7 +327,7 @@ More details on the `OtlpArrowPayload` columns in the section [Mapping OTel enti
More specifically, an `OtlpArrowPayload` protobuf message is defined as:

```protobuf
-// Enumeration of all the OTLP Arrow payload types currently supported by the OTLP Arrow protocol.
+// Enumeration of all the OTLP Arrow payload types currently supported by the OTel Arrow protocol.
enum OtlpArrowPayloadType {
UNKNOWN = 0;

@@ -366,12 +366,12 @@ enum OtlpArrowPayloadType {
SPAN_LINK_ATTRS = 45;
}

-// Represents a batch of OTLP Arrow entities.
+// Represents a batch of OTel Arrow entities.
message OtlpArrowPayload {
// [mandatory] A unique id assigned to a sub-stream of the batch sharing the same schema, and dictionaries.
string sub_stream_id = 1;

-// [mandatory] Type of the OTLP Arrow payload.
+// [mandatory] Type of the OTel Arrow payload.
OtlpArrowPayloadType type = 2;

// [mandatory] Serialized Arrow Record Batch
@@ -915,18 +915,18 @@ For the prototype specifically, which is a fork of the OpenTelemetry
collector codebase, we have derived the OTLP/gRPC-Arrow exporter and
receiver as set of changes directly to the `receiver/otlpreceiver` and
`exporter/otlpexporter` components, with new `internal/arrow` packages
-in both. With every collector release we merge the OTLP-Arrow changes
+in both. With every collector release we merge the OTel Arrow changes
with the mainline components to maintain this promise of
compatibility.

-OTLP-Arrow supports conveying the gRPC metadta (i.e., http2 headers) using a dedicated `bytes` field. Metadata is
+OTel Arrow supports conveying the gRPC metadta (i.e., http2 headers) using a dedicated `bytes` field. Metadata is
encoded using [hpack](https://datatracker.ietf.org/doc/rfc7541/) like a typical unary gRPC request.

Specifically:

#### OTLP/gRPC Receiver

-When Arrow is enabled, the OTLP receiver listens for both the standard unary gRPC service OTLP and OTLP-Arrow stream
+When Arrow is enabled, the OTLP receiver listens for both the standard unary gRPC service OTLP and OTel Arrow stream
services. Each stream uses an instance of the OTel-Arrow-Adapter's
[Consumer](https://pkg.go.dev/github.com/f5/otel-arrow-adapter@v0.0.0-20230112224802-dafb6df21c97/pkg/otel/arrow_record#Consumer). Sets
`client.Metadata` in the Context.
@@ -1132,9 +1132,9 @@ The columnar representation is more efficient for transporting large homogeneous
combining automatically column-oriented and row-oriented batches would allow to cover all scenarios. The development of
a strategy to automatically select the best data representation mode is an open question.

-### Unary gRPC OTLP-Arrow and HTTP OTLP-Arrow
+### Unary gRPC OTel Arrow and HTTP OTel Arrow

-The design currently calls for the use of gRPC streams to benefit from OTLP-Arrow transport. We believe that some of
+The design currently calls for the use of gRPC streams to benefit from OTel Arrow transport. We believe that some of
this benefit can be had even for unary gRPC and HTTP requests with large request batches to amortize sending of
dictionary and schema information. This remains an area for study.

@@ -1420,7 +1420,7 @@ The batch creation+sorting phase for OTel Arrow represents almost all the time s
times are
almost zero due to the use of Flatbuffer by Apache Arrow (ser/deser without parsing/unpacking). Compression and
decompression
-times are low due to the fact that the size of the Arrow OTLP message before compression is more than 7 times
+times are low due to the fact that the size of the OTel Arrow message before compression is more than 7 times
smaller than an OTLP message with the same content.

> It should be possible to significantly optimize the creation of OTel Arrow batches for contexts where the structure
@@ -1437,7 +1437,7 @@ details on a solution to facilitate data sharing.

## Appendix C - Parameter Tuning and Design Optimization

-This section describes the systematic approach used to optimize the Arrow OTLP design and its parameters. The approach
+This section describes the systematic approach used to optimize the OTel Arrow design and its parameters. The approach
is
based on an optimization technique called "blackbox optimization" which allows to describe the system to be optimized as
a black box with a set of input parameters and an output corresponding to the value of a function to be optimized