From e85bbc57de52cc9044bc0911de0d1afe5d1fbe07 Mon Sep 17 00:00:00 2001
From: "github-actions[bot]" <41898282+github-actions[bot]@users.noreply.github.com>
Date: Fri, 30 Aug 2024 18:00:30 -0400
Subject: [PATCH] [release-v2.6] [DOC] Add vParquet4 to analyse blocks CLI and update parquet doc (#4040)

Co-authored-by: Joe Elliott
Co-authored-by: Kim Nylander <104772500+knylander-grafana@users.noreply.github.com>
---
 docs/sources/tempo/configuration/parquet.md | 33 ++++++++++-----------
 docs/sources/tempo/operations/tempo_cli.md  |  8 +++--
 2 files changed, 22 insertions(+), 19 deletions(-)

diff --git a/docs/sources/tempo/configuration/parquet.md b/docs/sources/tempo/configuration/parquet.md
index 01b023573b4..f83ae0e32b0 100644
--- a/docs/sources/tempo/configuration/parquet.md
+++ b/docs/sources/tempo/configuration/parquet.md
@@ -7,43 +7,48 @@ weight: 300
 
 # Apache Parquet block format
 
-
-Tempo has a default columnar block format based on Apache Parquet. This format is required for tags-based search as well as [TraceQL]({{< relref "../traceql" >}}), the query language for traces. The columnar block format improves search performance and enables a large ecosystem of tools to access the underlying trace data.
+Tempo has a default columnar block format based on Apache Parquet.
+This format is required for tags-based search as well as [TraceQL]({{< relref "../traceql" >}}), the query language for traces.
+The columnar block format improves search performance and enables an ecosystem of tools, including [Tempo CLI](https://grafana.com/docs/tempo//operations/tempo_cli/#analyse-blocks), to access the underlying trace data.
 
 For more information, refer to the [Parquet design document](https://github.com/grafana/tempo/blob/main/docs/design-proposals/2022-04%20Parquet.md) and [Issue 1480](https://github.com/grafana/tempo/issues/1480).
 Additionally, there is now a [Parquet v3 design document](https://github.com/grafana/tempo/blob/main/docs/design-proposals/2023-05%20vParquet3.md).
 
-If you install using the new Helm charts, then Parquet is enabled by default.
-
 ## Considerations
 
-The Parquet block format is enabled by default since Tempo 2.0. No data conversion or upgrade process is necessary. As soon as the format is enabled, Tempo starts writing data in that format, leaving existing data as-is.
+The Parquet block format has been enabled by default since Tempo 2.0.
+
+If you install using the [Tempo Helm charts](https://grafana.com/docs/tempo//setup/helm-chart/), then Parquet is enabled by default.
+No data conversion or upgrade process is necessary.
+As soon as a block format is enabled, Tempo starts writing data in that format, leaving existing data as-is.
 
 Block formats based on Parquet require more CPU and memory resources than the previous `v2` format but provide search and TraceQL functionality.
 
 ## Choose a different block format
 
 The default block format is `vParquet4`, which is the latest iteration of the Parquet-based columnar block format in Tempo.
-It introduces dedicated attribute columns, which improve query performance by storing attributes in own columns,
-rather than in the generic attribute key-value list.
-For more information, see [Dedicated attribute columns]({{< relref "../operations/tempo_cli" >}}).
+vParquet4 introduces new columns that enable querying for data in array attributes as well as events and links.
+For more information, refer to [Dedicated attribute columns](https://grafana.com/docs/tempo//operations/dedicated_columns/).
+
 
 You can still use the previous format `vParquet3`.
-To enable it, set the block version option to `vParquet3` in the Storage section of the configuration file.
+To enable it, set the block version option to `vParquet3` in the [Storage section](https://grafana.com/docs/tempo//configuration/#storage) of the configuration file.
 
 ```yaml
 # block format version. options: v2, vParquet2, vParquet3, vParquet4
 [version: vParquet4]
 ```
 
-In some cases, you may choose to disable Parquet and use the old `v2` block format. Using the `v2` block format disables all forms of search, but also reduces resource consumption, and may be desired for a high-throughput cluster that does not need these capabilities. To make this change, set the block version option to `v2` in the Storage section of the configuration file.
+In some cases, you may choose to disable Parquet and use the old `v2` block format.
+Using the `v2` block format disables all forms of search, but also reduces resource consumption, and may be desired for a high-throughput cluster that doesn't need these capabilities.
+To make this change, set the block version option to `v2` in the Storage section of the configuration file.
 
 ```yaml
 # block format version. options: v2, vParquet2, vParquet3, vParquet4
 [version: v2]
 ```
 
-To re-enable the default `vParquet3` format, remove the block version option from the Storage section of the configuration file or set the option to `vParquet3`.
+To re-enable the default `vParquet4` format, remove the block version option from the [Storage section](https://grafana.com/docs/tempo//configuration/#storage) of the configuration file or set the option to `vParquet4`.
 
 ## Parquet configuration parameters
 
@@ -68,9 +73,3 @@ The `cache_control` section contains the follow parameters for Parquet metadata
 | [footer: \| default = false] | `false` | Specifies if the footer should be cached |
 | `[column_index: \| default = false]` | `false` | Specifies if the column index should be cached |
 | `[offset_index: \| default = false]` | `false` | Specifies if the offset index should be cached |
-
-## Convert to Parquet
-
-If you have used an earlier version of the Parquet format, you can use `tempo-cli` to convert a Parquet file from its existing schema to the one used in Tempo 2.0.
-
-For instructions, refer to the [Parquet convert command documentation]({{< relref "../operations/tempo_cli#parquet-convert-command" >}}).
diff --git a/docs/sources/tempo/operations/tempo_cli.md b/docs/sources/tempo/operations/tempo_cli.md
index b94268fb9b7..7b1087a224b 100644
--- a/docs/sources/tempo/operations/tempo_cli.md
+++ b/docs/sources/tempo/operations/tempo_cli.md
@@ -8,7 +8,7 @@ weight: 70
 # Tempo CLI
 
 Tempo CLI is a separate executable that contains utility functions related to the Tempo software.
-Although it is not required for a working installation, Tempo CLI can be helpful for deeper analysis or for troubleshooting.
+Although it's not required for a working installation, Tempo CLI can be helpful for deeper analysis or for troubleshooting.
 
 ## Tempo CLI command syntax
 
@@ -466,6 +466,8 @@ tempo-cli migrate overrides-config config.yaml --config-dest config-tmp.yaml --o
 ```
 
 ## Analyse block
+
+
 Analyses a block and outputs a summary of the block's generic attributes.
 It's of particular use when trying to determine candidates for dedicated attribute columns in vParquet3.
 
@@ -483,8 +485,9 @@ tempo-cli analyse block --backend=local --bucket=./cmd/tempo-cli/test-data/ sing
 ```
 
 ## Analyse blocks
+
 Analyses all blocks in a given time range and outputs a summary of the blocks' generic attributes.
-It's of particular use when trying to determine candidates for dedicated attribute columns in vParquet3.
+It's of particular use when trying to determine candidates for dedicated attribute columns in vParquet3 and vParquet4.
 
 Arguments:
 - `tenant-id` The tenant ID. Use `single-tenant` for single-tenant setups.
@@ -503,6 +506,7 @@ tempo-cli analyse blocks --backend=local --bucket=./cmd/tempo-cli/test-data/ sin
 ```
 
 ## Drop trace by id
+
 Rewrites all blocks for a tenant that contain a specific trace id.
 The trace is dropped from the new blocks and the rewritten blocks are marked compacted so they will be cleaned up.
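To make the block-version behavior documented in the updated `parquet.md` concrete, here is a minimal Python sketch. It is not Tempo code (Tempo itself is written in Go). It assumes the option sits at `storage.trace.block.version` and that the configuration has already been parsed into a dictionary; the helper names `effective_block_version` and `search_enabled` are hypothetical. It encodes only what the patch states: the listed block versions, the `vParquet4` default when the option is removed, and that `v2` disables all forms of search.

```python
from typing import Any, Mapping

# Block format versions listed in the documented config comment:
# "# block format version. options: v2, vParquet2, vParquet3, vParquet4"
BLOCK_VERSIONS = ("v2", "vParquet2", "vParquet3", "vParquet4")

# Per the updated doc, removing the version option restores the vParquet4 default.
DEFAULT_BLOCK_VERSION = "vParquet4"


def effective_block_version(config: Mapping[str, Any]) -> str:
    """Return the block format new blocks would be written in.

    Assumption: the option lives at storage.trace.block.version, and `config`
    is an already-parsed mapping (for example, the output of a YAML loader).
    Missing or empty sections fall back to the default version.
    """
    storage = config.get("storage") or {}
    trace = storage.get("trace") or {}
    block = trace.get("block") or {}
    version = block.get("version") or DEFAULT_BLOCK_VERSION
    if version not in BLOCK_VERSIONS:
        raise ValueError(f"unsupported block version: {version!r}")
    return version


def search_enabled(version: str) -> bool:
    """Parquet-based formats provide search and TraceQL; v2 disables all search."""
    return version != "v2"


if __name__ == "__main__":
    configs = {
        "option removed": {},
        "pinned to vParquet3": {"storage": {"trace": {"block": {"version": "vParquet3"}}}},
        "v2 for high throughput": {"storage": {"trace": {"block": {"version": "v2"}}}},
    }
    for label, cfg in configs.items():
        version = effective_block_version(cfg)
        print(f"{label}: {version}, search enabled: {search_enabled(version)}")
```

Running it prints one line per configuration: the removed option resolves to `vParquet4`, and only the `v2` case reports search as disabled, matching the trade-off the doc describes for high-throughput clusters.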