From ecabf896bb03480fe77ce90aa855de80173ebbec Mon Sep 17 00:00:00 2001 From: shuiyisong <113876041+shuiyisong@users.noreply.github.com> Date: Thu, 18 Jul 2024 20:18:56 +0800 Subject: [PATCH] chore: update logs data format (#1066) Co-authored-by: Yiran --- .../en/user-guide/logs/pipeline-config.md | 26 +++--- .../nightly/en/user-guide/logs/quick-start.md | 6 +- docs/nightly/en/user-guide/logs/write-logs.md | 81 +++++++++++++++++- .../zh/user-guide/logs/pipeline-config.md | 24 +++--- .../nightly/zh/user-guide/logs/quick-start.md | 6 +- docs/nightly/zh/user-guide/logs/write-logs.md | 82 ++++++++++++++++++- .../en/user-guide/logs/pipeline-config.md | 26 +++--- docs/v0.9/en/user-guide/logs/quick-start.md | 6 +- docs/v0.9/en/user-guide/logs/write-logs.md | 81 +++++++++++++++++- .../zh/user-guide/logs/pipeline-config.md | 24 +++--- docs/v0.9/zh/user-guide/logs/quick-start.md | 6 +- docs/v0.9/zh/user-guide/logs/write-logs.md | 82 ++++++++++++++++++- 12 files changed, 376 insertions(+), 74 deletions(-) diff --git a/docs/nightly/en/user-guide/logs/pipeline-config.md b/docs/nightly/en/user-guide/logs/pipeline-config.md index 63d74e82e..e5924e2d3 100644 --- a/docs/nightly/en/user-guide/logs/pipeline-config.md +++ b/docs/nightly/en/user-guide/logs/pipeline-config.md @@ -1,15 +1,15 @@ # Pipeline Configuration -Pipeline is a mechanism in GreptimeDB for transforming log data. It consists of a unique name and a set of configuration rules that define how log data is formatted, split, and transformed. Currently, we support JSON (`application/json`) and plain text (`text/plain`) formats as input for log data. +Pipeline is a mechanism in GreptimeDB for parsing and transforming log data. It consists of a unique name and a set of configuration rules that define how log data is formatted, split, and transformed. Currently, we support JSON (`application/json`) and plain text (`text/plain`) formats as input for log data. These configurations are provided in YAML format, allowing the Pipeline to process data during the log writing process according to the defined rules and store the processed data in the database for subsequent structured queries. -## The overall structure +## Overall structure Pipeline consists of two parts: Processors and Transform, both of which are in array format. A Pipeline configuration can contain multiple Processors and multiple Transforms. The data type described by Transform determines the table structure when storing log data in the database. - Processors are used for preprocessing log data, such as parsing time fields and replacing fields. -- Transform is used for converting log data formats, such as converting string types to numeric types. +- Transform is used for converting data formats, such as converting string types to numeric types. Here is an example of a simple configuration that includes Processors and Transform: @@ -40,15 +40,15 @@ The Processor is used for preprocessing log data, and its configuration is locat We currently provide the following built-in Processors: -- `date`: Used to parse formatted time string fields, such as `2024-07-12T16:18:53.048`. -- `epoch`: Used to parse numeric timestamp fields, such as `1720772378893`. -- `dissect`: Used to split log data fields. -- `gsub`: Used to replace log data fields. -- `join`: Used to merge array-type fields in logs. -- `letter`: Used to convert log data fields to letters. -- `regex`: Used to perform regular expression matching on log data fields. -- `urlencoding`: Used to perform URL encoding/decoding on log data fields. 
-- `csv`: Used to parse CSV data fields in logs. +- `date`: parses formatted time string fields, such as `2024-07-12T16:18:53.048`. +- `epoch`: parses numeric timestamp fields, such as `1720772378893`. +- `dissect`: splits log data fields. +- `gsub`: replaces log data fields. +- `join`: merges array-type fields in logs. +- `letter`: converts log data fields to letters. +- `regex`: performs regular expression matching on log data fields. +- `urlencoding`: performs URL encoding/decoding on log data fields. +- `csv`: parses CSV data fields in logs. ### `date` @@ -68,7 +68,7 @@ processors: In the above example, the configuration of the `date` processor includes the following fields: - `fields`: A list of time field names to be parsed. -- `formats`: Time format strings, supporting multiple format strings. Parsing is attempted in the order provided until successful. +- `formats`: Time format strings, supporting multiple format strings. Parsing is attempted in the order provided until successful. You can find reference [here](https://docs.rs/chrono/latest/chrono/format/strftime/index.html) for formatting syntax. - `ignore_missing`: Ignores the case when the field is missing. Defaults to `false`. If the field is missing and this configuration is set to `false`, an exception will be thrown. - `timezone`: Time zone. Use the time zone identifiers from the [tz_database](https://en.wikipedia.org/wiki/List_of_tz_database_time_zones) to specify the time zone. Defaults to `UTC`. diff --git a/docs/nightly/en/user-guide/logs/quick-start.md b/docs/nightly/en/user-guide/logs/quick-start.md index 224edf860..44bad2503 100644 --- a/docs/nightly/en/user-guide/logs/quick-start.md +++ b/docs/nightly/en/user-guide/logs/quick-start.md @@ -101,7 +101,7 @@ curl -X "POST" "http://localhost:4000/v1/events/pipelines/nginx_pipeline" -F "fi After successfully executing this command, a pipeline named `nginx_pipeline` will be created, and the result will be returned as: -```shell +```json {"name":"nginx_pipeline","version":"2024-06-27 12:02:34.257312110Z"}. ``` @@ -126,7 +126,7 @@ curl -X "POST" "http://localhost:4000/v1/events/logs?db=public&table=pipeline_lo You will see the following output if the command is successful: -```shell +```json {"output":[{"affectedrows":4}],"execution_time_ms":79} ``` @@ -182,7 +182,7 @@ Of course, if you need keyword searching within large text blocks, you must use ## Query logs -The `pipeline_logs` as the example to query logs. +We use the `pipeline_logs` table as an example to query logs. ### Query logs by tags diff --git a/docs/nightly/en/user-guide/logs/write-logs.md b/docs/nightly/en/user-guide/logs/write-logs.md index 1398495ea..331d785fe 100644 --- a/docs/nightly/en/user-guide/logs/write-logs.md +++ b/docs/nightly/en/user-guide/logs/write-logs.md @@ -14,7 +14,7 @@ curl -X "POST" "http://localhost:4000/v1/events/logs?db=&table=" ``` -## Query parameters +## Request parameters This interface accepts the following parameters: @@ -23,9 +23,84 @@ This interface accepts the following parameters: - `pipeline_name`: The name of the [pipeline](./pipeline-config.md). - `version`: The version of the pipeline. Optional, default use the latest one. -## Body data format +## `Content-Type` and body format -The request body supports NDJSON and JSON Array formats, where each JSON object represents a log entry. +GreptimeDB uses `Content-Type` header to decide how to decode the payload body. 
Currently the following two formats are supported:
+- `application/json`: this includes both standard JSON and NDJSON formats.
+- `text/plain`: multiple log lines separated by line breaks.
+
+### `application/json` format
+
+Here is an example of a JSON-format body payload:
+
+```JSON
+[
+  {"message":"127.0.0.1 - - [25/May/2024:20:16:37 +0000] \"GET /index.html HTTP/1.1\" 200 612 \"-\" \"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36\""},
+  {"message":"192.168.1.1 - - [25/May/2024:20:17:37 +0000] \"POST /api/login HTTP/1.1\" 200 1784 \"-\" \"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.96 Safari/537.36\""},
+  {"message":"10.0.0.1 - - [25/May/2024:20:18:37 +0000] \"GET /images/logo.png HTTP/1.1\" 304 0 \"-\" \"Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:89.0) Gecko/20100101 Firefox/89.0\""},
+  {"message":"172.16.0.1 - - [25/May/2024:20:19:37 +0000] \"GET /contact HTTP/1.1\" 404 162 \"-\" \"Mozilla/5.0 (iPhone; CPU iPhone OS 14_0 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.0 Mobile/15E148 Safari/604.1\""}
+]
+```
+
+Note that the whole JSON is an array of log lines. Each JSON object represents one line to be processed by the Pipeline engine.
+
+The name of the key in the JSON objects, which is `message` here, is used as the field name in Pipeline processors. For example:
+
+```yaml
+processors:
+  - dissect:
+      fields:
+        # `message` is the key in the JSON object
+        - message
+      patterns:
+        - '%{ip_address} - - [%{timestamp}] "%{http_method} %{request_line}" %{status_code} %{response_size} "-" "%{user_agent}"'
+      ignore_missing: true
+
+# the rest of the file is omitted for brevity
+```
+
+We can also rewrite the payload into NDJSON format as follows:
+
+```JSON
+{"message":"127.0.0.1 - - [25/May/2024:20:16:37 +0000] \"GET /index.html HTTP/1.1\" 200 612 \"-\" \"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36\""}
+{"message":"192.168.1.1 - - [25/May/2024:20:17:37 +0000] \"POST /api/login HTTP/1.1\" 200 1784 \"-\" \"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.96 Safari/537.36\""}
+{"message":"10.0.0.1 - - [25/May/2024:20:18:37 +0000] \"GET /images/logo.png HTTP/1.1\" 304 0 \"-\" \"Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:89.0) Gecko/20100101 Firefox/89.0\""}
+{"message":"172.16.0.1 - - [25/May/2024:20:19:37 +0000] \"GET /contact HTTP/1.1\" 404 162 \"-\" \"Mozilla/5.0 (iPhone; CPU iPhone OS 14_0 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.0 Mobile/15E148 Safari/604.1\""}
+```
+
+Note that the outer array is removed, and the lines are separated by line breaks instead of commas.
+
+### `text/plain` format
+
+Logs in plain text format are widely used throughout the ecosystem. GreptimeDB also supports `text/plain` as a log data input format, enabling it to ingest logs directly from log producers.
+ +The equivalent body payload of previous example is like following: + +```plain +127.0.0.1 - - [25/May/2024:20:16:37 +0000] "GET /index.html HTTP/1.1" 200 612 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36" +192.168.1.1 - - [25/May/2024:20:17:37 +0000] "POST /api/login HTTP/1.1" 200 1784 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.96 Safari/537.36" +10.0.0.1 - - [25/May/2024:20:18:37 +0000] "GET /images/logo.png HTTP/1.1" 304 0 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:89.0) Gecko/20100101 Firefox/89.0" +172.16.0.1 - - [25/May/2024:20:19:37 +0000] "GET /contact HTTP/1.1" 404 162 "-" "Mozilla/5.0 (iPhone; CPU iPhone OS 14_0 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.0 Mobile/15E148 Safari/604.1" +``` + +Sending log ingestion request to GreptimeDB requires only modifying the `Content-Type` header to be `text/plain`, and you are good to go! + +Please note that, unlike JSON format, where the input data already have key names as field names to be used in Pipeline processors, `text/plain` format just gives the whole line as input to the Pipeline engine. In this case we use `line` as the field name to refer to the input line, for example: + +```yaml +processors: + - dissect: + fields: + # use `line` as the field name + - line + patterns: + - '%{ip_address} - - [%{timestamp}] "%{http_method} %{request_line}" %{status_code} %{response_size} "-" "%{user_agent}"' + ignore_missing: true + +# rest of the file is ignored +``` + +It is recommended to use `dissect` or `regex` processor to split the input line into fields first and then process the fields accordingly. ## Example diff --git a/docs/nightly/zh/user-guide/logs/pipeline-config.md b/docs/nightly/zh/user-guide/logs/pipeline-config.md index 0e21aa582..b4cba04ba 100644 --- a/docs/nightly/zh/user-guide/logs/pipeline-config.md +++ b/docs/nightly/zh/user-guide/logs/pipeline-config.md @@ -1,6 +1,6 @@ # Pipeline 配置 -Pipeline 是 GreptimeDB 中对 log 数据进行转换的一种机制, 由一个唯一的名称和一组配置规则组成,这些规则定义了如何对日志数据进行格式化、拆分和转换。目前我们支持 JSON(`application/json`)和纯文本(`text/plain`)格式的日志数据作为输入。 +Pipeline 是 GreptimeDB 中对 log 数据进行解析和转换的一种机制, 由一个唯一的名称和一组配置规则组成,这些规则定义了如何对日志数据进行格式化、拆分和转换。目前我们支持 JSON(`application/json`)和纯文本(`text/plain`)格式的日志数据作为输入。 这些配置以 YAML 格式提供,使得 Pipeline 能够在日志写入过程中,根据设定的规则对数据进行处理,并将处理后的数据存储到数据库中,便于后续的结构化查询。 @@ -9,7 +9,7 @@ Pipeline 是 GreptimeDB 中对 log 数据进行转换的一种机制, 由一 Pipeline 由两部分组成:Processors 和 Transform,这两部分均为数组形式。一个 Pipeline 配置可以包含多个 Processor 和多个 Transform。Transform 所描述的数据类型会决定日志数据保存到数据库时的表结构。 - Processor 用于对 log 数据进行预处理,例如解析时间字段,替换字段等。 -- Transform 用于对 log 数据进行格式转换,例如将字符串类型转换为数字类型。 +- Transform 用于对数据进行格式转换,例如将字符串类型转换为数字类型。 一个包含 Processor 和 Transform 的简单配置示例如下: @@ -42,15 +42,15 @@ Processor 由一个 name 和多个配置组成,不同类型的 Processor 配 我们目前内置了以下几种 Processor: -- `date`: 用于解析格式化的时间字符串字段,例如 `2024-07-12T16:18:53.048`。 -- `epoch`: 用于解析数字时间戳字段,例如 `1720772378893`。 -- `dissect`: 用于对 log 数据字段进行拆分。 -- `gsub`: 用于对 log 数据字段进行替换。 -- `join`: 用于对 log 中的 array 类型字段进行合并。 -- `letter`: 用于对 log 数据字段进行字母转换。 -- `regex`: 用于对 log 数据字段进行正则匹配。 -- `urlencoding`: 用于对 log 数据字段进行 URL 编解码。 -- `csv`: 用于对 log 数据字段进行 CSV 解析。 +- `date`: 解析格式化的时间字符串字段,例如 `2024-07-12T16:18:53.048`。 +- `epoch`: 解析数字时间戳字段,例如 `1720772378893`。 +- `dissect`: 对 log 数据字段进行拆分。 +- `gsub`: 对 log 数据字段进行替换。 +- `join`: 对 log 中的 array 类型字段进行合并。 +- `letter`: 对 log 数据字段进行字母转换。 +- `regex`: 对 log 数据字段进行正则匹配。 +- `urlencoding`: 对 log 数据字段进行 URL 编解码。 +- `csv`: 对 log 数据字段进行 CSV 解析。 
### `date` @@ -70,7 +70,7 @@ processors: 如上所示,`date` Processor 的配置包含以下字段: - `fields`: 需要解析的时间字段名列表。 -- `formats`: 时间格式化字符串,支持多个时间格式化字符串。按照提供的顺序尝试解析,直到解析成功。 +- `formats`: 时间格式化字符串,支持多个时间格式化字符串。按照提供的顺序尝试解析,直到解析成功。你可以在[这里](https://docs.rs/chrono/latest/chrono/format/strftime/index.html)找到格式化的语法说明。 - `ignore_missing`: 忽略字段不存在的情况。默认为 `false`。如果字段不存在,并且此配置为 false,则会抛出异常。 - `timezone`: 时区。使用[tz_database](https://en.wikipedia.org/wiki/List_of_tz_database_time_zones) 中的时区标识符来指定时区。默认为 `UTC`。 diff --git a/docs/nightly/zh/user-guide/logs/quick-start.md b/docs/nightly/zh/user-guide/logs/quick-start.md index 185bd4c86..1871e6e7c 100644 --- a/docs/nightly/zh/user-guide/logs/quick-start.md +++ b/docs/nightly/zh/user-guide/logs/quick-start.md @@ -99,7 +99,7 @@ curl -X "POST" "http://localhost:4000/v1/events/pipelines/nginx_pipeline" -F "fi 成功执行此命令后,将创建一个名为 `nginx_pipeline` 的 pipeline,返回的结果如下: -```shell +```json {"name":"nginx_pipeline","version":"2024-06-27 12:02:34.257312110Z"}. ``` @@ -124,7 +124,7 @@ curl -X "POST" "http://localhost:4000/v1/events/logs?db=public&table=pipeline_lo 如果命令执行成功,您将看到以下输出: -```shell +```json {"output":[{"affectedrows":4}],"execution_time_ms":79} ``` @@ -179,7 +179,7 @@ DESC pipeline_logs; ## 查询日志 -以 `pipeline_logs` 为例查询日志。 +以 `pipeline_logs` 表为例查询日志。 ### 按 Tag 查询日志 diff --git a/docs/nightly/zh/user-guide/logs/write-logs.md b/docs/nightly/zh/user-guide/logs/write-logs.md index add90491f..1da9a8851 100644 --- a/docs/nightly/zh/user-guide/logs/write-logs.md +++ b/docs/nightly/zh/user-guide/logs/write-logs.md @@ -15,7 +15,7 @@ curl -X "POST" "http://localhost:4000/v1/events/logs?db=&table=&table=&table=" ``` -## Query parameters +## Request parameters This interface accepts the following parameters: @@ -23,9 +23,84 @@ This interface accepts the following parameters: - `pipeline_name`: The name of the [pipeline](./pipeline-config.md). - `version`: The version of the pipeline. Optional, default use the latest one. -## Body data format +## `Content-Type` and body format -The request body supports NDJSON and JSON Array formats, where each JSON object represents a log entry. +GreptimeDB uses `Content-Type` header to decide how to decode the payload body. Currently the following two format is supported: +- `application/json`: this includes normal JSON format and NDJSON format. +- `text/plain`: multiple log lines separated by line breaks. + +### `application/json` format + +Here is an example of JSON format body payload + +```JSON +[ + {"message":"127.0.0.1 - - [25/May/2024:20:16:37 +0000] \"GET /index.html HTTP/1.1\" 200 612 \"-\" \"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36\""}, + {"message":"192.168.1.1 - - [25/May/2024:20:17:37 +0000] \"POST /api/login HTTP/1.1\" 200 1784 \"-\" \"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.96 Safari/537.36\""}, + {"message":"10.0.0.1 - - [25/May/2024:20:18:37 +0000] \"GET /images/logo.png HTTP/1.1\" 304 0 \"-\" \"Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:89.0) Gecko/20100101 Firefox/89.0\""}, + {"message":"172.16.0.1 - - [25/May/2024:20:19:37 +0000] \"GET /contact HTTP/1.1\" 404 162 \"-\" \"Mozilla/5.0 (iPhone; CPU iPhone OS 14_0 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.0 Mobile/15E148 Safari/604.1\""} +] +``` + +Note the whole JSON is an array (log lines). Each JSON object represents one line to be processed by Pipeline engine. 
+The name of the key in the JSON objects, which is `message` here, is used as the field name in Pipeline processors. For example:
+
+```yaml
+processors:
+  - dissect:
+      fields:
+        # `message` is the key in the JSON object
+        - message
+      patterns:
+        - '%{ip_address} - - [%{timestamp}] "%{http_method} %{request_line}" %{status_code} %{response_size} "-" "%{user_agent}"'
+      ignore_missing: true
+
+# the rest of the file is omitted for brevity
+```
+
+We can also rewrite the payload into NDJSON format as follows:
+
+```JSON
+{"message":"127.0.0.1 - - [25/May/2024:20:16:37 +0000] \"GET /index.html HTTP/1.1\" 200 612 \"-\" \"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36\""}
+{"message":"192.168.1.1 - - [25/May/2024:20:17:37 +0000] \"POST /api/login HTTP/1.1\" 200 1784 \"-\" \"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.96 Safari/537.36\""}
+{"message":"10.0.0.1 - - [25/May/2024:20:18:37 +0000] \"GET /images/logo.png HTTP/1.1\" 304 0 \"-\" \"Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:89.0) Gecko/20100101 Firefox/89.0\""}
+{"message":"172.16.0.1 - - [25/May/2024:20:19:37 +0000] \"GET /contact HTTP/1.1\" 404 162 \"-\" \"Mozilla/5.0 (iPhone; CPU iPhone OS 14_0 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.0 Mobile/15E148 Safari/604.1\""}
+```
+
+Note that the outer array is removed, and the lines are separated by line breaks instead of commas.
+
+### `text/plain` format
+
+Logs in plain text format are widely used throughout the ecosystem. GreptimeDB also supports `text/plain` as a log data input format, enabling it to ingest logs directly from log producers.
+
+The equivalent body payload of the previous example is as follows:
+
+```plain
+127.0.0.1 - - [25/May/2024:20:16:37 +0000] "GET /index.html HTTP/1.1" 200 612 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36"
+192.168.1.1 - - [25/May/2024:20:17:37 +0000] "POST /api/login HTTP/1.1" 200 1784 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.96 Safari/537.36"
+10.0.0.1 - - [25/May/2024:20:18:37 +0000] "GET /images/logo.png HTTP/1.1" 304 0 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:89.0) Gecko/20100101 Firefox/89.0"
+172.16.0.1 - - [25/May/2024:20:19:37 +0000] "GET /contact HTTP/1.1" 404 162 "-" "Mozilla/5.0 (iPhone; CPU iPhone OS 14_0 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.0 Mobile/15E148 Safari/604.1"
+```
+
+To send a log ingestion request to GreptimeDB, simply set the `Content-Type` header to `text/plain`, and you are good to go!
+
+Please note that, unlike the JSON format, where the input data already carries key names that serve as field names in Pipeline processors, the `text/plain` format hands the whole line to the Pipeline engine as a single input. In this case, `line` is used as the field name to refer to the input line, for example:
+
+```yaml
+processors:
+  - dissect:
+      fields:
+        # use `line` as the field name
+        - line
+      patterns:
+        - '%{ip_address} - - [%{timestamp}] "%{http_method} %{request_line}" %{status_code} %{response_size} "-" "%{user_agent}"'
+      ignore_missing: true
+
+# the rest of the file is omitted for brevity
+```
+
+It is recommended to use the `dissect` or `regex` processor to split the input line into fields first, and then process those fields accordingly.
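+
+For instance, a minimal plain-text ingestion request might look like the sketch below. The `pipeline_logs` table and `nginx_pipeline` pipeline are hypothetical names borrowed from the quick start, and authentication headers are omitted:
+
+```shell
+# Content-Type is text/plain, so each line of the body is fed to the pipeline as the `line` field
+curl -X "POST" "http://localhost:4000/v1/events/logs?db=public&table=pipeline_logs&pipeline_name=nginx_pipeline" \
+     -H "Content-Type: text/plain" \
+     -d '127.0.0.1 - - [25/May/2024:20:16:37 +0000] "GET /index.html HTTP/1.1" 200 612 "-" "Mozilla/5.0"'
+```
+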
## Example diff --git a/docs/v0.9/zh/user-guide/logs/pipeline-config.md b/docs/v0.9/zh/user-guide/logs/pipeline-config.md index 0e21aa582..b4cba04ba 100644 --- a/docs/v0.9/zh/user-guide/logs/pipeline-config.md +++ b/docs/v0.9/zh/user-guide/logs/pipeline-config.md @@ -1,6 +1,6 @@ # Pipeline 配置 -Pipeline 是 GreptimeDB 中对 log 数据进行转换的一种机制, 由一个唯一的名称和一组配置规则组成,这些规则定义了如何对日志数据进行格式化、拆分和转换。目前我们支持 JSON(`application/json`)和纯文本(`text/plain`)格式的日志数据作为输入。 +Pipeline 是 GreptimeDB 中对 log 数据进行解析和转换的一种机制, 由一个唯一的名称和一组配置规则组成,这些规则定义了如何对日志数据进行格式化、拆分和转换。目前我们支持 JSON(`application/json`)和纯文本(`text/plain`)格式的日志数据作为输入。 这些配置以 YAML 格式提供,使得 Pipeline 能够在日志写入过程中,根据设定的规则对数据进行处理,并将处理后的数据存储到数据库中,便于后续的结构化查询。 @@ -9,7 +9,7 @@ Pipeline 是 GreptimeDB 中对 log 数据进行转换的一种机制, 由一 Pipeline 由两部分组成:Processors 和 Transform,这两部分均为数组形式。一个 Pipeline 配置可以包含多个 Processor 和多个 Transform。Transform 所描述的数据类型会决定日志数据保存到数据库时的表结构。 - Processor 用于对 log 数据进行预处理,例如解析时间字段,替换字段等。 -- Transform 用于对 log 数据进行格式转换,例如将字符串类型转换为数字类型。 +- Transform 用于对数据进行格式转换,例如将字符串类型转换为数字类型。 一个包含 Processor 和 Transform 的简单配置示例如下: @@ -42,15 +42,15 @@ Processor 由一个 name 和多个配置组成,不同类型的 Processor 配 我们目前内置了以下几种 Processor: -- `date`: 用于解析格式化的时间字符串字段,例如 `2024-07-12T16:18:53.048`。 -- `epoch`: 用于解析数字时间戳字段,例如 `1720772378893`。 -- `dissect`: 用于对 log 数据字段进行拆分。 -- `gsub`: 用于对 log 数据字段进行替换。 -- `join`: 用于对 log 中的 array 类型字段进行合并。 -- `letter`: 用于对 log 数据字段进行字母转换。 -- `regex`: 用于对 log 数据字段进行正则匹配。 -- `urlencoding`: 用于对 log 数据字段进行 URL 编解码。 -- `csv`: 用于对 log 数据字段进行 CSV 解析。 +- `date`: 解析格式化的时间字符串字段,例如 `2024-07-12T16:18:53.048`。 +- `epoch`: 解析数字时间戳字段,例如 `1720772378893`。 +- `dissect`: 对 log 数据字段进行拆分。 +- `gsub`: 对 log 数据字段进行替换。 +- `join`: 对 log 中的 array 类型字段进行合并。 +- `letter`: 对 log 数据字段进行字母转换。 +- `regex`: 对 log 数据字段进行正则匹配。 +- `urlencoding`: 对 log 数据字段进行 URL 编解码。 +- `csv`: 对 log 数据字段进行 CSV 解析。 ### `date` @@ -70,7 +70,7 @@ processors: 如上所示,`date` Processor 的配置包含以下字段: - `fields`: 需要解析的时间字段名列表。 -- `formats`: 时间格式化字符串,支持多个时间格式化字符串。按照提供的顺序尝试解析,直到解析成功。 +- `formats`: 时间格式化字符串,支持多个时间格式化字符串。按照提供的顺序尝试解析,直到解析成功。你可以在[这里](https://docs.rs/chrono/latest/chrono/format/strftime/index.html)找到格式化的语法说明。 - `ignore_missing`: 忽略字段不存在的情况。默认为 `false`。如果字段不存在,并且此配置为 false,则会抛出异常。 - `timezone`: 时区。使用[tz_database](https://en.wikipedia.org/wiki/List_of_tz_database_time_zones) 中的时区标识符来指定时区。默认为 `UTC`。 diff --git a/docs/v0.9/zh/user-guide/logs/quick-start.md b/docs/v0.9/zh/user-guide/logs/quick-start.md index 185bd4c86..1871e6e7c 100644 --- a/docs/v0.9/zh/user-guide/logs/quick-start.md +++ b/docs/v0.9/zh/user-guide/logs/quick-start.md @@ -99,7 +99,7 @@ curl -X "POST" "http://localhost:4000/v1/events/pipelines/nginx_pipeline" -F "fi 成功执行此命令后,将创建一个名为 `nginx_pipeline` 的 pipeline,返回的结果如下: -```shell +```json {"name":"nginx_pipeline","version":"2024-06-27 12:02:34.257312110Z"}. ``` @@ -124,7 +124,7 @@ curl -X "POST" "http://localhost:4000/v1/events/logs?db=public&table=pipeline_lo 如果命令执行成功,您将看到以下输出: -```shell +```json {"output":[{"affectedrows":4}],"execution_time_ms":79} ``` @@ -179,7 +179,7 @@ DESC pipeline_logs; ## 查询日志 -以 `pipeline_logs` 为例查询日志。 +以 `pipeline_logs` 表为例查询日志。 ### 按 Tag 查询日志 diff --git a/docs/v0.9/zh/user-guide/logs/write-logs.md b/docs/v0.9/zh/user-guide/logs/write-logs.md index add90491f..1da9a8851 100644 --- a/docs/v0.9/zh/user-guide/logs/write-logs.md +++ b/docs/v0.9/zh/user-guide/logs/write-logs.md @@ -15,7 +15,7 @@ curl -X "POST" "http://localhost:4000/v1/events/logs?db=&table=&table=