Commit aa2f2ae

chore: add pipeline and log doc (#1026)

Authored by paomiannicecuishuiyisongfengjiachun
Co-authored-by: Yiran <cuiyiran3@gmail.com>
Co-authored-by: shuiyisong <113876041+shuiyisong@users.noreply.github.com>
Co-authored-by: Jeremyhi <jiachun_feng@proton.me>

1 parent 238d114 commit aa2f2ae

File tree

10 files changed: +1511 -0 lines changed


docs/nightly/en/user-guide/log/log-pipeline.md

Lines changed: 440 additions & 0 deletions
Large diffs are not rendered by default.

docs/nightly/en/user-guide/log/manage-pipeline.md

Lines changed: 90 additions & 0 deletions
# Managing Pipelines

In GreptimeDB, each `pipeline` is a collection of data processing units used for parsing and transforming ingested log content. This document explains how to create, delete, and query pipelines so that you can manage the processing flow of your log data.

For the specific pipeline configuration syntax, please refer to the [Pipeline Configuration](log-pipeline.md) documentation.
## Create a Pipeline

GreptimeDB provides a dedicated HTTP interface for creating pipelines. Assuming you have prepared a pipeline configuration file `pipeline.yaml`, use the following command to upload it, where `test` is the name you assign to the pipeline:

```shell
## Upload the pipeline file. 'test' is the name of the pipeline
curl -X "POST" "http://localhost:4000/v1/events/pipelines/test" -F "file=@pipeline.yaml"
```
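On success, the interface returns the pipeline name together with the server-assigned version (the same response format shown in the [Quick Start](quick-start.md)):

```json
{"name":"test","version":"2024-06-27 12:02:34.257312110Z"}
```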
## Delete a Pipeline

You can use the following HTTP interface to delete a pipeline:

```shell
## 'test' is the name of the pipeline
curl -X "DELETE" "http://localhost:4000/v1/events/pipelines/test?version=2024-06-27%2012%3A02%3A34.257312110Z"
```

In the above example, we deleted the pipeline named `test`. The `version` parameter is required and specifies which version of the pipeline to delete; note that it must be URL-encoded.
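If you would rather not percent-encode the version string by hand, curl can do the encoding for you. A sketch, using the version returned at creation time:

```shell
## -G moves the --data-urlencode value into the query string, encoding it on the way
curl -X "DELETE" -G "http://localhost:4000/v1/events/pipelines/test" \
  --data-urlencode "version=2024-06-27 12:02:34.257312110Z"
```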
## Query Pipelines

Currently, you can use SQL to query pipeline information:

```sql
SELECT * FROM greptime_private.pipelines;
```

Please note that if you connect to GreptimeDB using the MySQL or PostgreSQL protocol, the precision of the pipeline time information may differ, and nanosecond-level precision may be lost.
To work around this, you can cast the `created_at` field to view the pipeline's creation time at full precision. For example, the following query returns `created_at` as a `bigint` (nanoseconds since the Unix epoch):

```sql
SELECT name, pipeline, created_at::bigint FROM greptime_private.pipelines;
```
The query result is as follows:

```
 name |             pipeline              | greptime_private.pipelines.created_at
------+-----------------------------------+----------------------------------------
 test | processors:                      +|                    1719489754257312110
      |   - date:                        +|
      |       field: time                +|
      |       formats:                   +|
      |         - "%Y-%m-%d %H:%M:%S%.3f"+|
      |       ignore_missing: true       +|
      |                                  +|
      | transform:                       +|
      |   - fields:                      +|
      |       - id1                      +|
      |       - id2                      +|
      |     type: int32                  +|
      |   - fields:                      +|
      |       - type                     +|
      |       - logger                   +|
      |     type: string                 +|
      |     index: tag                   +|
      |   - fields:                      +|
      |       - log                      +|
      |     type: string                 +|
      |     index: fulltext              +|
      |   - field: time                  +|
      |     type: time                   +|
      |     index: timestamp             +|
      |                                   |
(1 row)
```
Then you can use a program to convert the `bigint` timestamp from the SQL result into a human-readable time string:

```shell
timestamp_ns="1719489754257312110"; readable_timestamp=$(TZ=UTC date -d @$((${timestamp_ns:0:10}+0)) +"%Y-%m-%d %H:%M:%S").${timestamp_ns:10}Z; echo "Readable timestamp (UTC): $readable_timestamp"
```
Output:

```shell
Readable timestamp (UTC): 2024-06-27 12:02:34.257312110Z
```

The output `Readable timestamp (UTC)` is the creation time of the pipeline, which also serves as its version number.
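The same conversion in Python, for use in scripts (a sketch; the nanosecond value is the `created_at` returned by the query above):

```python
from datetime import datetime, timezone

timestamp_ns = 1719489754257312110  # created_at from greptime_private.pipelines
seconds, nanos = divmod(timestamp_ns, 1_000_000_000)
created_at = datetime.fromtimestamp(seconds, tz=timezone.utc)
print(f"Readable timestamp (UTC): {created_at:%Y-%m-%d %H:%M:%S}.{nanos:09d}Z")
# Readable timestamp (UTC): 2024-06-27 12:02:34.257312110Z
```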
docs/nightly/en/user-guide/log/overview.md

Lines changed: 6 additions & 0 deletions

# Overview

- [Quick Start](./quick-start.md): An introduction to getting started quickly with the GreptimeDB log service.
- [Pipeline Configuration](./log-pipeline.md): In-depth information on each pipeline configuration option in GreptimeDB.
- [Managing Pipelines](./manage-pipeline.md): Explains how to create, delete, and query pipelines.
- [Writing Logs with Pipelines](./write-log.md): Detailed instructions on writing log data efficiently by leveraging the pipeline mechanism.
docs/nightly/en/user-guide/log/quick-start.md

Lines changed: 185 additions & 0 deletions

# Quick Start

## Download, install, and start GreptimeDB

Follow the [Installation Guide](/getting-started/overview.md) to install and start GreptimeDB.
## Create a Pipeline

GreptimeDB provides a dedicated HTTP interface for creating pipelines. Here's how to do it:

First, create a pipeline file, for example `pipeline.yaml`:

```yaml
# pipeline.yaml
processors:
  - date:
      field: time
      formats:
        - "%Y-%m-%d %H:%M:%S%.3f"
      ignore_missing: true

transform:
  - fields:
      - id1
      - id2
    type: int32
  - fields:
      - type
      - logger
    type: string
    index: tag
  - fields:
      - log
    type: string
    index: fulltext
  - field: time
    type: time
    index: timestamp
```
Then execute the following command to upload the configuration file:

```shell
## Upload the pipeline file. 'test' is the name of the pipeline
curl -X "POST" "http://localhost:4000/v1/events/pipelines/test" -F "file=@pipeline.yaml"
```

After this command succeeds, a pipeline named `test` is created, and the result is returned as `{"name":"test","version":"2024-06-27 12:02:34.257312110Z"}`, where `name` is the pipeline's name and `version` is its version.

This pipeline includes one processor and four transforms. The processor uses the Rust time format string `%Y-%m-%d %H:%M:%S%.3f` to parse the timestamp field in the logs. The transforms then convert the `id1` and `id2` fields to `int32`, the `type` and `logger` fields to `string` indexed as tags, the `log` field to `string` with a fulltext index, and the `time` field to a time type indexed as the timestamp.

Refer to the [Pipeline Configuration](log-pipeline.md) documentation for the specific syntax.
## Query Pipelines

You can use SQL to query the pipeline content stored in the database:

```sql
SELECT * FROM greptime_private.pipelines;
```
The query result is as follows:

```sql
 name | schema | content_type |             pipeline              |         created_at
------+--------+--------------+-----------------------------------+----------------------------
 test | public | yaml         | processors:                      +| 2024-06-27 12:02:34.257312
      |        |              |   - date:                        +|
      |        |              |       field: time                +|
      |        |              |       formats:                   +|
      |        |              |         - "%Y-%m-%d %H:%M:%S%.3f"+|
      |        |              |       ignore_missing: true       +|
      |        |              |                                  +|
      |        |              | transform:                       +|
      |        |              |   - fields:                      +|
      |        |              |       - id1                      +|
      |        |              |       - id2                      +|
      |        |              |     type: int32                  +|
      |        |              |   - fields:                      +|
      |        |              |       - type                     +|
      |        |              |       - logger                   +|
      |        |              |     type: string                 +|
      |        |              |     index: tag                   +|
      |        |              |   - fields:                      +|
      |        |              |       - log                      +|
      |        |              |     type: string                 +|
      |        |              |     index: fulltext              +|
      |        |              |   - field: time                  +|
      |        |              |     type: time                   +|
      |        |              |     index: timestamp             +|
      |        |              |                                   |
(1 row)
```
## Write logs

The HTTP interface for writing logs is as follows:

```shell
curl -X "POST" "http://localhost:4000/v1/events/logs?db=public&table=logs&pipeline_name=test" \
  -H 'Content-Type: application/json' \
  -d $'{"time":"2024-05-25 20:16:37.217","id1":"2436","id2":"2528","type":"I","logger":"INTERACT.MANAGER","log":"ClusterAdapter:enter sendTextDataToCluster\\n"}
{"time":"2024-05-25 20:16:37.217","id1":"2436","id2":"2528","type":"I","logger":"INTERACT.MANAGER","log":"ClusterAdapter:enter sendTextDataToCluster\\n"}
{"time":"2024-05-25 20:16:37.217","id1":"2436","id2":"2528","type":"I","logger":"INTERACT.MANAGER","log":"ClusterAdapter:enter sendTextDataToCluster\\n"}
{"time":"2024-05-25 20:16:37.217","id1":"2436","id2":"2528","type":"I","logger":"INTERACT.MANAGER","log":"ClusterAdapter:enter sendTextDataToCluster\\n"}'
```
The above command returns the following result:

```json
{"output":[{"affectedrows":4}],"execution_time_ms":22}
```

In the above example, we successfully wrote 4 log entries to the `public.logs` table.

Please refer to [Writing Logs with Pipelines](write-log.md) for the specific log-writing syntax.
## `logs` table structure

We can use SQL to query the structure of the `public.logs` table:

```sql
DESC TABLE logs;
```
The query result is as follows:

```sql
 Column |        Type         | Key | Null | Default | Semantic Type
--------+---------------------+-----+------+---------+---------------
 id1    | Int32               |     | YES  |         | FIELD
 id2    | Int32               |     | YES  |         | FIELD
 type   | String              | PRI | YES  |         | TAG
 logger | String              | PRI | YES  |         | TAG
 log    | String              |     | YES  |         | FIELD
 time   | TimestampNanosecond | PRI | NO   |         | TIMESTAMP
(6 rows)
```

From this result, we can see that, based on the pipeline's processing, the `public.logs` table contains 6 columns: `id1` and `id2` are converted to `Int32`; `type`, `log`, and `logger` are converted to `String`; and `time` is converted to `TimestampNanosecond` and indexed as the table's timestamp.
## Query logs

We can use standard SQL to query the log data. First, connect to GreptimeDB using the MySQL or PostgreSQL protocol:

```shell
# MySQL
mysql --host=127.0.0.1 --port=4002 public

# PostgreSQL
psql -h 127.0.0.1 -p 4003 -d public
```
Then query the log table using SQL:

```sql
SELECT * FROM public.logs;
```

The query result is as follows:

```sql
 id1  | id2  | type |      logger      |                    log                     |            time
------+------+------+------------------+--------------------------------------------+----------------------------
 2436 | 2528 | I    | INTERACT.MANAGER | ClusterAdapter:enter sendTextDataToCluster+| 2024-05-25 20:16:37.217000
      |      |      |                  |                                            |
 2436 | 2528 | I    | INTERACT.MANAGER | ClusterAdapter:enter sendTextDataToCluster+| 2024-05-25 20:16:37.217000
      |      |      |                  |                                            |
 2436 | 2528 | I    | INTERACT.MANAGER | ClusterAdapter:enter sendTextDataToCluster+| 2024-05-25 20:16:37.217000
      |      |      |                  |                                            |
 2436 | 2528 | I    | INTERACT.MANAGER | ClusterAdapter:enter sendTextDataToCluster+| 2024-05-25 20:16:37.217000
      |      |      |                  |                                            |
(4 rows)
```

As you can see, the type conversions applied by the pipeline have stored the logs in structured form, which makes further querying and analysis convenient.
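Because the columns are typed and indexed, ordinary SQL predicates work on them directly. A sketch, reusing the tag and timestamp columns from the table above:

```sql
-- Count the entries written by one logger within a time window
SELECT count(*)
FROM public.logs
WHERE logger = 'INTERACT.MANAGER'
  AND time >= '2024-05-25 00:00:00';
```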
181+
182+
## Conclusion
183+
184+
By following the above steps, you have successfully created a pipeline, written logs, and performed queries. This is just the tip of the iceberg in terms of the capabilities offered by GreptimeDB.
185+
Next, please continue reading [Pipeline Configuration](log-pipeline.md) and [Managing Pipelines](manage-pipeline.md) to learn more about advanced features and best practices.
docs/nightly/en/user-guide/log/write-log.md

Lines changed: 31 additions & 0 deletions

# Writing Logs Using a Pipeline

This document describes how to write logs to GreptimeDB through the HTTP interface, processing them with a specified pipeline.

Before writing logs, please read the [Pipeline Configuration](log-pipeline.md) and [Managing Pipelines](manage-pipeline.md) documents to complete the configuration and upload it.
## HTTP API

You can use the following command to write logs via the HTTP interface:

```shell
curl -X "POST" "http://localhost:4000/v1/events/logs?db=<db-name>&table=<table-name>&pipeline_name=<pipeline-name>" \
  -H 'Content-Type: application/json' \
  -d "$<log-items>"
```
## Query parameters

This interface accepts the following parameters (a filled-in example follows the list):

- `db`: The name of the database.
- `table`: The name of the table.
- `pipeline_name`: The name of the [pipeline](./log-pipeline.md).
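For example, here is the template above with the parameters filled in using the names from the [Quick Start](quick-start.md): database `public`, table `logs`, pipeline `test`.

```shell
curl -X "POST" "http://localhost:4000/v1/events/logs?db=public&table=logs&pipeline_name=test" \
  -H 'Content-Type: application/json' \
  -d '{"time":"2024-05-25 20:16:37.217","id1":"2436","id2":"2528","type":"I","logger":"INTERACT.MANAGER","log":"ClusterAdapter:enter sendTextDataToCluster"}'
```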
## Body data format

The request body supports NDJSON and JSON Array formats, where each JSON object represents a single log entry.
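For example, the NDJSON entries from the Quick Start can equivalently be sent as a JSON array (a sketch; the field names come from that example):

```json
[
  {"time":"2024-05-25 20:16:37.217","id1":"2436","id2":"2528","type":"I","logger":"INTERACT.MANAGER","log":"ClusterAdapter:enter sendTextDataToCluster\n"},
  {"time":"2024-05-25 20:16:37.217","id1":"2436","id2":"2528","type":"I","logger":"INTERACT.MANAGER","log":"ClusterAdapter:enter sendTextDataToCluster\n"}
]
```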
28+
29+
## Example
30+
31+
Please refer to the "Writing Logs" section in the [Quick Start](quick-start.md#write-logs) guide for an example.
