# Log records collection using FluentBit

This section describes the integration between {{ ydb-short-name }} and the log capture tool [FluentBit](https://fluentbit.io) to save and analyze the log records in {{ ydb-short-name }}.

## Overview

FluentBit is a tool that can collect text data, manipulate it (modify, transform, combine), and send it to various repositories for further processing. A custom plugin library for FluentBit has been developed to support saving the log records into {{ ydb-short-name }}. The library's source code is available in the [fluent-bit-ydb repository](https://github.com/ydb-platform/fluent-bit-ydb).

Deploying a log delivery scheme using FluentBit and {{ ydb-short-name }} as the destination database includes the following steps:

1. Create {{ ydb-short-name }} tables for log data storage
2. Deploy FluentBit and the {{ ydb-short-name }} plugin for FluentBit
3. Configure FluentBit to collect and process the logs
4. Configure FluentBit to send the logs to {{ ydb-short-name }} tables

## Creating tables for log data

Tables for log data storage must be created in the chosen {{ ydb-short-name }} database. The structure of the tables is determined by the set of fields in the specific logs shipped via FluentBit. Depending on the requirements, different log types may be saved to different tables. Normally, a table for log data contains the following fields:

* timestamp
* log level
* hostname
* service name
* message text or its semi-structured representation as a JSON document

{{ ydb-short-name }} tables must have a primary key, uniquely identifying each table's row. A timestamp does not always uniquely identify messages coming from a particular source because messages might be generated simultaneously. To enforce the uniqueness of the primary key, a hash value can be added to the table. The hash value is computed using the [CityHash64](https://github.com/google/cityhash) algorithm over the log record data.

Row-based and columnar tables can both be used for log data storage. Columnar tables are recommended, as they support more efficient data scans for log data retrieval.

Example of the row-based table for log data storage:

```sql
CREATE TABLE `fluent-bit/log` (
    `timestamp` Timestamp NOT NULL,
    `hostname` Text NOT NULL,
    `input` Text NOT NULL,
    `datahash` Uint64 NOT NULL,
    `level` Text NULL,
    `message` Text NULL,
    `other` JsonDocument NULL,
    PRIMARY KEY (
        `datahash`, `timestamp`, `hostname`, `input`
    )
);
```

Example of the columnar table for log data storage:

```sql
CREATE TABLE `fluent-bit/log` (
    `timestamp` Timestamp NOT NULL,
    `hostname` Text NOT NULL,
    `input` Text NOT NULL,
    `datahash` Uint64 NOT NULL,
    `level` Text NULL,
    `message` Text NULL,
    `other` JsonDocument NULL,
    PRIMARY KEY (
        `timestamp`, `hostname`, `input`, `datahash`
    )
) PARTITION BY HASH(`timestamp`, `hostname`, `input`)
WITH (STORE = COLUMN);
```

The command that creates the columnar table differs in the following details:

* it specifies the columnar storage type and the table's partitioning key in the last two lines;
* the `timestamp` column is the first column of the primary key, which is optimal and recommended for columnar, but not for row-based tables. See the specific guidelines for choosing the primary key [for columnar tables](../dev/primary-key/column-oriented.md) and [for row-based tables](../dev/primary-key/row-oriented.md).

[TTL configuration](../concepts/ttl.md) can be optionally applied to the table, limiting the data storage period and enabling the automatic removal of obsolete data. Enabling TTL requires an extra setting in the `WITH` section of the table creation command. For example, `TTL = Interval("P14D") ON timestamp` sets the storage period to 14 days, based on the `timestamp` field's value.
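
TTL can also be enabled on a table that already exists. Below is a minimal sketch using the table from the examples above:

```sql
-- A sketch: enable a 14-day TTL on the existing log table,
-- expiring rows based on the `timestamp` column.
ALTER TABLE `fluent-bit/log` SET (
    TTL = Interval("P14D") ON `timestamp`
);
```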

## FluentBit deployment and configuration

FluentBit deployment should be performed according to [its documentation](https://docs.fluentbit.io/manual/installation/getting-started-with-fluent-bit).

The {{ ydb-short-name }} plugin for FluentBit is available in source code form in the [fluent-bit-ydb repository](https://github.com/ydb-platform/fluent-bit-ydb), along with the build instructions. A Docker image is provided for container-based deployments: `ghcr.io/ydb-platform/fluent-bit-ydb`.
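
For example, the image can be pulled as follows (the `latest` tag matches the Helm values shown later; pinning a specific release version is usually preferable):

```bash
docker pull ghcr.io/ydb-platform/fluent-bit-ydb:latest
```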

The general logic, configuration syntax, and procedures for setting up the receiving, processing, and delivery of logs in the FluentBit environment are described in the [FluentBit documentation](https://docs.fluentbit.io/manual/concepts/key-concepts).

## Writing logs to {{ ydb-short-name }} tables

Before using the {{ ydb-short-name }} output plugin, it needs to be enabled in the FluentBit settings. The list of the enabled FluentBit plugins is configured in a file (for example, `plugins.conf`), which is referenced through the `plugins_file` parameter in the `SERVICE` section of the main FluentBit configuration file. Below is an example of such a file with the {{ ydb-short-name }} plugin enabled (the plugin library path may differ depending on your setup):

```text
# plugins.conf
[PLUGINS]
    Path /usr/lib/fluent-bit/out_ydb.so
```

The table below lists the configuration parameters supported by the {{ ydb-short-name }} output plugin for FluentBit.

|**Parameter**|**Description**|
|----------|--------------|
|`Name`| Plugin type; must be `ydb` |
|`Match`| (optional) [Tag matching expression](https://docs.fluentbit.io/manual/concepts/data-pipeline/router) to select log records that should be routed to {{ ydb-short-name }} |
|`ConnectionURL`| {{ ydb-short-name }} connection URL, including the protocol, endpoint, and database path (see the [documentation](../concepts/connect.md)) |
|`TablePath`| Table path starting from the database root (example: `fluent-bit/log`) |
|`Columns`| JSON structure mapping the fields of the FluentBit record to the columns of the target {{ ydb-short-name }} table. May include the pseudo-columns listed below |
|`CredentialsAnonymous`| Set to `1` for anonymous {{ ydb-short-name }} authentication |
|`CredentialsToken`| Token value, for the {{ ydb-short-name }} token authentication mode |
|`CredentialsYcMetadata`| Set to `1` for virtual machine metadata {{ ydb-short-name }} authentication |
|`CredentialsStatic`| Username and password for {{ ydb-short-name }} authentication, specified in the following format: `username:password@` |
|`CredentialsYcServiceAccountKeyFile`| Path to a file containing the service account (SA) key, for the SA key {{ ydb-short-name }} authentication mode |
|`CredentialsYcServiceAccountKeyJson`| JSON data of the service account key to be used instead of the file name (useful in the Kubernetes environment) |
|`Certificates`| Path to the trusted certificate authority (CA) certificates file, or the literal trusted CA certificate value |
|`LogLevel`| Plugin-specific logging level; must be one of `disabled` (default), `trace`, `debug`, `info`, `warn`, `error`, `fatal`, or `panic` |
111
+
112
+
The following pseudo-columns are available, in addition to the actual FluentBit log record fields, to be used as source values in the column map (`Columns` parameter):
113
+
114
+
*`.timestamp` - log record timestamp (mandatory)
115
+
*`.input` - log input stream name (mandatory)
116
+
*`.hash` - uint64 hash code, computed over the log record fields (optional)
117
+
*`.other` - JSON document containing all log record fields that were not explicitly mapped to any table column (optional)
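
For illustration, a minimal `[OUTPUT]` section using this plugin could look as shown below. This is a sketch assuming the row-based table created earlier; the source field names `host`, `level`, and `log`, as well as the connection URL, are placeholders that depend on your actual inputs and parsers:

```text
[OUTPUT]
    Name ydb
    Match *
    ConnectionURL grpc://ydb.example.com:2136/Root/testdb
    TablePath fluent-bit/log
    Columns {".timestamp":"timestamp","host":"hostname",".input":"input",".hash":"datahash","level":"level","log":"message",".other":"other"}
    CredentialsAnonymous 1
```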

## Log records collection in a Kubernetes cluster

FluentBit is often used to collect logs in the Kubernetes environment. Below is the diagram of the log delivery process, implemented using FluentBit and {{ ydb-short-name }}, for applications running in a Kubernetes cluster:

![FluentBit-YDB](../_assets/fluent-bit-ydb-scheme.png)

<small>Figure 1 — Interaction diagram between FluentBit and {{ ydb-short-name }} in the Kubernetes cluster</small>

In this diagram:

* Application pods write logs to stdout/stderr
* Text from stdout/stderr is saved as files on the Kubernetes worker nodes
* The pod with FluentBit:
  * Mounts a folder with the log files for itself
  * Reads the contents from the log files
  * Enriches log records with additional metadata
  * Saves the records to the {{ ydb-short-name }} database

### Table to store Kubernetes logs

Below is the {{ ydb-short-name }} table structure to store the Kubernetes logs:

```sql
CREATE TABLE `fluent-bit/log` (
    `timestamp` Timestamp NOT NULL,
    `file` Text NOT NULL,
    `pipe` Text NOT NULL,
    `datahash` Uint64 NOT NULL,
    `message` Text NULL,
    `message_parsed` JsonDocument NULL,
    `kubernetes` JsonDocument NULL,
    PRIMARY KEY (
        `timestamp`, `file`, `pipe`, `datahash`
    )
) PARTITION BY HASH(`timestamp`, `file`, `pipe`)
WITH (STORE = COLUMN, TTL = Interval("P14D") ON `timestamp`);
```

Column purposes:

* `timestamp` – the log record timestamp;
* `file` – the name of the source from which the log was read. In the case of Kubernetes, this is the name of the file on the worker node to which the logs of a specific pod are written;
* `pipe` – the stdout or stderr stream to which the application-level writing was done;
* `datahash` – the hash code computed over the log record;
* `message` – the textual part of the log record;
* `message_parsed` – the log record fields in structured form, if the textual part could be parsed using the configured FluentBit parsers;
* `kubernetes` – information about the pod, including name, namespace, and annotations.

Optionally, TTL can be configured for the table, as shown in the example.

### FluentBit configuration

To deploy FluentBit in the Kubernetes environment, a configuration file with the log collection and processing parameters must be prepared (typical file name: `values.yaml`). This section comments on the relevant parts of this file, with examples.

It is necessary to replace the repository and image version of the FluentBit container:

```yaml
image:
  repository: ghcr.io/ydb-platform/fluent-bit-ydb
  tag: latest
```

In this image, a plugin library has been added that implements {{ ydb-short-name }} support.

The following lines define the rules for mounting log folders in FluentBit pods:

```yaml
daemonSetVolumes:
  - name: varlog
    hostPath:
      path: /var/log

daemonSetVolumeMounts:
  - name: varlog
    mountPath: /var/log
    readOnly: true
```

FluentBit startup parameters should be configured as shown below:

```yaml
command:
  - /fluent-bit/bin/fluent-bit
args:
  - --workdir=/fluent-bit/etc
  - --config=/fluent-bit/etc/conf/fluent-bit.conf
```

The FluentBit pipeline for collecting, converting, and delivering logs should be defined according to the example below:

```yaml
config:
  inputs: |
    [INPUT]
        Name tail
        Path /var/log/containers/*.log
        multiline.parser docker, cri
        Tag kube.*

  filters: |
    [FILTER]
        Name kubernetes
        Match kube.*
        Merge_Log On
        Merge_Log_Key log_parsed

    [FILTER]
        Name modify
        Match kube.*
        Remove _p
        Remove time

  outputs: |
    [OUTPUT]
        Name ydb
        Match kube.*
        TablePath fluent-bit/log
        Columns {".timestamp":"timestamp",".input":"file","stream":"pipe",".hash":"datahash","log":"message","log_parsed":"message_parsed","kubernetes":"kubernetes"}
        ConnectionURL ${OUTPUT_YDB_CONNECTION_URL}
        CredentialsToken ${OUTPUT_YDB_CREDENTIALS_TOKEN}
```

Description of the configuration blocks:

* `inputs` – this block specifies where to read and how to parse logs. In this case, `*.log` files will be read from the `/var/log/containers/` folder, which is mounted from the host
* `filters` – this block specifies how the logs will be processed. In this case, for each log record, the corresponding metadata is added (using the Kubernetes filter), and unused fields (`_p`, `time`) are cut out
* `outputs` – this block specifies where the logs will be sent. In this case, logs are saved into the `fluent-bit/log` table in the {{ ydb-short-name }} database. Database connection parameters (in the shown example, `ConnectionURL` and `CredentialsToken`) are defined using the environment variables `OUTPUT_YDB_CONNECTION_URL` and `OUTPUT_YDB_CREDENTIALS_TOKEN`. Authentication parameters and the set of corresponding environment variables are updated depending on the configuration of the {{ ydb-short-name }} cluster being used.

Environment variables are defined as shown below:

```yaml
env:
  # Replace the value with your actual database endpoint and path
  - name: OUTPUT_YDB_CONNECTION_URL
    value: grpcs://<ydb-endpoint>:2135/<database-path>
  - name: OUTPUT_YDB_CREDENTIALS_TOKEN
    valueFrom:
      secretKeyRef:
        key: token
        name: fluent-bit-ydb-plugin-token
```

Authentication data should be stored as a secret object in the Kubernetes cluster configuration. Example command to create a Kubernetes secret:
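
The sketch below assumes the secret name and the `token` key used in the `env` block above:

```bash
# Replace <YDB token value> with a real access token
kubectl create secret generic fluent-bit-ydb-plugin-token \
  --from-literal=token=<YDB token value>
```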

[HELM](https://helm.sh) is a tool to package and install applications in a Kubernetes cluster. To deploy FluentBit, the corresponding chart repository (containing the installation scenario) should be added using the following command:
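
```bash
# Adds the official FluentBit Helm chart repository
# (the local repository name `fluent` is an assumption)
helm repo add fluent https://fluent.github.io/helm-charts
```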