Commit a89fd80

YDBDOCS-188 FluentBit - not just K8s (#4635)
1 parent 2ed4009 commit a89fd80

File tree

4 files changed: +330 additions, -129 deletions

ydb/docs/en/core/integrations/fluent-bit.md

Lines changed: 159 additions & 58 deletions
@@ -1,43 +1,147 @@
-# Log records collection in a Kubernetes cluster using FluentBit and YDB
+# Log records collection using FluentBit
 
-This section presents the implementation of integration between the Kubernetes cluster log shipping tool - FluentBit, with subsequent saving for viewing or analysis in {{ ydb-short-name }}.
+This section describes the integration between {{ ydb-short-name }} and the log capture tool [FluentBit](https://fluentbit.io) to save and analyze the log records in {{ ydb-short-name }}.
 
-## Introduction
 
-FluentBit is a tool that can collect text data, manipulate it (change, transform, merge) and send it to various storage facilities for further processing.
+## Overview
 
-To deploy a scheme for delivering logs of running applications to Kubernetes using FluentBit and then saving them in YDB, you need to:
+FluentBit is a tool that can collect text data, manipulate it (modify, transform, combine), and send it to various repositories for further processing. A custom plugin library for FluentBit has been developed to support saving the log records into {{ ydb-short-name }}. The library's source code is available in the [fluent-bit-ydb repository](https://github.com/ydb-platform/fluent-bit-ydb).
 
-* Create table in YDB
+Deploying a log delivery scheme using FluentBit and {{ ydb-short-name }} as the destination database includes the following steps:
 
-* Configure [FluentBit](https://fluentbit.io)
+1. Create {{ ydb-short-name }} tables for the log data storage
+2. Deploy FluentBit and {{ ydb-short-name }} plugin for FluentBit
+3. Configure FluentBit to collect and process the logs
+4. Configure FluentBit to send the logs to {{ ydb-short-name }} tables
 
-* Deploy [FluentBit](https://fluentbit.io) in Kubernetes cluster using [HELM](https://helm.sh)
 
-The work diagram looks like this:
+## Creating tables for log data
 
-![FluentBit in Kubernetes cluster](../_assets/fluent-bit-ydb-scheme.png)
-<small>Figure 1 — Interaction diagram between FluentBit and YDB in the Kubernetes cluster</small>
+Tables for log data storage must be created in the chosen {{ ydb-short-name }} database. The structure of each table is determined by the set of fields in the particular log being supplied via FluentBit. Depending on the requirements, different log types may be saved to different tables. Normally, a table for log data contains the following fields:
 
-In this diagram:
+* timestamp
+* log level
+* hostname
+* service name
+* message text or its semi-structured representation as a JSON document
 
-* Application pods write logs to stdout/stderr
+{{ ydb-short-name }} tables must have a primary key that uniquely identifies each row. A timestamp does not always uniquely identify messages coming from a particular source because messages might be generated simultaneously. To enforce the uniqueness of the primary key, a hash value can be added to the table. The hash value is computed using the [CityHash64](https://github.com/google/cityhash) algorithm over the log record data.
 
-* Text from stdout/stderr is saved as files on Kubernetes worker nodes
+Row-based and columnar tables can both be used for log data storage. Columnar tables are recommended, as they support more efficient data scans for log data retrieval.
 
-* Pod with FluentBit
+Example of the row-based table for log data storage:
+
+```sql
+CREATE TABLE `fluent-bit/log` (
+    `timestamp` Timestamp NOT NULL,
+    `hostname` Text NOT NULL,
+    `input` Text NOT NULL,
+    `datahash` Uint64 NOT NULL,
+    `level` Text NULL,
+    `message` Text NULL,
+    `other` JsonDocument NULL,
+    PRIMARY KEY (
+        `datahash`, `timestamp`, `hostname`, `input`
+    )
+);
+```
+
+Example of the columnar table for log data storage:
+
+```sql
+CREATE TABLE `fluent-bit/log` (
+    `timestamp` Timestamp NOT NULL,
+    `hostname` Text NOT NULL,
+    `input` Text NOT NULL,
+    `datahash` Uint64 NOT NULL,
+    `level` Text NULL,
+    `message` Text NULL,
+    `other` JsonDocument NULL,
+    PRIMARY KEY (
+        `timestamp`, `hostname`, `input`, `datahash`
+    )
+) PARTITION BY HASH(`timestamp`, `hostname`, `input`)
+WITH (STORE = COLUMN);
+```
+
+The command that creates the columnar table differs in the following details:
+
+* it specifies the columnar storage type and the table's partitioning key in the last two lines;
+* the `timestamp` column is the first column of the primary key, which is optimal and recommended for columnar, but not for row-based tables. See the specific guidelines for choosing the primary key [for columnar tables](../dev/primary-key/column-oriented.md) and [for row-based tables](../dev/primary-key/row-oriented.md).
+
+[TTL configuration](../concepts/ttl.md) can be optionally applied to the table, limiting the data storage period and enabling the automatic removal of obsolete data. Enabling TTL requires an extra setting in the `WITH` section of the table creation command. For example, `TTL = Interval("P14D") ON timestamp` sets the storage period to 14 days, based on the `timestamp` field's value.
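
As an illustration, the TTL setting described above can be combined with the columnar storage option in a single `WITH` clause; the following is a sketch based on the columnar example above:

```sql
-- Columnar log table with a 14-day TTL on the `timestamp` column
CREATE TABLE `fluent-bit/log` (
    `timestamp` Timestamp NOT NULL,
    `hostname` Text NOT NULL,
    `input` Text NOT NULL,
    `datahash` Uint64 NOT NULL,
    `level` Text NULL,
    `message` Text NULL,
    `other` JsonDocument NULL,
    PRIMARY KEY (
        `timestamp`, `hostname`, `input`, `datahash`
    )
) PARTITION BY HASH(`timestamp`, `hostname`, `input`)
WITH (STORE = COLUMN, TTL = Interval("P14D") ON `timestamp`);
```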
 
-* Mounts a folder with log files for itself
 
-* Reads the contents from them
+## FluentBit deployment and configuration
 
-* Enriches posts with additional metadata
+FluentBit deployment should be performed according to [its documentation](https://docs.fluentbit.io/manual/installation/getting-started-with-fluent-bit).
 
-* Saves records to YDB cluster
+The {{ ydb-short-name }} plugin for FluentBit is available in source code form in the [repository](https://github.com/ydb-platform/fluent-bit-ydb), along with the build instructions. A docker image is provided for container-based deployments: `ghcr.io/ydb-platform/fluent-bit-ydb`.
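
For a quick standalone trial outside Kubernetes, the image can be pulled and run with a locally prepared configuration. The following is a sketch that assumes the image follows the standard FluentBit layout with configuration under `/fluent-bit/etc`; adjust the paths to your setup:

```sh
# Pull the FluentBit image with the bundled YDB output plugin
docker pull ghcr.io/ydb-platform/fluent-bit-ydb:latest

# Run FluentBit with local configuration files mounted into the container
# (fluent-bit.conf and plugins.conf are prepared as described below)
docker run --rm \
  -v "$(pwd)/fluent-bit.conf:/fluent-bit/etc/fluent-bit.conf" \
  -v "$(pwd)/plugins.conf:/fluent-bit/etc/plugins.conf" \
  ghcr.io/ydb-platform/fluent-bit-ydb:latest
```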
 
-## Creating a table in YDB
+The general logic, configuration syntax, and procedures for setting up log collection, processing, and delivery in FluentBit are described in the [FluentBit documentation](https://docs.fluentbit.io/manual/concepts/key-concepts).
 
-On the selected YDB cluster, you need to run the following query:
+
+## Writing logs to {{ ydb-short-name }} tables
+
+Before using the {{ ydb-short-name }} output plugin, it needs to be enabled in the FluentBit settings. The list of enabled FluentBit plugins is configured in a file (for example, `plugins.conf`), which is referenced through the `plugins_file` parameter in the `SERVICE` section of the main FluentBit configuration file. Below is an example of such a file with the {{ ydb-short-name }} plugin enabled (the plugin library path may differ depending on your setup):
+
+```text
+# plugins.conf
+[PLUGINS]
+    Path /usr/lib/fluent-bit/out_ydb.so
+```
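
For reference, the `plugins_file` parameter mentioned above belongs to the `[SERVICE]` section of the main FluentBit configuration file; a minimal sketch (file names are illustrative):

```text
# fluent-bit.conf
[SERVICE]
    flush        1
    log_level    info
    plugins_file plugins.conf
```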
+
+The table below lists the configuration parameters supported by the {{ ydb-short-name }} output plugin for FluentBit.
+
+| **Parameter** | **Description** |
+|----------|--------------|
+| `Name` | Plugin type; must be `ydb` |
+| `Match` | (optional) [Tag matching expression](https://docs.fluentbit.io/manual/concepts/data-pipeline/router) to select log records which should be routed to {{ ydb-short-name }} |
+| `ConnectionURL` | {{ ydb-short-name }} connection URL, including the protocol, endpoint, and database path (see the [documentation](../concepts/connect.md)) |
+| `TablePath` | Table path starting from the database root (example: `fluent-bit/log`) |
+| `Columns` | JSON structure mapping the fields of a FluentBit record to the columns of the target {{ ydb-short-name }} table. May include the pseudo-columns listed below |
+| `CredentialsAnonymous` | Set to `1` for anonymous {{ ydb-short-name }} authentication |
+| `CredentialsToken` | Token value, used for token-based {{ ydb-short-name }} authentication |
+| `CredentialsYcMetadata` | Set to `1` for virtual machine metadata {{ ydb-short-name }} authentication |
+| `CredentialsStatic` | Username and password for {{ ydb-short-name }} authentication, specified in the following format: `username:password@` |
+| `CredentialsYcServiceAccountKeyFile` | Path to a file containing the service account (SA) key, used for SA key {{ ydb-short-name }} authentication |
+| `CredentialsYcServiceAccountKeyJson` | JSON data of the service account key to be used instead of the file name (useful in Kubernetes environments) |
+| `Certificates` | Path to the certificate authority (CA) trusted certificates file, or the literal trusted CA certificate value |
+| `LogLevel` | Plugin-specific logging level; one of `disabled` (default), `trace`, `debug`, `info`, `warn`, `error`, `fatal`, or `panic` |
+
+The following pseudo-columns are available, in addition to the actual FluentBit log record fields, to be used as source values in the column map (`Columns` parameter):
+
+* `.timestamp` - log record timestamp (mandatory)
+* `.input` - log input stream name (mandatory)
+* `.hash` - uint64 hash code, computed over the log record fields (optional)
+* `.other` - JSON document containing all log record fields that were not explicitly mapped to any table column (optional)
+
+Example of `Columns` parameter value:
+
+```json
+{".timestamp": "timestamp", ".input": "input", ".hash": "datahash", "log": "message", "level": "level", "host": "hostname", ".other": "other"}
+```
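
Putting these parameters together, an `[OUTPUT]` section that routes all records to the table from the examples above might look like the following sketch (the connection URL is a placeholder, and the authentication parameter should match your cluster):

```text
[OUTPUT]
    Name                  ydb
    Match                 *
    ConnectionURL         grpc://localhost:2136/local
    TablePath             fluent-bit/log
    Columns               {".timestamp": "timestamp", ".input": "input", ".hash": "datahash", "log": "message", "level": "level", "host": "hostname", ".other": "other"}
    CredentialsAnonymous  1
```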
+
+## Collecting logs in a Kubernetes cluster
+
+FluentBit is often used to collect logs in the Kubernetes environment. Below is the schema of the log delivery process, implemented using FluentBit and {{ ydb-short-name }}, for applications running in the Kubernetes cluster:
+
+![FluentBit in Kubernetes cluster](../_assets/fluent-bit-ydb-scheme.png)
+<small>Figure 1 — Interaction diagram between FluentBit and {{ ydb-short-name }} in the Kubernetes cluster</small>
+
+In this diagram:
+
+* Application pods write logs to stdout/stderr
+* Text from stdout/stderr is saved as files on Kubernetes worker nodes
+* Pod with FluentBit:
+  * Mounts a folder with log files for itself
+  * Reads the contents from the log files
+  * Enriches log records with additional metadata
+  * Saves records to {{ ydb-short-name }} database
+
+### Table to store Kubernetes logs
+
+Below is the {{ ydb-short-name }} table structure to store the Kubernetes logs:
 
 ```sql
 CREATE TABLE `fluent-bit/log` (
@@ -52,38 +156,35 @@ CREATE TABLE `fluent-bit/log` (
     PRIMARY KEY (
         `timestamp`, `file`, `datahash`
     )
-)
+) PARTITION BY HASH(`timestamp`, `file`)
+WITH (STORE = COLUMN, TTL = Interval("P14D") ON `timestamp`);
 ```
 
-Column purpose:
-
-* timestamp – the log timestamp
-
-* file – name of the source from which the log was read. In the case of Kubernetes, this will be the name of the file on the worker node in which the logs of a specific pod are written
+Purpose of the columns:
 
-* pipe – stdout or stderr stream where application-level writing was done
+* `timestamp` – the log record timestamp;
+* `file` – the name of the source from which the log was read. In the case of Kubernetes, this will be the name of the file on the worker node in which the logs of a specific pod are written;
+* `pipe` – stdout or stderr stream where application-level writing was done;
+* `datahash` – hash code computed over the log record;
+* `message` – the textual part of the log record;
+* `message_parsed` – the log record fields in structured form, if the textual part could be parsed using the configured FluentBit parsers;
+* `kubernetes` – information about the pod, including name, namespace, and annotations.
 
-* message – the log message
+Optionally, TTL can be configured for the table, as shown in the example.
 
-* datahash – the CityHash64 hash code calculated over the log message (required to avoid overwriting messages from the same source and with the same timestamp)
+### FluentBit configuration
 
-* message_parsed – a structured log message, if it could be parsed using the fluent-bit parsers
+In order to deploy FluentBit in the Kubernetes environment, a configuration file with the log collection and processing parameters must be prepared (typical file name: `values.yaml`). This section provides the necessary comments on this file's content, with examples.
 
-* kubernetes – information about the pod, for example: name, namespace, logs and annotations
-
-Optionally, you can set TTL for table rows
-
-## FluentBit configuration
-
-It is necessary to replace the repository and image version:
+It is necessary to replace the repository and image version of the FluentBit container:
 
 ```yaml
 image:
   repository: ghcr.io/ydb-platform/fluent-bit-ydb
   tag: latest
 ```
 
-In this image, a plugin library has been added that implements YDB support. Source code is available [here](https://github.com/ydb-platform/fluent-bit-ydb)
+In this image, a plugin library has been added that implements {{ ydb-short-name }} support.
 
 The following lines define the rules for mounting log folders in FluentBit pods:
 
@@ -115,7 +216,7 @@ daemonSetVolumeMounts:
     readOnly: true
 ```
 
-Also, you need to redefine the command and launch arguments:
+FluentBit startup parameters should be configured as shown below:
 
 ```yaml
 command:
@@ -127,7 +228,7 @@ args:
   - --config=/fluent-bit/etc/conf/fluent-bit.conf
 ```
 
-And the pipeline itself for collecting, converting and delivering logs:
+The FluentBit pipeline for collecting, converting, and delivering logs should be defined according to the example:
 
 ```yaml
 config:
@@ -166,15 +267,13 @@ config:
        CredentialsToken ${OUTPUT_YDB_CREDENTIALS_TOKEN}
 ```
 
-Blocks description:
+Configuration blocks description:
 
-* Inputs. This block specifies where to read and how to parse logs. In this case, *.log files will be read from the /var/log/containers/ folder, which was mounted from the host
+* `inputs` - this block specifies where to read and how to parse logs. In this case, `*.log` files will be read from the `/var/log/containers/` folder, which is mounted from the host
+* `filters` - this block specifies how the logs will be processed. In this case, for each log record, the corresponding metadata is added (using the Kubernetes filter), and unused fields (`_p`, `time`) are cut out
+* `outputs` - this block specifies where the logs will be sent. In this case, logs are saved into the `fluent-bit/log` table in the {{ ydb-short-name }} database. Database connection parameters (in the example shown, `ConnectionURL` and `CredentialsToken`) are defined using the environment variables `OUTPUT_YDB_CONNECTION_URL` and `OUTPUT_YDB_CREDENTIALS_TOKEN`. Authentication parameters and the set of corresponding environment variables are updated depending on the configuration of the {{ ydb-short-name }} cluster being used.
 
-* Filters. This block specifies how logs will be processed. In this case: for each log the corresponding metadata will be found (using the kubernetes filter), and unused fields (_p, time) will be cut out
-
-* Outputs. This block specifies where the logs will be sent. In this case, to the `fluent-bit/log` table in the {{ ydb-short-name }} cluster. Cluster connection parameters (ConnectionURL, CredentialsToken) are set using the corresponding environment variables – `OUTPUT_YDB_CONNECTION_URL`, `OUTPUT_YDB_CREDENTIALS_TOKEN`
-
-Environment variables are defined as follows:
+Environment variables are defined as shown below:
 
 ```yaml
 env:
@@ -187,21 +286,21 @@ env:
         name: fluent-bit-ydb-plugin-token
 ```
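
For illustration, a complete `env` block that wires these variables to the secret created below might look as follows (the connection URL value is a placeholder for your endpoint and database path):

```yaml
env:
  - name: OUTPUT_YDB_CONNECTION_URL
    value: grpc://ydb-endpoint:2135/Root/database  # placeholder
  - name: OUTPUT_YDB_CREDENTIALS_TOKEN
    valueFrom:
      secretKeyRef:
        name: fluent-bit-ydb-plugin-token
        key: token
```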
 
-The secret authorization token must be created in advance in the cluster. For example, using the command:
+Authentication data should be stored as a secret object in the Kubernetes cluster configuration. Example command to create a Kubernetes secret:
 
 ```sh
 kubectl create secret -n ydb-fluent-bit-integration generic fluent-bit-ydb-plugin-token --from-literal=token=<YDB TOKEN>
 ```
 
-## FluentBit deployment
+### Deploying FluentBit in a Kubernetes cluster
 
-HELM is a way to package and install applications in a Kubernetes cluster. To deploy FluentBit, you need to add a chart repository using the command:
+[HELM](https://helm.sh) is a tool to package and install applications in a Kubernetes cluster. To deploy FluentBit, the corresponding chart repository (containing the installation scenario) should be added using the following command:
 
 ```sh
 helm repo add fluent https://fluent.github.io/helm-charts
 ```
 
-Installing FluentBit on a Kubernetes cluster is done using the following command:
+After that, FluentBit can be deployed to a Kubernetes cluster with the following command:
 
 ```sh
 helm upgrade --install fluent-bit fluent/fluent-bit \
@@ -211,29 +310,31 @@ helm upgrade --install fluent-bit fluent/fluent-bit \
   --values values.yaml
 ```
 
-## Verify the installation
+The argument `--values` in the example command shown above references the file containing the FluentBit settings.
+
+### Installation verification
 
-Check that fluent-bit has started by reading its logs (there should be no [error] level entries):
+Check that FluentBit has started by reading its logs (there should be no `[error]` level entries):
 
 ```sh
 kubectl logs -n ydb-fluent-bit-integration -l app.kubernetes.io/instance=fluent-bit
 ```
 
-Check that there are records in the YDB table (they will appear approximately a few minutes after launching FluentBit):
+Check that there are records in the {{ ydb-short-name }} table (they should appear within a few minutes after FluentBit starts):
 
 ```sql
 SELECT * FROM `fluent-bit/log` ORDER BY `timestamp` DESC LIMIT 10
 ```
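
For routine inspection, a narrower query over the same table can be used; for example, a sketch that lists the most recent records of the last hour:

```sql
-- Latest Kubernetes log records written during the past hour
SELECT `timestamp`, `file`, `message`
FROM `fluent-bit/log`
WHERE `timestamp` > CurrentUtcTimestamp() - Interval("PT1H")
ORDER BY `timestamp` DESC
LIMIT 100;
```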
 
-## Resource cleanup
+### Resource cleanup
 
-It is enough to remove the namespace with fluent-bit:
+To remove FluentBit, it is sufficient to delete the Kubernetes namespace which was used for the installation:
 
 ```sh
 kubectl delete namespace ydb-fluent-bit-integration
 ```
 
-And a table with logs:
+After uninstalling FluentBit, the log storage table can be dropped from the {{ ydb-short-name }} database:
 
 ```sql
 DROP TABLE `fluent-bit/log`
Lines changed: 7 additions & 7 deletions
@@ -1,15 +1,15 @@
 items:
-- name: Log records collection in a Kubernetes cluster using FluentBit and YDB
+- name: Log records collection using FluentBit
   href: fluent-bit.md
 - name: Grafana data source
   href: grafana.md
 - name: Schema migrations with goose
   href: goose.md
-- name: Importing from JDBC data sources
-  href: import-jdbc.md
-- name: Liquibase
+- name: Schema migrations with Liquibase
   href: liquibase.md
-- name: Flyway
+- name: Schema migrations with Flyway
   href: flyway.md
-- name: Hibernate
-  href: hibernate.md
+- name: Using Hibernate
+  href: hibernate.md
+- name: Importing from JDBC data sources
+  href: import-jdbc.md
