YDBDOCS-188 FluentBit - not just K8s #4635

Merged
merged 7 commits into from
May 24, 2024
217 changes: 159 additions & 58 deletions ydb/docs/en/core/integrations/fluent-bit.md
@@ -1,43 +1,147 @@
# Log records collection in a Kubernetes cluster using FluentBit and YDB
# Log records collection using FluentBit

This section presents the implementation of integration between the Kubernetes cluster log shipping tool - FluentBit, with subsequent saving for viewing or analysis in {{ ydb-short-name }}.
This section describes how to integrate {{ ydb-short-name }} with the log collection tool [FluentBit](https://fluentbit.io) to store and analyze log records in {{ ydb-short-name }}.

## Introduction

FluentBit is a tool that can collect text data, manipulate it (change, transform, merge) and send it to various storage facilities for further processing.
## Overview

To deploy a scheme for delivering logs of running applications to Kubernetes using FluentBit and then saving them in YDB, you need to:
FluentBit is a tool that can collect text data, manipulate it (modify, transform, combine), and send it to various repositories for further processing. A custom plugin library for FluentBit has been developed to support saving the log records into {{ ydb-short-name }}. The library's source code is available in the [fluent-bit-ydb repository](https://github.com/ydb-platform/fluent-bit-ydb).

* Create table in YDB
Deploying a log delivery scheme using FluentBit and {{ ydb-short-name }} as the destination database includes the following steps:

* Configure [FluentBit](https://fluentbit.io)
1. Create {{ ydb-short-name }} tables for the log data storage
2. Deploy FluentBit and {{ ydb-short-name }} plugin for FluentBit
3. Configure FluentBit to collect and process the logs
4. Configure FluentBit to send the logs to {{ ydb-short-name }} tables

* Deploy [FluentBit](https://fluentbit.io) in Kubernetes cluster using [HELM](https://helm.sh)

The work diagram looks like this:
## Creating tables for log data

![FluentBit in Kubernetes cluster](../_assets/fluent-bit-ydb-scheme.png)
<small>Figure 1 — Interaction diagram between FluentBit and YDB in the Kubernetes cluster</small>
Tables for log data storage must be created in the chosen {{ ydb-short-name }} database. The table structure is determined by the set of fields in the particular log delivered by FluentBit. Depending on the requirements, different log types may be saved to different tables. Normally, a table for log data contains the following fields:

In this diagram:
* timestamp
* log level
* hostname
* service name
* message text or its semi-structured representation as a JSON document

* Application pods write logs to stdout/stderr
{{ ydb-short-name }} tables must have a primary key that uniquely identifies each row. A timestamp does not always uniquely identify messages coming from a particular source, because several messages might be generated at the same moment. To enforce the uniqueness of the primary key, a hash value can be added to the table. The hash value is computed using the [CityHash64](https://github.com/google/cityhash) algorithm over the log record data.
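
The effect of adding a hash to the key can be illustrated with a minimal Python sketch. It is not the plugin's code: CityHash64 is not part of the Python standard library, so the example assumes 64-bit BLAKE2b as a stand-in, and the host name, input name, and record fields are hypothetical sample values.

```python
import hashlib
from datetime import datetime, timezone


def record_hash(fields: dict) -> int:
    """Return a 64-bit hash of the record fields (BLAKE2b stand-in, not CityHash64)."""
    # Serialize the fields in a stable order so that equal records hash equally.
    payload = "\x1f".join(f"{key}={fields[key]}" for key in sorted(fields))
    digest = hashlib.blake2b(payload.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


# Two different records produced at the same moment by the same source.
ts = datetime(2024, 5, 24, 12, 0, 0, tzinfo=timezone.utc)
first = {"log": "connection opened", "level": "info"}
second = {"log": "connection closed", "level": "info"}

# Without the hash, both records map to the same key and one would overwrite the other.
keys_without_hash = {(ts, "host-1", "tail.0"), (ts, "host-1", "tail.0")}
# With the hash, the keys stay distinct.
keys_with_hash = {
    (ts, "host-1", "tail.0", record_hash(first)),
    (ts, "host-1", "tail.0", record_hash(second)),
}
print(len(keys_without_hash), len(keys_with_hash))
```

Two records with identical content and identical timestamps would still collapse into one row, which is usually acceptable for log data.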

* Text from stdout/stderr is saved as files on Kubernetes worker nodes
Row-based and columnar tables can both be used for log data storage. Columnar tables are recommended, as they support more efficient data scans for log data retrieval.

* Pod with FluentBit
Example of the row-based table for log data storage:

```sql
CREATE TABLE `fluent-bit/log` (
    `timestamp`         Timestamp NOT NULL,
    `hostname`          Text NOT NULL,
    `input`             Text NOT NULL,
    `datahash`          Uint64 NOT NULL,
    `level`             Text NULL,
    `message`           Text NULL,
    `other`             JsonDocument NULL,
    PRIMARY KEY (
        `datahash`, `timestamp`, `hostname`, `input`
    )
);
```

Example of the columnar table for log data storage:

```sql
CREATE TABLE `fluent-bit/log` (
    `timestamp`         Timestamp NOT NULL,
    `hostname`          Text NOT NULL,
    `input`             Text NOT NULL,
    `datahash`          Uint64 NOT NULL,
    `level`             Text NULL,
    `message`           Text NULL,
    `other`             JsonDocument NULL,
    PRIMARY KEY (
        `timestamp`, `hostname`, `input`, `datahash`
    )
) PARTITION BY HASH(`timestamp`, `hostname`, `input`)
WITH (STORE = COLUMN);
```

The command that creates the columnar table differs in the following details:

* it specifies the columnar storage type and the table's partitioning key in the last two lines;
* the `timestamp` column is the first column of the primary key, which is optimal and recommended for columnar, but not for row-based tables. See the specific guidelines for choosing the primary key [for columnar tables](../dev/primary-key/column-oriented.md) and [for row-based tables](../dev/primary-key/row-oriented.md).

[TTL configuration](../concepts/ttl.md) can be optionally applied to the table, limiting the data storage period and enabling the automatic removal of obsolete data. Enabling TTL requires an extra setting in the `WITH` section of the table creation command. For example, `TTL = Interval("P14D") ON timestamp` sets the storage period to 14 days, based on the `timestamp` field's value.
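
The meaning of the 14-day interval can be sketched in Python as follows. This is only an illustration of the expiration rule, assuming that a row becomes eligible for removal once its `timestamp` plus the interval is in the past. The actual removal is performed by {{ ydb-short-name }} in the background and may not happen at that exact moment. The dates are hypothetical sample values.

```python
from datetime import datetime, timedelta, timezone

TTL = timedelta(days=14)  # corresponds to Interval("P14D")


def is_expired(row_timestamp: datetime, now: datetime) -> bool:
    """Return True when the row is old enough to be removed by TTL."""
    return row_timestamp + TTL <= now


now = datetime(2024, 5, 24, tzinfo=timezone.utc)
fresh_row = now - timedelta(days=3)    # written 3 days ago, kept
stale_row = now - timedelta(days=20)   # written 20 days ago, eligible for removal
print(is_expired(fresh_row, now), is_expired(stale_row, now))
```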

* Mounts a folder with log files for itself

* Reads the contents from them
## FluentBit deployment and configuration

* Enriches posts with additional metadata
FluentBit deployment should be performed according to [its documentation](https://docs.fluentbit.io/manual/installation/getting-started-with-fluent-bit).

* Saves records to YDB cluster
The {{ ydb-short-name }} plugin for FluentBit is available as source code in the [repository](https://github.com/ydb-platform/fluent-bit-ydb), along with the build instructions. A Docker image is provided for container-based deployments: `ghcr.io/ydb-platform/fluent-bit-ydb`.

## Creating a table in YDB
The general logic, configuration syntax, and procedures for receiving, processing, and delivering logs with FluentBit are described in the [FluentBit documentation](https://docs.fluentbit.io/manual/concepts/key-concepts).

On the selected YDB cluster, you need to run the following query:

## Writing logs to {{ ydb-short-name }} tables

Before the {{ ydb-short-name }} output plugin can be used, it must be enabled in the FluentBit settings. The list of enabled FluentBit plugins is configured in a file (for example, `plugins.conf`), which is referenced through the `plugins_file` parameter in the `SERVICE` section of the main FluentBit configuration file. Below is an example of such a file with the {{ ydb-short-name }} plugin enabled (the plugin library path may differ depending on your setup):

```text
# plugins.conf
[PLUGINS]
Path /usr/lib/fluent-bit/out_ydb.so
```

The table below lists the configuration parameters supported by the {{ ydb-short-name }} output plugin for FluentBit.

| **Parameter** | **Description** |
|----------|--------------|
| `Name` | Plugin type, must be set to `ydb` |
| `Match` | (optional) [Tag matching expression](https://docs.fluentbit.io/manual/concepts/data-pipeline/router) to select log records which should be routed to {{ ydb-short-name }} |
| `ConnectionURL` | YDB connection URL, including the protocol, endpoint, and database path (see the [documentation](../concepts/connect.md)) |
| `TablePath` | Table path starting from database root (example: `fluent-bit/log`) |
| `Columns` | JSON structure mapping the fields of FluentBit record to the columns of the target YDB table. May include the pseudo-columns listed below |
| `CredentialsAnonymous` | Set to `1` for anonymous {{ ydb-short-name }} authentication |
| `CredentialsToken` | Token value for {{ ydb-short-name }} token authentication |
| `CredentialsYcMetadata` | Set to `1` for {{ ydb-short-name }} authentication using virtual machine metadata |
| `CredentialsStatic` | Username and password for {{ ydb-short-name }} authentication, specified in the following format: `username:password@` |
| `CredentialsYcServiceAccountKeyFile` | Path to a file containing the service account (SA) key, for {{ ydb-short-name }} authentication with an SA key |
| `CredentialsYcServiceAccountKeyJson` | JSON data of the service account key, used instead of the file name (useful in Kubernetes environments) |
| `Certificates` | Path to the certificate authority (CA) trusted certificates file, or the literal trusted CA certificate value |
| `LogLevel` | Plugin-specific logging level, one of `disabled` (default), `trace`, `debug`, `info`, `warn`, `error`, `fatal` or `panic` |
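
As an illustration of how these parameters fit together, the Python sketch below renders a classic-mode `[OUTPUT]` section. It is a hedged example, not a reference configuration: the `render_output_section` helper is hypothetical, `Match *` is an assumed catch-all tag expression, and the environment variable references and the column map are taken from the examples elsewhere in this document.

```python
import json


def render_output_section(params: dict) -> str:
    """Render a classic-mode FluentBit [OUTPUT] section from key/value pairs."""
    width = max(len(key) for key in params)
    lines = ["[OUTPUT]"]
    for key, value in params.items():
        # Align the values in one column for readability.
        lines.append(f"    {key.ljust(width)}  {value}")
    return "\n".join(lines)


# Column map from the `Columns` example in this document.
columns = {".timestamp": "timestamp", ".input": "input", ".hash": "datahash",
           "log": "message", "level": "level", "host": "hostname", ".other": "other"}

section = render_output_section({
    "Name": "ydb",
    "Match": "*",  # assumption: route all records to the plugin
    "TablePath": "fluent-bit/log",
    "ConnectionURL": "${OUTPUT_YDB_CONNECTION_URL}",
    "CredentialsToken": "${OUTPUT_YDB_CREDENTIALS_TOKEN}",
    "Columns": json.dumps(columns),
})
print(section)
```

Only one of the `Credentials*` parameters is normally needed. Which one applies depends on the authentication mode of the target {{ ydb-short-name }} database.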

The following pseudo-columns are available, in addition to the actual FluentBit log record fields, to be used as source values in the column map (`Columns` parameter):

* `.timestamp` - log record timestamp (mandatory)
* `.input` - log input stream name (mandatory)
* `.hash` - uint64 hash code, computed over the log record fields (optional)
* `.other` - JSON document containing all log record fields that were not explicitly mapped to any table column (optional)

Example of `Columns` parameter value:

```json
{".timestamp": "timestamp", ".input": "input", ".hash": "datahash", "log": "message", "level": "level", "host": "hostname", ".other": "other"}
```
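
How such a column map might be applied to a single record can be sketched in Python. This is an emulation written for illustration under stated assumptions, not the plugin's implementation: the `map_record` function, the sample record, and the pseudo-column values passed to it are all hypothetical.

```python
import json

COLUMNS = {".timestamp": "timestamp", ".input": "input", ".hash": "datahash",
           "log": "message", "level": "level", "host": "hostname", ".other": "other"}


def map_record(record: dict, timestamp: str, input_name: str, record_hash: int) -> dict:
    """Emulate the column map: return {table column: value} for one log record."""
    pseudo = {".timestamp": timestamp, ".input": input_name, ".hash": record_hash}
    row, leftovers = {}, dict(record)
    for source, column in COLUMNS.items():
        if source == ".other":
            continue  # filled in after all explicit mappings are applied
        if source in pseudo:
            row[column] = pseudo[source]
        elif source in record:
            row[column] = leftovers.pop(source)
    if ".other" in COLUMNS:
        # Fields without an explicit mapping end up in one JSON document.
        row[COLUMNS[".other"]] = json.dumps(leftovers, sort_keys=True)
    return row


row = map_record(
    {"log": "disk is almost full", "level": "warn", "host": "node-1", "pid": 4242},
    timestamp="2024-05-24T12:00:00Z", input_name="tail.0", record_hash=123456789,
)
print(row)
```

In this sketch the unmapped `pid` field is the only one that lands in the `other` column, while `log`, `level`, and `host` go to their own columns.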

## Collecting logs in a Kubernetes cluster

FluentBit is often used to collect logs in the Kubernetes environment. Below is a diagram of the log delivery process, implemented using FluentBit and {{ ydb-short-name }}, for applications running in a Kubernetes cluster:

![FluentBit in Kubernetes cluster](../_assets/fluent-bit-ydb-scheme.png)
<small>Figure 1 — Interaction diagram between FluentBit and {{ ydb-short-name }} in the Kubernetes cluster</small>

In this diagram:

* Application pods write logs to stdout/stderr
* Text from stdout/stderr is saved as files on Kubernetes worker nodes
* Pod with FluentBit:
    * Mounts a folder with log files for itself
    * Reads the contents from the log files
    * Enriches log records with additional metadata
    * Saves records to {{ ydb-short-name }} database

### Table to store Kubernetes logs

Below is the {{ ydb-short-name }} table structure to store the Kubernetes logs:

```sql
CREATE TABLE `fluent-bit/log` (
@@ -52,38 +156,35 @@ CREATE TABLE `fluent-bit/log` (
PRIMARY KEY (
`timestamp`, `file`, `datahash`
)
)
) PARTITION BY HASH(`timestamp`, `file`)
WITH (STORE = COLUMN, TTL = Interval("P14D") ON `timestamp`);
```

Column purpose:

* timestamp – the log timestamp

* file – name of the source from which the log was read. In the case of Kubernetes, this will be the name of the file on the worker node in which the logs of a specific pod are written
Column descriptions:

* pipe – stdout or stderr stream where application-level writing was done
* `timestamp` – the log record timestamp;
* `file` – the name of the source from which the log was read. In the case of Kubernetes, this will be the name of the file on the worker node in which the logs of a specific pod are written;
* `pipe` – stdout or stderr stream where application-level writing was done;
* `datahash` – hash code computed over the log record;
* `message` – the textual part of the log record;
* `message_parsed` – log record fields in structured form, if the textual part could be parsed using the configured FluentBit parsers;
* `kubernetes` – information about the pod, including name, namespace, and annotations.

* message – the log message
Optionally, TTL can be configured for the table, as shown in the example.

* datahash – the CityHash64 hash code calculated over the log message (required to avoid overwriting messages from the same source and with the same timestamp)
### FluentBit configuration

* message_parsed – a structured log message, if it could be parsed using the fluent-bit parsers
To deploy FluentBit in the Kubernetes environment, a configuration file with the log collection and processing parameters must be prepared (typical file name: `values.yaml`). This section explains the contents of this file, with examples.

* kubernetes – information about the pod, for example: name, namespace, logs and annotations

Optionally, you can set TTL for table rows

## FluentBit configuration

It is necessary to replace the repository and image version:
The image repository and tag of the FluentBit container must be replaced:

```yaml
image:
  repository: ghcr.io/ydb-platform/fluent-bit-ydb
  tag: latest
```

In this image, a plugin library has been added that implements YDB support. Source code is available [here](https://github.com/ydb-platform/fluent-bit-ydb)
In this image, a plugin library has been added that implements {{ ydb-short-name }} support.

The following lines define the rules for mounting log folders in FluentBit pods:

@@ -115,7 +216,7 @@ daemonSetVolumeMounts:
readOnly: true
```

Also, you need to redefine the command and launch arguments:
FluentBit startup parameters should be configured as shown below:

```yaml
command:
@@ -127,7 +228,7 @@ args:
- --config=/fluent-bit/etc/conf/fluent-bit.conf
```

And the pipeline itself for collecting, converting and delivering logs:
The FluentBit pipeline for collecting, converting, and delivering logs should be defined according to the example:

```yaml
config:
@@ -166,15 +267,13 @@ config:
CredentialsToken ${OUTPUT_YDB_CREDENTIALS_TOKEN}
```

Blocks description:
Configuration blocks description:

* Inputs. This block specifies where to read and how to parse logs. In this case, *.log files will be read from the /var/log/containers/ folder, which was mounted from the host
* `inputs` - this block specifies where to read and how to parse logs. In this case, `*.log` files will be read from the `/var/log/containers/` folder, which is mounted from the host
* `filters` - this block specifies how the logs will be processed. In this case, for each log record, the corresponding metadata is added (using the Kubernetes filter), and unused fields (`_p`, `time`) are cut out
* `outputs` - this block specifies where the logs will be sent. In this case, logs are saved into the `fluent-bit/log` table in the {{ ydb-short-name }} database. The database connection parameters (`ConnectionURL` and `CredentialsToken` in this example) are taken from the environment variables `OUTPUT_YDB_CONNECTION_URL` and `OUTPUT_YDB_CREDENTIALS_TOKEN`. The authentication parameters and the corresponding environment variables should be adjusted to match the configuration of the {{ ydb-short-name }} cluster being used.
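
The three stages described above can be mimicked with a short Python sketch. It only models the data flow and is not how FluentBit is implemented: the function names, the sample record (including the `stream`, `_p` and `time` values), and the pod metadata are hypothetical.

```python
def kubernetes_filter(record: dict, pod_metadata: dict) -> dict:
    """Attach pod metadata to the record (stand-in for the Kubernetes filter)."""
    return {**record, "kubernetes": pod_metadata}


def drop_fields(record: dict, names: tuple = ("_p", "time")) -> dict:
    """Remove the fields that are not stored in the table."""
    return {key: value for key, value in record.items() if key not in names}


def run_pipeline(raw_records: list, pod_metadata: dict) -> list:
    delivered = []
    for record in raw_records:                            # inputs: parsed log lines
        record = kubernetes_filter(record, pod_metadata)  # filters: enrich
        record = drop_fields(record)                      # filters: cut unused fields
        delivered.append(record)                          # outputs: rows to be written
    return delivered


raw = [{"log": "server started", "stream": "stdout",
        "_p": "F", "time": "2024-05-24T12:00:00Z"}]
rows = run_pipeline(raw, {"pod_name": "demo-app-0", "namespace_name": "default"})
print(rows)
```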

* Filters. This block specifies how logs will be processed. In this case: for each log the corresponding metadata will be found (using the kubernetes filter), and unused fields (_p, time) will be cut out

* Outputs. This block specifies where the logs will be sent. In this case, to the `fluent-bit/log` table in the {{ ydb-short-name }} cluster. Cluster connection parameters (ConnectionURL, CredentialsToken) are set using the corresponding environment variables – `OUTPUT_YDB_CONNECTION_URL`, `OUTPUT_YDB_CREDENTIALS_TOKEN`

Environment variables are defined as follows:
Environment variables are defined as shown below:

```yaml
env:
@@ -187,21 +286,21 @@ env:
name: fluent-bit-ydb-plugin-token
```

The secret authorization token must be created in advance in the cluster. For example, using the command:
Authentication data should be stored as a Secret object in the Kubernetes cluster. Example command to create a Kubernetes secret:

```sh
kubectl create secret -n ydb-fluent-bit-integration generic fluent-bit-ydb-plugin-token --from-literal=token=<YDB TOKEN>
```

## FluentBit deployment
### Deploying FluentBit in a Kubernetes cluster

HELM is a way to package and install applications in a Kubernetes cluster. To deploy FluentBit, you need to add a chart repository using the command:
[Helm](https://helm.sh) is a tool for packaging and installing applications in a Kubernetes cluster. To deploy FluentBit, the corresponding chart repository (containing the installation scenario) should be added using the following command:

```sh
helm repo add fluent https://fluent.github.io/helm-charts
```

Installing FluentBit on a Kubernetes cluster is done using the following command:
After that, FluentBit can be deployed to a Kubernetes cluster with the following command:

```sh
helm upgrade --install fluent-bit fluent/fluent-bit \
@@ -211,29 +310,31 @@ helm upgrade --install fluent-bit fluent/fluent-bit \
--values values.yaml
```

## Verify the installation
The argument `--values` in the example command shown above references the file containing the FluentBit settings.

### Installation verification

Check that fluent-bit has started by reading its logs (there should be no [error] level entries):
Check that FluentBit has started by reading its logs (there should be no `[error]` level entries):

```sh
kubectl logs -n ydb-fluent-bit-integration -l app.kubernetes.io/instance=fluent-bit
```

Check that there are records in the YDB table (they will appear approximately a few minutes after launching FluentBit):
Check that there are records in the {{ ydb-short-name }} table (they should appear within a few minutes after launching FluentBit):

```sql
SELECT * FROM `fluent-bit/log` ORDER BY `timestamp` DESC LIMIT 10
```

## Resource cleanup
### Resource cleanup

It is enough to remove the namespace with fluent-bit:
To remove FluentBit, it is sufficient to delete the Kubernetes namespace which was used for the installation:

```sh
kubectl delete namespace ydb-fluent-bit-integration
```

And a table with logs:
After uninstalling FluentBit, the log storage table can be dropped from the {{ ydb-short-name }} database:

```sql
DROP TABLE `fluent-bit/log`
```
14 changes: 7 additions & 7 deletions ydb/docs/en/core/integrations/toc_i.yaml
@@ -1,15 +1,15 @@
items:
- name: Log records collection in a Kubernetes cluster using FluentBit and YDB
- name: Log records collection using FluentBit
href: fluent-bit.md
- name: Grafana data source
href: grafana.md
- name: Schema migrations with goose
href: goose.md
- name: Importing from JDBC data sources
href: import-jdbc.md
- name: Liquibase
- name: Schema migrations with Liquibase
href: liquibase.md
- name: Flyway
- name: Schema migrations with Flyway
href: flyway.md
- name: Hibernate
href: hibernate.md
- name: Using Hibernate
href: hibernate.md
- name: Importing from JDBC data sources
href: import-jdbc.md