docs: add log query (#1052)
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>
Co-authored-by: Yiran <cuiyiran3@gmail.com>
zhongzc and nicecui authored Jul 12, 2024
1 parent bbe2608 commit 658eb3b
Showing 2 changed files with 258 additions and 0 deletions.
129 changes: 129 additions & 0 deletions docs/nightly/en/user-guide/log/log-query.md
@@ -0,0 +1,129 @@
# Log Query

This document provides a guide on how to use GreptimeDB's query language for effective searching and analysis of log data.

## Overview

GreptimeDB allows for flexible querying of data using SQL statements. This section introduces specific search functions and query statements designed to enhance your log querying capabilities.

## Full-Text Search Using the `MATCHES` Function

In SQL statements, you can use the `MATCHES` function to perform full-text searches, which is especially useful for log analysis. The `MATCHES` function supports full-text searches on `String` type columns. Here’s an example of how it can be used:

```sql
SELECT * FROM logs WHERE MATCHES(message, 'error OR fail');
```

The `MATCHES` function is designed for full-text search and accepts two parameters:

- `column_name`: The column to perform the full-text search on, which should contain textual data of type `String`.
- `search_query`: A string containing the query statement to search for. See the [Query Statements](#query-statements) section below for more details.
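
Since `MATCHES` appears in the `WHERE` clause like any other condition, it can presumably be combined with ordinary predicates. The sketch below restricts a full-text search to a recent time window. The `ts` timestamp column, the one-hour window, and the `LIMIT` are assumptions made for illustration; they are not part of the `logs` table described here.

```sql
-- Hypothetical: assumes `logs` has a timestamp column named `ts`.
SELECT ts, message
FROM logs
WHERE MATCHES(message, 'error OR fail')  -- full-text condition
  AND ts >= now() - INTERVAL '1 hour'    -- assumed time filter
ORDER BY ts DESC
LIMIT 100;
```

Filtering on time alongside the full-text condition keeps the result set small, which is usually what you want when scanning recent logs.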

## Query Statements

### Simple Term

Simple term searches are straightforward:

```sql
SELECT * FROM logs WHERE MATCHES(message, 'Barack Obama');
```

The value `Barack Obama` in the `search_query` parameter of the `MATCHES` function will be considered as two separate terms: `Barack` and `Obama`. This means the query will match all rows containing either `Barack` or `Obama`, equivalent to using `OR`:

```sql
SELECT * FROM logs WHERE MATCHES(message, 'Barack OR Obama');
```

### Negative Term

Prefixing a term with `-` excludes rows containing that term. For instance, to find rows containing `apple` but not `fruit`:

```sql
SELECT * FROM logs WHERE MATCHES(message, 'apple -fruit');
```
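
As a further sketch, and assuming several negative terms may appear in one query (this is not stated explicitly above), the same prefix could exclude more than one term. The terms `error`, `timeout`, and `retry` are placeholders:

```sql
-- Rows mentioning `error` but neither `timeout` nor `retry` (assumed behavior).
SELECT * FROM logs WHERE MATCHES(message, 'error -timeout -retry');
```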

### Must Term

Prefixing a term with `+` specifies that it must be included in the results. For example, to query rows containing both `apple` and `fruit`:

```sql
SELECT * FROM logs WHERE MATCHES(message, '+apple +fruit');
```

### Boolean Operators

Boolean operators can specify logical conditions for the search. For example, the `AND` operator requires all specified terms to be included, while the `OR` operator requires at least one term to be included. The `AND` operator takes precedence over `OR`, so the expression `a AND b OR c` is interpreted as `(a AND b) OR c`. For example:

```sql
SELECT * FROM logs WHERE MATCHES(message, 'a AND b OR c');
```

This matches rows containing both `a` and `b`, or rows containing `c`. It is equivalent to:

```sql
SELECT * FROM logs WHERE MATCHES(message, '(+a +b) c');
```
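
When the default precedence is not what you want, grouping is the natural way to override it. The sketch below assumes parentheses group `AND`/`OR` expressions the same way they group the `+` terms in the equivalent form above:

```sql
-- Assumed: parentheses make `b OR c` bind before `AND`,
-- so a row must contain `a` and at least one of `b` or `c`.
SELECT * FROM logs WHERE MATCHES(message, 'a AND (b OR c)');
```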

### Phrase Term

A phrase term is enclosed within quotes `" "` and matches the exact sequence of words. For example, to match rows containing `Barack` followed directly by `Obama`:

```sql
SELECT * FROM logs WHERE MATCHES(message, '"Barack Obama"');
```

To include quotes within a phrase, use a backslash `\` to escape them:

```sql
SELECT * FROM logs WHERE MATCHES(message, '"He said \"hello\""');
```
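
Phrases are likely most useful for multi-word log messages. The sketch below assumes a phrase can be used as an operand of a boolean operator, like a simple term; the two phrases are placeholder log messages, not taken from a real dataset:

```sql
-- Assumed: each quoted phrase acts as a single operand of OR.
SELECT * FROM logs
WHERE MATCHES(message, '"connection refused" OR "connection reset"');
```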

## Full-Text Index for Accelerated Search

A full-text index is essential for full-text search, especially when dealing with large datasets. Without a full-text index, the search operation could be very slow, impacting the overall query performance and user experience. By configuring a full-text index within the Pipeline, you can ensure that search operations are performed efficiently, even with significant data volumes.

### Configuring Full-Text Index

In the Pipeline configuration, you can specify a column to use a full-text index. Below is a configuration example where the `message` column is set with a full-text index:

<!-- In the Pipeline configuration, you can [specify a column to use a full-text index](./log-pipeline.md#index-field). Below is a configuration example where the `message` column is set with a full-text index: -->

```yaml
processors:
  - date:
      field: time
      formats:
        - "%Y-%m-%d %H:%M:%S%.3f"
      ignore_missing: true

transform:
  - field: message
    type: string
    index: fulltext
  - field: time
    type: time
    index: timestamp
```
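
### Viewing Table Schema

After data is written, you can use an SQL statement to view the table schema and confirm that the column is set for full-text indexing. The sample output below comes from a table named `many_logs`, whose full-text column is `log` rather than `message`: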

```sql
SHOW CREATE TABLE many_logs\G
*************************** 1. row ***************************
       Table: many_logs
Create Table: CREATE TABLE IF NOT EXISTS `many_logs` (
  `host` STRING NULL,
  `log` STRING NULL FULLTEXT WITH(analyzer = 'English', case_sensitive = 'false'),
  `ts` TIMESTAMP(9) NOT NULL,
  TIME INDEX (`ts`),
  PRIMARY KEY (`host`)
)

ENGINE=mito
WITH(
  append_mode = 'true'
)
```
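
The printed schema also shows what the full-text option looks like as a column clause. As a minimal sketch, assuming `CREATE TABLE` accepts the same clause that `SHOW CREATE TABLE` prints, a table with a full-text indexed column could be declared by hand and then searched. The statement simply mirrors the output above, and the search terms are placeholders:

```sql
-- Sketch only: mirrors the schema printed by SHOW CREATE TABLE.
CREATE TABLE IF NOT EXISTS `many_logs` (
  `host` STRING NULL,
  `log` STRING NULL FULLTEXT WITH(analyzer = 'English', case_sensitive = 'false'),
  `ts` TIMESTAMP(9) NOT NULL,
  TIME INDEX (`ts`),
  PRIMARY KEY (`host`)
)
ENGINE=mito
WITH(
  append_mode = 'true'
);

-- Full-text search against the indexed `log` column.
SELECT * FROM many_logs WHERE MATCHES(`log`, 'error OR fail');
```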
129 changes: 129 additions & 0 deletions docs/nightly/zh/user-guide/log/log-query.md
@@ -0,0 +1,129 @@
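# Log Query

This document describes how to use the query language provided by GreptimeDB to search and analyze log data.

## Overview

In GreptimeDB, you can query data flexibly with SQL statements. This section describes how to use specific search functions and query statements to improve your log queries.

## Full-Text Search Using the `MATCHES` Function

In SQL statements, you can use the `MATCHES` function to perform full-text searches, which is especially useful for log analysis. The `MATCHES` function supports full-text search on columns of type `String`. A typical example: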

```sql
SELECT * FROM logs WHERE MATCHES(message, 'error OR fail');
```

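`MATCHES` is a function dedicated to full-text search and accepts two parameters:

- `column_name`: The column to search, which contains textual data; the column's data type must be `String`.
- `search_query`: A string containing the keywords and operators to search for. See [Query Statement Types](#query-statement-types) below for details.

## Query Statement Types

### Simple Term

A simple term search looks like this: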

```sql
SELECT * FROM logs WHERE MATCHES(message, 'Barack Obama');
```

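The value `Barack Obama` of the `search_query` parameter in the `MATCHES` call above is treated as two separate terms, `Barack` and `Obama`. The query therefore matches every row containing `Barack` or `Obama`, which is equivalent to using `OR`: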

```sql
SELECT * FROM logs WHERE MATCHES(message, 'Barack OR Obama');
```

### Negative Term

Prefixing a term with `-` excludes rows containing that term. For example, to query rows that contain `apple` but not `fruit`:

```sql
SELECT * FROM logs WHERE MATCHES(message, 'apple -fruit');
```

### Must Term

Prefixing a term with `+` marks it as required in the search results. For example, to query rows that contain both `apple` and `fruit`:

```sql
SELECT * FROM logs WHERE MATCHES(message, '+apple +fruit');
```

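### Boolean Operators

Boolean operators specify the conditional logic of a search. For example, the `AND` operator requires the results to contain all of the given terms, while the `OR` operator requires at least one of them. Within a query, `AND` takes precedence over `OR`, so the expression `a AND b OR c` is interpreted as `(a AND b) OR c`. For example: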

```sql
SELECT * FROM logs WHERE MATCHES(message, 'a AND b OR c');
```

This query matches rows containing both `a` and `b`, or rows containing `c`. It is equivalent to:

```sql
SELECT * FROM logs WHERE MATCHES(message, '(+a +b) c');
```

### Phrase Term

A phrase enclosed in quotes `" "` is matched as a whole. For example, to match only rows in which `Barack` is immediately followed by `Obama`:

```sql
SELECT * FROM logs WHERE MATCHES(message, '"Barack Obama"');
```

To include quotes within a phrase, escape them with a backslash `\`:

```sql
SELECT * FROM logs WHERE MATCHES(message, '"He said \"hello\""');
```

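## Full-Text Index for Accelerated Search

A full-text index is a key piece of configuration for full-text search, especially for search queries over large amounts of data. Without one, searches can be very slow, which hurts overall query performance and user experience. Configuring a full-text index in the Pipeline keeps search operations efficient even at very large data volumes.

### Configuring Full-Text Index

In the Pipeline configuration, you can specify that a column uses a full-text index. Below is a configuration example in which the `message` column is set with a full-text index:

<!-- In the Pipeline configuration, you can [specify that a column uses a full-text index](./log-pipeline.md#index-字段). Below is a configuration example in which the `message` column is set with a full-text index: -->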

```yaml
processors:
  - date:
      field: time
      formats:
        - "%Y-%m-%d %H:%M:%S%.3f"
      ignore_missing: true

transform:
  - field: message
    type: string
    index: fulltext
  - field: time
    type: time
    index: timestamp
```
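
### Viewing Table Schema

After data is written, you can use an SQL command to view the table schema and confirm that the column is set for full-text indexing. The sample output below comes from a table named `many_logs`, whose full-text column is `log` rather than `message`: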

```sql
SHOW CREATE TABLE many_logs\G
*************************** 1. row ***************************
       Table: many_logs
Create Table: CREATE TABLE IF NOT EXISTS `many_logs` (
  `host` STRING NULL,
  `log` STRING NULL FULLTEXT WITH(analyzer = 'English', case_sensitive = 'false'),
  `ts` TIMESTAMP(9) NOT NULL,
  TIME INDEX (`ts`),
  PRIMARY KEY (`host`)
)

ENGINE=mito
WITH(
  append_mode = 'true'
)
```
