fix: the wrong description about full-text searching (GreptimeTeam#1074)
Co-authored-by: Jeremyhi <jiachun_feng@proton.me>
killme2008 and fengjiachun committed Jul 17, 2024
1 parent 3d94423 commit b62a367
Showing 4 changed files with 26 additions and 22 deletions.
11 changes: 6 additions & 5 deletions docs/nightly/en/user-guide/logs/quick-start.md
@@ -172,12 +172,13 @@ From the table structure, you can see that the `origin_logs` table has only two
 with the entire log message stored in a single column.
 The `pipeline_logs` table stores the log message in multiple columns.

-It is recommended to use the pipeline method to split the log message into multiple columns, which offers the advantage of explicitly querying a specific value within a certain column. Exact matching proves to be superior to fuzzy querying when handling strings for several key reasons:
+It is recommended to use the pipeline method to split the log message into multiple columns, which offers the advantage of explicitly querying specific values within certain columns. Tag matching queries prove superior to full-text searching for several key reasons:

-- Performance Efficiency: By marking a column as a Tag in the pipeline, an inverted index is created on the column values, resulting in faster query execution compared to the full-text indexes used in fuzzy querying.
-- Resource Consumption: Exact matching queries typically involve simpler comparisons and use fewer CPU, memory, and I/O resources compared to the more resource-intensive full-text indexes required for fuzzy querying.
-- Accuracy: Exact matching returns precise results that strictly meet the query conditions, reducing the chances of irrelevant results, whereas fuzzy querying can still return more noise even with full-text indexing.
-- Maintainability: Exact matching queries are straightforward and easier to understand, write, and debug, while fuzzy queries with full-text indexes add a layer of complexity, making them more challenging to optimize and maintain.
+- **Performance Efficiency**: Tag matching queries are typically faster than full-text searching.
+- **Resource Consumption**: Because GreptimeDB uses a columnar storage engine, structured data compresses better. Additionally, the inverted index used for tag matching queries typically consumes significantly fewer resources than a full-text index, especially in terms of storage size.
+- **Maintainability**: Tag matching queries are straightforward and easier to understand, write, and debug.
+
+Of course, if you need to search for keywords within large blocks of text, you must use full-text search, as it is specifically designed for that purpose.

 ## Query logs

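The storage-size argument in the new Resource Consumption bullet can be illustrated with a toy Python model. This is a hypothetical sketch, not GreptimeDB's index implementation: the sample rows and the helpers `build_tag_index`, `build_fulltext_index`, and `posting_count` are made up for illustration, and the full-text side is reduced to plain whitespace tokenization. A tag index stores one posting per row, keyed by the column value, whereas even this simplified full-text index stores one posting for every distinct token in every row.

```python
from collections import defaultdict

# Hypothetical sample rows: (http_method tag value, raw log message).
ROWS = [
    ("GET", "GET /index.html HTTP/1.1 200"),
    ("POST", "POST /api/login HTTP/1.1 401"),
    ("GET", "GET /about HTTP/1.1 200"),
]


def build_tag_index(values):
    """Inverted index on a tag column: each row adds one posting under its value."""
    index = defaultdict(list)
    for row_id, value in enumerate(values):
        index[value].append(row_id)
    return dict(index)


def build_fulltext_index(texts):
    """Toy full-text index: each distinct whitespace token in a row adds one posting."""
    index = defaultdict(list)
    for row_id, text in enumerate(texts):
        for token in set(text.split()):
            index[token].append(row_id)
    return dict(index)


def posting_count(index):
    """Total number of (key, row id) entries the index has to store."""
    return sum(len(row_ids) for row_ids in index.values())


tag_index = build_tag_index(method for method, _ in ROWS)
fulltext_index = build_fulltext_index(message for _, message in ROWS)

# Tag matching: an exact lookup of the column value.
print(tag_index.get("GET", []))              # [0, 2]
# Keyword search inside the raw message needs the full-text index.
print(fulltext_index.get("/api/login", []))  # [1]
# Postings stored: 3 for the tag index vs 12 for the toy full-text index.
print(posting_count(tag_index), posting_count(fulltext_index))  # 3 12
```

The counts only illustrate the shape of the argument; they say nothing about GreptimeDB's actual on-disk index sizes.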
13 changes: 7 additions & 6 deletions docs/nightly/zh/user-guide/logs/quick-start.md
@@ -168,13 +168,14 @@ DESC pipeline_logs;
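 From the table structure, you can see that the `origin_logs` table has only two columns, with the entire log message stored in a single column.
 The `pipeline_logs` table, in contrast, stores the log message in multiple columns.

-It is recommended to use the pipeline method to split the log message into multiple columns, so that you can explicitly query a specific value in a specific column.
-Compared with fuzzy querying, exact matching has several key advantages when handling strings:
+It is recommended to use the pipeline method to split the log message into multiple columns, so that you can precisely query a specific value in a specific column.
+Compared with full-text search, Tag matching queries have the following advantages when handling strings:

-- Performance efficiency: Marking a column as a Tag in the pipeline creates an inverted index on that column's values, enabling faster query execution than the full-text index used in fuzzy querying.
-- Resource consumption: Exact-match queries usually involve simpler comparisons and use less CPU, memory, and I/O, whereas fuzzy querying requires a more resource-intensive full-text index.
-- Accuracy: Exact matching returns precise results that strictly meet the query conditions, reducing the chance of irrelevant results, whereas fuzzy querying may still return more noise even with a full-text index.
-- Maintainability: Exact-match queries are simple and intuitive, and easier to write and debug, whereas fuzzy querying with a full-text index still adds a layer of complexity, making it more challenging and harder to optimize and maintain.
+- **Performance Efficiency**: Tag matching queries are generally faster than full-text search.
+- **Resource Consumption**: Because GreptimeDB's storage engine is columnar, structured data compresses better; in addition, the inverted index used by Tag matching queries usually consumes significantly fewer resources than a full-text index, especially in terms of storage size.
+- **Maintainability**: Exact-match queries are simple and clear, and easier to understand, write, and debug.
+
+Of course, if you need to search for keywords within large blocks of text, you still need full-text search, because it is designed specifically for that purpose.

 ## Query logs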

11 changes: 6 additions & 5 deletions docs/v0.9/en/user-guide/logs/quick-start.md
@@ -172,12 +172,13 @@ From the table structure, you can see that the `origin_logs` table has only two
 with the entire log message stored in a single column.
 The `pipeline_logs` table stores the log message in multiple columns.

-It is recommended to use the pipeline method to split the log message into multiple columns, which offers the advantage of explicitly querying a specific value within a certain column. Exact matching proves to be superior to fuzzy querying when handling strings for several key reasons:
+It is recommended to use the pipeline method to split the log message into multiple columns, which offers the advantage of explicitly querying specific values within certain columns. Tag matching queries prove superior to full-text searching for several key reasons:

-- Performance Efficiency: By marking a column as a Tag in the pipeline, an inverted index is created on the column values, resulting in faster query execution compared to the full-text indexes used in fuzzy querying.
-- Resource Consumption: Exact matching queries typically involve simpler comparisons and use fewer CPU, memory, and I/O resources compared to the more resource-intensive full-text indexes required for fuzzy querying.
-- Accuracy: Exact matching returns precise results that strictly meet the query conditions, reducing the chances of irrelevant results, whereas fuzzy querying can still return more noise even with full-text indexing.
-- Maintainability: Exact matching queries are straightforward and easier to understand, write, and debug, while fuzzy queries with full-text indexes add a layer of complexity, making them more challenging to optimize and maintain.
+- **Performance Efficiency**: Tag matching queries are typically faster than full-text searching.
+- **Resource Consumption**: Because GreptimeDB uses a columnar storage engine, structured data compresses better. Additionally, the inverted index used for tag matching queries typically consumes significantly fewer resources than a full-text index, especially in terms of storage size.
+- **Maintainability**: Tag matching queries are straightforward and easier to understand, write, and debug.
+
+Of course, if you need to search for keywords within large blocks of text, you must use full-text search, as it is specifically designed for that purpose.

 ## Query logs

13 changes: 7 additions & 6 deletions docs/v0.9/zh/user-guide/logs/quick-start.md
@@ -168,13 +168,14 @@ DESC pipeline_logs;
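 From the table structure, you can see that the `origin_logs` table has only two columns, with the entire log message stored in a single column.
 The `pipeline_logs` table, in contrast, stores the log message in multiple columns.

-It is recommended to use the pipeline method to split the log message into multiple columns, so that you can explicitly query a specific value in a specific column.
-Compared with fuzzy querying, exact matching has several key advantages when handling strings:
+It is recommended to use the pipeline method to split the log message into multiple columns, so that you can precisely query a specific value in a specific column.
+Compared with full-text search, Tag matching queries have the following advantages when handling strings:

-- Performance efficiency: Marking a column as a Tag in the pipeline creates an inverted index on that column's values, enabling faster query execution than the full-text index used in fuzzy querying.
-- Resource consumption: Exact-match queries usually involve simpler comparisons and use less CPU, memory, and I/O, whereas fuzzy querying requires a more resource-intensive full-text index.
-- Accuracy: Exact matching returns precise results that strictly meet the query conditions, reducing the chance of irrelevant results, whereas fuzzy querying may still return more noise even with a full-text index.
-- Maintainability: Exact-match queries are simple and intuitive, and easier to write and debug, whereas fuzzy querying with a full-text index still adds a layer of complexity, making it more challenging and harder to optimize and maintain.
+- **Performance Efficiency**: Tag matching queries are generally faster than full-text search.
+- **Resource Consumption**: Because GreptimeDB's storage engine is columnar, structured data compresses better; in addition, the inverted index used by Tag matching queries usually consumes significantly fewer resources than a full-text index, especially in terms of storage size.
+- **Maintainability**: Exact-match queries are simple and clear, and easier to understand, write, and debug.
+
+Of course, if you need to search for keywords within large blocks of text, you still need full-text search, because it is designed specifically for that purpose.

 ## Query logs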
