From 7fff925a31df93ab334b582dc70ac79aae1a4c35 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=E7=B5=B5=E7=A9=BA=E4=BA=8B=E3=82=B9=E3=83=94=E3=83=AA?=
 =?UTF-8?q?=E3=83=83=E3=83=88?=
Date: Wed, 15 Jan 2025 09:55:17 +0800
Subject: [PATCH 01/71] [Doc] Files() with Regex (#55078)

---
 .../sql-functions/table-functions/files.md | 12 +++++++++++-
 .../sql-functions/table-functions/files.md | 12 +++++++++++-
 2 files changed, 22 insertions(+), 2 deletions(-)

diff --git a/docs/en/sql-reference/sql-functions/table-functions/files.md b/docs/en/sql-reference/sql-functions/table-functions/files.md
index dfc70a519602ee..f6926266d2f65c 100644
--- a/docs/en/sql-reference/sql-functions/table-functions/files.md
+++ b/docs/en/sql-reference/sql-functions/table-functions/files.md
@@ -43,7 +43,17 @@ All parameters are in the `"key" = "value"` pairs.
 
 #### data_location
 
-The URI used to access the files. You can specify a path or a file.
+The URI used to access the files.
+
+You can specify a path or a file. For example, you can specify this parameter as `"hdfs://<hdfs_host>:<hdfs_port>/user/data/tablename/20210411"` to load a data file named `20210411` from the path `/user/data/tablename` on the HDFS server.
+
+You can also specify this parameter as the path of multiple data files by using the wildcards `?`, `*`, `[]`, `{}`, or `^`. For example, you can specify this parameter as `"hdfs://<hdfs_host>:<hdfs_port>/user/data/tablename/*/*"` or `"hdfs://<hdfs_host>:<hdfs_port>/user/data/tablename/dt=202104*/*"` to load the data files from all partitions, or only from the `202104` partitions, under the path `/user/data/tablename` on the HDFS server.
+
+:::note
+
+Wildcards can also be used to specify intermediate paths.
+
+:::
 
 - To access HDFS, you need to specify this parameter as:
 
diff --git a/docs/zh/sql-reference/sql-functions/table-functions/files.md b/docs/zh/sql-reference/sql-functions/table-functions/files.md
index e0f032d8f32a65..e8e0ebec4601ba 100644
--- a/docs/zh/sql-reference/sql-functions/table-functions/files.md
+++ b/docs/zh/sql-reference/sql-functions/table-functions/files.md
@@ -42,7 +42,17 @@ FILES( data_location , [data_format] [, schema_detect ] [, StorageCredentialPara
 
 #### data_location
 
-The URI used to access the files. You can specify a path or a file name.
+The URI used to access the files.
+
+You can specify a path or a file name. For example, specifying `"hdfs://<hdfs_host>:<hdfs_port>/user/data/tablename/20210411"` matches the data file named `20210411` in the directory `/user/data/tablename` on the HDFS server.
+
+You can also use wildcards to load all the data files under a path. FILES supports the following wildcards: `?`, `*`, `[]`, `{}`, and `^`. For example, the path `"hdfs://<hdfs_host>:<hdfs_port>/user/data/tablename/*/*"` matches the data files in all partitions under the directory `/user/data/tablename` on the HDFS server, and the path `"hdfs://<hdfs_host>:<hdfs_port>/user/data/tablename/dt=202104*/*"` matches the data files in the `202104` partitions only under that directory.
+
+:::note
+
+Intermediate directories can also be matched with wildcards.
+
+:::
 
 - To access HDFS, you need to specify this parameter as:
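As a reader's aid for the wildcard syntax documented in this patch, here is a minimal usage sketch. The host, port, and credential values are placeholders, and the credential keys assume an HDFS deployment with simple authentication as described elsewhere in files.md; adjust them to the actual environment:

```sql
-- Hypothetical example: read every file in all dt=202104* partitions
-- through the FILES() table function using a wildcard path.
-- <hdfs_host>, <hdfs_port>, <username>, and <password> are placeholders.
SELECT *
FROM FILES(
    "path" = "hdfs://<hdfs_host>:<hdfs_port>/user/data/tablename/dt=202104*/*",
    "format" = "parquet",
    "hadoop.security.authentication" = "simple",
    "username" = "<username>",
    "password" = "<password>"
);
```

The same `FILES()` clause can also serve as the source of an `INSERT INTO ... SELECT` statement to load the matched files into a table.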
From 79a03f9067cd9e886168c9533e0ed5ff2b502a78 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=E7=B5=B5=E7=A9=BA=E4=BA=8B=E3=82=B9=E3=83=94=E3=83=AA?=
 =?UTF-8?q?=E3=83=83=E3=83=88?=
Date: Wed, 15 Jan 2025 10:33:28 +0800
Subject: [PATCH 02/71] [Doc] Remove incorrect description (#55062)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Signed-off-by: 絵空事スピリット

---
 docs/en/sql-reference/sql-functions/table-functions/files.md | 2 +-
 docs/zh/sql-reference/sql-functions/table-functions/files.md | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/en/sql-reference/sql-functions/table-functions/files.md b/docs/en/sql-reference/sql-functions/table-functions/files.md
index f6926266d2f65c..0bbda3326a7b38 100644
--- a/docs/en/sql-reference/sql-functions/table-functions/files.md
+++ b/docs/en/sql-reference/sql-functions/table-functions/files.md
@@ -220,7 +220,7 @@ The system unionizes the schema of Parquet and ORC files based on the column nam
 
 ##### Infer STRUCT type from Parquet
 
-From v3.4.0 onwards, FILES() supports inferring STRUCT type data from Parquet files. Although the Parquet file itself does not support the STRUCT type, the system can infer STRUCT and nested STRUCT values from the STRING type columns of the file.
+From v3.4.0 onwards, FILES() supports inferring STRUCT type data from Parquet files.
 
 #### StorageCredentialParams
 
diff --git a/docs/zh/sql-reference/sql-functions/table-functions/files.md b/docs/zh/sql-reference/sql-functions/table-functions/files.md
index e8e0ebec4601ba..64c4711cc78997 100644
--- a/docs/zh/sql-reference/sql-functions/table-functions/files.md
+++ b/docs/zh/sql-reference/sql-functions/table-functions/files.md
@@ -219,7 +219,7 @@ The schema detection of FILES() is not strictly enforced. For example, when CSV fi
 
 ##### Infer the STRUCT type in Parquet files
 
-From v3.4.0 onwards, FILES() supports inferring STRUCT type data from Parquet files. Although the Parquet file itself does not support the STRUCT type, the system can infer STRUCT and nested STRUCT values from the STRING type columns of the file.
+From v3.4.0 onwards, FILES() supports inferring STRUCT type data from Parquet files.
 
 #### StorageCredentialParams
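As a hedged illustration of the schema detection this patch touches, the detected schema, including any inferred STRUCT columns, can typically be inspected by running `DESC` over `FILES()`. The bucket, path, region, and credential values below are placeholders for an S3-style setup and may differ in your environment:

```sql
-- Hypothetical example: inspect the schema that FILES() infers from a
-- Parquet file, including STRUCT columns inferred from v3.4.0 onwards.
DESC FILES(
    "path" = "s3://<bucket>/<path>/data.parquet",
    "format" = "parquet",
    "aws.s3.access_key" = "<access_key>",
    "aws.s3.secret_key" = "<secret_key>",
    "aws.s3.region" = "<region>"
);
```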
From 6e3d945bb492c47e82d00fa8d701cd6e4dd2f510 Mon Sep 17 00:00:00 2001
From: Dan Roscigno
Date: Tue, 14 Jan 2025 21:54:19 -0500
Subject: [PATCH 03/71] [Doc] use package.json from main (#55083)

Signed-off-by: DanRoscigno

---
 .github/workflows/ci-doc-checker.yml | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/.github/workflows/ci-doc-checker.yml b/.github/workflows/ci-doc-checker.yml
index 1adaad950a1ee0..e03210602e8490 100644
--- a/.github/workflows/ci-doc-checker.yml
+++ b/.github/workflows/ci-doc-checker.yml
@@ -130,6 +130,11 @@ jobs:
           rm -rf ./docs/release_notes ./docs/ecosystem_release
           mv ../zh ./i18n/zh/docusaurus-plugin-content-docs/current
           rm -rf ./i18n/zh/docusaurus-plugin-content-docs/current/release_notes ./i18n/zh/docusaurus-plugin-content-docs/current/ecosystem_release
+          # Using package.json and yarn.lock from a PR is not safe, so copy from main branch.
+          rm package.json
+          rm yarn.lock
+          curl -O https://raw.githubusercontent.com/StarRocks/starrocks/refs/heads/main/docs/docusaurus/package.json
+          curl -O https://raw.githubusercontent.com/StarRocks/starrocks/refs/heads/main/docs/docusaurus/yarn.lock
           yarn install --frozen-lockfile
           yarn clear
           yarn build

From 2533f88f812a4384f0c56b291db2ec4898b79adf Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=E7=B5=B5=E7=A9=BA=E4=BA=8B=E3=82=B9=E3=83=94=E3=83=AA?=
 =?UTF-8?q?=E3=83=83=E3=83=88?=
Date: Wed, 15 Jan 2025 11:00:22 +0800
Subject: [PATCH 04/71] [Doc] Add links to release notes (#55050)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Signed-off-by: 絵空事スピリット

---
 docs/en/release_notes/release-3.4.md | 28 ++++++++++++++--------------
 docs/zh/release_notes/release-3.4.md | 28 ++++++++++++++--------------
 2 files changed, 28 insertions(+), 28 deletions(-)

diff --git a/docs/en/release_notes/release-3.4.md b/docs/en/release_notes/release-3.4.md
index 4e1f6770ac4942..e69beb57883830 100644
--- a/docs/en/release_notes/release-3.4.md
+++ b/docs/en/release_notes/release-3.4.md
@@ -11,12 +11,12 @@ Release date: January 13, 2025
 
 ### Data Lake Analytics
 
 - Optimized Iceberg V2 query performance and lowered memory usage by reducing repeated reads of delete-files.
-- Supports column mapping for Delta Lake tables, allowing queries against data after Delta Schema Evolution.
+- Supports column mapping for Delta Lake tables, allowing queries against data after Delta Schema Evolution. For more information, see [Delta Lake catalog - Feature support](https://docs.starrocks.io/docs/data_source/catalog/deltalake_catalog/#feature-support).
 - Data Cache related improvements:
-  - Introduces a Segmented LRU (SLRU) Cache eviction strategy, which significantly defends against cache pollution from occasional large queries, improves the cache hit rate, and reduces fluctuations in query performance. In simulated test cases with large queries, SLRU-based query performance can improve by 70% or more.
-  - Unified the Data Cache instance used in both the shared-data architecture and data lake query scenarios to simplify the configuration and improve resource utilization.
+  - Introduces a Segmented LRU (SLRU) Cache eviction strategy, which significantly defends against cache pollution from occasional large queries, improves the cache hit rate, and reduces fluctuations in query performance. In simulated test cases with large queries, SLRU-based query performance can improve by 70% or more. For more information, see [Data Cache - Cache replacement policies](https://docs.starrocks.io/docs/data_source/data_cache/#cache-replacement-policies).
+  - Unified the Data Cache instance used in both the shared-data architecture and data lake query scenarios to simplify the configuration and improve resource utilization. For more information, see [Data Cache](https://docs.starrocks.io/docs/using_starrocks/caching/block_cache/).
 - Provides an adaptive I/O strategy optimization for Data Cache, which flexibly routes some query requests to remote storage based on the cache disk's load and performance, thereby enhancing overall access throughput.
-- Supports automatic collection of external table statistics through ANALYZE tasks automatically triggered by queries. This can provide more accurate NDV information than metadata files, thereby optimizing the query plan and improving query performance.
+- Supports automatic collection of external table statistics through ANALYZE tasks automatically triggered by queries. This can provide more accurate NDV information than metadata files, thereby optimizing the query plan and improving query performance. For more information, see [Query-triggered collection](https://docs.starrocks.io/docs/using_starrocks/Cost_based_optimizer/#query-triggered-collection).
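To ground the last bullet, here is a hypothetical sketch of what query-triggered statistics collection looks like from a SQL client. The catalog, database, and table names are invented, and whether a given query actually triggers collection depends on the cluster's analyze configuration:

```sql
-- Querying an external table can trigger a background ANALYZE task that
-- collects statistics (including NDV) for use by the cost-based optimizer.
SELECT count(*)
FROM iceberg_catalog.sales_db.orders
WHERE dt = '2025-01-01';

-- Collection tasks that have been triggered can then be reviewed.
SHOW ANALYZE STATUS;
```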