[typo](doc) Add the description of json HDFS broker load (apache#13683)
Add the instruction of HDFS broker load with json format file.
BePPPower authored Oct 27, 2022
1 parent d2262bc commit 3e8cd0c
Showing 2 changed files with 109 additions and 0 deletions.
@@ -72,6 +72,7 @@ WITH BROKER broker_name
[WHERE predicate]
[DELETE ON expr]
[ORDER BY source_sequence]
[PROPERTIES ("key1"="value1", ...)]
````
- `[MERGE|APPEND|DELETE]`
@@ -128,6 +129,10 @@ WITH BROKER broker_name
Applies only to tables that use the Unique Key model. Specifies the column in the imported data that represents the Sequence Col, mainly to guarantee data order during import.
- `PROPERTIES ("key1"="value1", ...)`
Specify parameters for the imported file format. For example, if the imported file is in `json` format, you can specify parameters such as `json_root`, `jsonpaths`, and `fuzzy_parse` here.
- `WITH BROKER broker_name`
Specify the name of the Broker service to use. In public cloud Doris, the Broker service name is `bos`.
@@ -405,6 +410,55 @@ WITH BROKER broker_name
`my_table` must be a Unique Key model table with a Sequence Col specified. The data will be ordered according to the value of the `source_sequence` column in the source data.
10. Import a batch of data from HDFS, specify the file format as `json`, and specify the `json_root` and `jsonpaths` parameters.
```sql
LOAD LABEL example_db.label10
(
DATA INFILE("HDFS://test:port/input/file.json")
INTO TABLE `my_table`
FORMAT AS "json"
PROPERTIES(
"json_root" = "$.item",
"jsonpaths" = "[$.id, $.city, $.code]"
)
)
WITH HDFS (
"hadoop.username" = "user",
"password" = ""
)
PROPERTIES
(
"timeout"="1200",
"max_filter_ratio"="0.1"
);
```
`jsonpaths` can be used together with `column list` and `SET (column_mapping)`:
```sql
LOAD LABEL example_db.label10
(
DATA INFILE("HDFS://test:port/input/file.json")
INTO TABLE `my_table`
FORMAT AS "json"
(id, code, city)
SET (id = id * 10)
PROPERTIES(
"json_root" = "$.item",
"jsonpaths" = "[$.id, $.code, $.city]"
)
)
WITH HDFS (
"hadoop.username" = "user",
"password" = ""
)
PROPERTIES
(
"timeout"="1200",
"max_filter_ratio"="0.1"
);
```
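To make the mapping in the two examples above easier to follow, here is a minimal sketch of a target table and of one input record. Both the schema of `my_table` and the sample record are assumptions made for illustration; the original document does not define either.

```sql
-- Hypothetical target table for the examples above (schema is an assumption).
-- Assumed shape of one record in file.json:
--   {"item": {"id": 1, "city": "beijing", "code": 100}}
-- With "json_root" = "$.item", the jsonpaths are evaluated against the
-- object under "item". The extracted values are then matched, in order,
-- to the column list if one is given, otherwise to the table's columns.
CREATE TABLE IF NOT EXISTS example_db.my_table
(
    id   INT,
    city VARCHAR(64),
    code INT
)
DUPLICATE KEY(id)
DISTRIBUTED BY HASH(id) BUCKETS 1
PROPERTIES ("replication_num" = "1");
```

Under these assumptions, the first example relies on the `jsonpaths` order `id, city, code` matching the table's column order, while the second example reorders the paths and names the columns explicitly with `(id, code, city)`.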
### Keywords
BROKER, LOAD
@@ -72,6 +72,7 @@ WITH BROKER broker_name
[WHERE predicate]
[DELETE ON expr]
[ORDER BY source_sequence]
[PROPERTIES ("key1"="value1", ...)]
```

- `[MERGE|APPEND|DELETE]`
@@ -128,6 +129,10 @@ WITH BROKER broker_name

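Applies only to tables that use the Unique Key model. Specifies the column in the imported data that represents the Sequence Col, mainly to guarantee data order during import.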

- `PROPERTIES ("key1"="value1", ...)`

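Specify parameters for the imported file format. For example, if the imported file is in `json` format, you can specify parameters such as `json_root`, `jsonpaths`, and `fuzzy_parse` here.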

- `WITH BROKER broker_name`

Specify the name of the Broker service to use. In public cloud Doris, the Broker service name is `bos`.
@@ -404,6 +409,56 @@ WITH BROKER broker_name

`my_table` must be a Unique Key model table with a Sequence Col specified. The data will be ordered according to the value of the `source_sequence` column in the source data.

10. Import a batch of data from HDFS, specify the file format as `json`, and specify the `json_root` and `jsonpaths` parameters.

```sql
LOAD LABEL example_db.label10
(
DATA INFILE("HDFS://test:port/input/file.json")
INTO TABLE `my_table`
FORMAT AS "json"
PROPERTIES(
"json_root" = "$.item",
"jsonpaths" = "[$.id, $.city, $.code]"
)
)
WITH HDFS (
"hadoop.username" = "user",
"password" = ""
)
PROPERTIES
(
"timeout"="1200",
"max_filter_ratio"="0.1"
);
```

`jsonpaths` can be used together with `column list` and `SET (column_mapping)`:

```sql
LOAD LABEL example_db.label10
(
DATA INFILE("HDFS://test:port/input/file.json")
INTO TABLE `my_table`
FORMAT AS "json"
(id, code, city)
SET (id = id * 10)
PROPERTIES(
"json_root" = "$.item",
"jsonpaths" = "[$.id, $.code, $.city]"
)
)
WITH HDFS (
"hadoop.username" = "user",
"password" = ""
)
PROPERTIES
(
"timeout"="1200",
"max_filter_ratio"="0.1"
);
```
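
Broker Load runs asynchronously, so a successful `LOAD LABEL` statement only means that the job was submitted. A minimal sketch of how the job from the example above might be checked afterwards, assuming the label `label10` in `example_db`:

```sql
-- Look up the most recent load job with the label used above.
-- The State column is expected to end as FINISHED or CANCELLED.
SHOW LOAD FROM example_db
WHERE LABEL = "label10"
ORDER BY CreateTime DESC
LIMIT 1;
```

If the job is cancelled, the error details in the result row are usually the first place to look, for example when the share of filtered rows exceeds `max_filter_ratio`.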

### Keywords

BROKER, LOAD