[Improvement](docs) Update EN doc (apache#9228)
Gabriel39 authored and minghong.zhou committed May 6, 2022
1 parent 64aa789 commit f0076f9
Showing 38 changed files with 696 additions and 696 deletions.
8 changes: 4 additions & 4 deletions docs/en/administrator-guide/block-rule/sql-block.md
@@ -38,13 +38,13 @@ Support SQL block rule by user level:

SQL block rule CRUD
- create SQL block rule
- - sql：Regex pattern，Special characters need to be translated, "NULL" by default
+ - sql: Regex pattern, Special characters need to be translated, "NULL" by default
- sqlHash: SQL hash value, used to match exactly. We print it in fe.audit.log. Only one of sql and sqlHash can be set, "NULL" by default
- partition_num: Max number of partitions that will be scanned by a scan node, 0L by default
- tablet_num: Max number of tablets that will be scanned by a scan node, 0L by default
- cardinality: An approximate number of scanned rows of a scan node, 0L by default
- global: Whether global (all users) is in effect, false by default
- - enable：Whether to enable block rule，true by default
+ - enable: Whether to enable block rule, true by default
```sql
CREATE SQL_BLOCK_RULE test_rule
PROPERTIES(
@@ -70,7 +70,7 @@ CREATE SQL_BLOCK_RULE test_rule2 PROPERTIES("partition_num" = "30", "cardinality
```sql
SHOW SQL_BLOCK_RULE [FOR RULE_NAME]
```
- - alter SQL block rule，Allows changes sql/sqlHash/global/enable/partition_num/tablet_num/cardinality anyone
+ - alter SQL block rule, allows changing any one of sql/sqlHash/global/enable/partition_num/tablet_num/cardinality
- sql and sqlHash cannot both be set. This means that if sql or sqlHash is set in a rule, the other property can never be altered
- sql/sqlHash and partition_num/tablet_num/cardinality cannot be set together. For example, if partition_num is set in a rule, then sql or sqlHash can never be altered.
```sql
@@ -81,7 +81,7 @@ ALTER SQL_BLOCK_RULE test_rule PROPERTIES("sql"="select \\* from test_table","en
ALTER SQL_BLOCK_RULE test_rule2 PROPERTIES("partition_num" = "10","tablet_num"="300","enable"="true")
```

- - drop SQL block rule，Support multiple rules, separated by `,`
+ - drop SQL block rule, supports multiple rules, separated by `,`
```sql
DROP SQL_BLOCK_RULE test_rule1,test_rule2
```
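
Taken together, a typical rule lifecycle might look like the sketch below. It only combines the CREATE/SHOW/ALTER/DROP forms documented above; the rule name `limit_scan_rule` and the property values are hypothetical.

```sql
-- Hypothetical rule: block queries whose scan node would touch more than
-- 30 partitions or 1000 tablets, for all users.
CREATE SQL_BLOCK_RULE limit_scan_rule
PROPERTIES(
  "partition_num" = "30",
  "tablet_num" = "1000",
  "global" = "true",
  "enable" = "true"
);

-- Inspect the rule.
SHOW SQL_BLOCK_RULE FOR limit_scan_rule;

-- Loosen the partition limit later. Because sql/sqlHash are unset,
-- the scan-limit properties remain alterable.
ALTER SQL_BLOCK_RULE limit_scan_rule PROPERTIES("partition_num" = "50");

-- Remove the rule when it is no longer needed.
DROP SQL_BLOCK_RULE limit_scan_rule;
```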
10 changes: 5 additions & 5 deletions docs/en/administrator-guide/bucket-shuffle-join.md
@@ -28,7 +28,7 @@ under the License.

Bucket Shuffle Join is a new function officially added in Doris 0.14. The purpose is to provide local optimization for some join queries to reduce the time-consuming of data transmission between nodes and speed up the query.

- It's design, implementation can be referred to [ISSUE 4394](https://github.com/apache/incubator-doris/issues/4394)
+ Its design and implementation can be found in [ISSUE 4394](https://github.com/apache/incubator-doris/issues/4394).

## Noun Interpretation

@@ -40,7 +40,7 @@ It's design, implementation can be referred to [ISSUE 4394](https://github.com/a
## Principle
The conventional distributed join methods supported by Doris are `Shuffle Join` and `Broadcast Join`. Both of these joins incur some network overhead.

- For example, there are join queries for table A and table B. the join method is hashjoin. The cost of different join types is as follows
+ For example, suppose there is a join query between table A and table B, and the join method is hash join. The cost of different join types is as follows:
* **Broadcast Join**: If, according to the data distribution, table A has three executing HashJoinNodes, table B needs to be sent to all three HashJoinNodes. Its network overhead is `3B`, and its memory overhead is `3B`.
* **Shuffle Join**: Shuffle join will distribute the data of tables A and B to the nodes of the cluster according to hash calculation, so its network overhead is `A + B` and memory overhead is `B`.

@@ -50,9 +50,9 @@ The data distribution information of each Doris table is saved in FE. If the joi

The picture above shows how Bucket Shuffle Join works. The SQL query joins table A with table B, and the equivalence expression of the join hits the data distribution column of A. According to the data distribution information of table A, Bucket Shuffle Join sends the data of table B to the corresponding data storage and computation nodes of table A. The cost of Bucket Shuffle Join is as follows:

- * network cost ``` B < min(3B, A + B) ```
+ * network cost: ``` B < min(3B, A + B) ```

- * memory cost ``` B <= min(3B, B) ```
+ * memory cost: ``` B <= min(3B, B) ```
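
To make these formulas concrete: with, say, A = 90 GB, B = 10 GB and three HashJoinNodes (illustrative numbers, not from the document), Broadcast Join ships 3B = 30 GB over the network and keeps 3B = 30 GB in memory, Shuffle Join ships A + B = 100 GB, while Bucket Shuffle Join ships only B = 10 GB and each node holds just its share of B.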

Therefore, compared with Broadcast Join and Shuffle Join, Bucket Shuffle Join has obvious performance advantages. It reduces the time spent on data transmission between nodes and the memory cost of the join. Compared with Doris's original join methods, it has the following advantages:

@@ -91,7 +91,7 @@ You can use the `explain` command to check whether the join is a Bucket Shuffle
| | equal join conjunct: `test`.`k1` = `baseall`.`k1`
```

- The join type indicates that the join method to be used is`BUCKET_SHUFFLE`
+ The join type indicates that the join method to be used is `BUCKET_SHUFFLE`.
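
For reference, a minimal setup under which such a plan can appear is sketched below. The schemas, bucket count and replication setting are assumptions rather than part of this commit; the essential point is that the equi-join condition hits the left table's hash-distribution column `k1`.

```sql
-- Hypothetical tables: the join key k1 is also the hash-distribution
-- column of the left table, which is what Bucket Shuffle Join requires.
CREATE TABLE test (
    k1 INT,
    v1 INT
)
DUPLICATE KEY(k1)
DISTRIBUTED BY HASH(k1) BUCKETS 10
PROPERTIES("replication_num" = "1");

CREATE TABLE baseall (
    k1 INT,
    v2 INT
)
DUPLICATE KEY(k1)
DISTRIBUTED BY HASH(k1) BUCKETS 10
PROPERTIES("replication_num" = "1");

-- Because test.k1 = baseall.k1 matches test's distribution column,
-- the planner may choose BUCKET_SHUFFLE for this join.
EXPLAIN SELECT * FROM test JOIN baseall ON test.k1 = baseall.k1;
```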

## Planning rules of Bucket Shuffle Join

