update workload document #1407

Closed
wants to merge 98 commits into from
Changes from 1 commit
Commits
a665b9d
Update load-manual.md (#1339)
Ruffianjiang Nov 19, 2024
59ffabb
[Refactor](data-partition) Refactor data partition docs (#1364)
zclllyybb Nov 19, 2024
2b574c6
[download] Fix 2.0.15 download link (#1374)
KassieZ Nov 19, 2024
768e146
[docs](typo) Version mgt and fix typo (#1362)
KassieZ Nov 19, 2024
937309d
[Fix] Update remote-storage.md (#1247)
intelligentfu8 Nov 19, 2024
0c17c70
Update install-env.md (#1269)
wangtianyi2004 Nov 19, 2024
817edf4
Update install-env.md (#1270)
wangtianyi2004 Nov 19, 2024
592a5d6
[doc] fix dead link (#1317)
sdhzwc Nov 19, 2024
4d8c7c5
[doc] add datalake tutorial of lakesoul (#1320)
Ceng23333 Nov 19, 2024
b39204b
[Docs] Add doc for variant nested type (#1328)
eldenmoon Nov 19, 2024
5b18007
[blog](update) Update blog and fix typo (#1381)
KassieZ Nov 21, 2024
0d17212
[doc](refactor) update k8s install operator doc (#1369)
intelligentfu8 Nov 21, 2024
8abd63d
[doc](function) support table-function posexplode (#1283)
zhangstar333 Nov 21, 2024
71b86fe
[doc] fix link error of lakesoul tutorial (#1384)
Ceng23333 Nov 21, 2024
7cc3026
[doc](routine load) fix routine load doc error (#1382)
sollhui Nov 22, 2024
5a496a2
Removed useless workflow: deadlink checker (#1394)
yang1666204 Nov 23, 2024
e04f3ee
[community] Update announcement (#1398)
KassieZ Nov 23, 2024
b0e8dff
[fix] fix cron deploy1 (#1400)
KassieZ Nov 23, 2024
a07046f
Update:add event Doris Summit Asia 2024 (#1395)
yang1666204 Nov 25, 2024
99a2a9e
[doc](dbt)add dbt doc example (#1387)
catpineapple Nov 25, 2024
fea6ece
[doc](fixed) selectdb to apache (#1402)
intelligentfu8 Nov 25, 2024
6e7aee7
Update build-check.yml (#1401)
jeffreys-cat Nov 25, 2024
a1f4912
[fix] Update Doris Summit Asia (#1405)
KassieZ Nov 25, 2024
41a58e8
[mod] modify the announcement (#1406)
morningman Nov 25, 2024
1aa0683
Add restore_download_snapshot_batch_size/backup_upload_snapshot_batch…
w41ter Nov 27, 2024
8de4b3a
[blog] Update minimax blog and all release (#1414)
KassieZ Nov 27, 2024
b15ca03
[doc](udf) refactor the java-udf function doc (#1383)
zhangstar333 Nov 27, 2024
7b68672
[doc] add doc of `show backend config` (#1306)
DarvenDuan Nov 27, 2024
5623288
[Enhancement] replace table options to keep original dropped table in…
Vallishp Nov 27, 2024
7d9903a
[web]Update the version selection on the upgrade page (#1326)
yang1666204 Nov 27, 2024
14b6fac
Update high-concurrent-point-query.md (#1329)
wangtianyi2004 Nov 27, 2024
13b4769
[doc](percentile) update percentile/percentile_array doc as now suppo…
zhangstar333 Nov 27, 2024
be32821
[Chinese] update join hint (#1375)
LiBinfeng-01 Nov 28, 2024
c2bc502
[fix] Fix crondeploy by Java UDF Docs Format (#1418)
KassieZ Nov 28, 2024
8fe409f
[improvement] improve docs for ccr (#1419)
dataroaring Nov 28, 2024
55671ec
update row storage docs (#1413)
xiaokang Nov 28, 2024
cc975e0
[opt](show) update show collation sql reference manual (#1410)
morrySnow Nov 28, 2024
0295f70
[update] add collaborators (#1421)
KassieZ Nov 28, 2024
6d51f24
[doc] add trim_in function docs (#1207)
liujiwen-up Nov 28, 2024
1b812a6
Update quick-start.md (#1345)
Ruffianjiang Nov 28, 2024
ce70cda
Add port planning: clients need network connectivity to port 8040. (#1377)
zyszys-max Nov 28, 2024
ec83e9d
[improvement](ccr) improve overview of ccr (#1427)
dataroaring Dec 2, 2024
41055be
[doc](update) modify the example for UPDATE usage (#1412)
zhannngchen Dec 3, 2024
0380821
[doc]modify wrongly written character in release-2.1.4.md (#1432)
ixzc Dec 3, 2024
3504dc4
[Fix](CI) ignore markdown file suffix changes in deadlink check (#1447)
zclllyybb Dec 4, 2024
11f7c83
[release] Update 3.0.3 release note (#1451)
KassieZ Dec 4, 2024
8671466
[update] update banner link (#1456)
KassieZ Dec 5, 2024
a810f25
[Feature] Add map function docs which implemented for long (#1454)
zclllyybb Dec 5, 2024
d25cd71
[Enhancement](function) Add more detail explanation about approx_coun…
zclllyybb Dec 5, 2024
74de2f7
feat:docs (#1461)
yang1666204 Dec 5, 2024
a31ae20
fix:search load (#1462)
yang1666204 Dec 5, 2024
e15d32b
fix: fix resources url error (#1465)
KassieZ Dec 5, 2024
871a15b
[fix] fix the search cors error (#1466)
jeffreys-cat Dec 6, 2024
a3ed26c
fix: fix the searchbar version (#1467)
yang1666204 Dec 6, 2024
99e243a
fix:change searchbar version to 0.45 (#1468)
yang1666204 Dec 6, 2024
f9cd873
fix:fix search bar (#1469)
yang1666204 Dec 6, 2024
204f988
fix:request when switch version (#1470)
yang1666204 Dec 6, 2024
3949aa9
feat: remove pwa (#1471)
jeffreys-cat Dec 6, 2024
4ebf77d
K8s decoupled doc (#1415)
intelligentfu8 Dec 6, 2024
7c0524f
[improvement](cluster_id) provide a shell generating random cluster_i…
dataroaring Dec 6, 2024
5725a4b
Update what-is-apache-doris.md (#1472)
ssusieee Dec 6, 2024
3f08c80
[opt](encryption) update encryption function docs (#1463)
morrySnow Dec 6, 2024
4888d0c
[update](sql statement) Refactor sql statement of v3.0and v2.1 (#1449)
KassieZ Dec 6, 2024
7e50cc8
[fix] fix format and typo (#1474)
KassieZ Dec 6, 2024
320436c
fix:searchbar style (#1475)
yang1666204 Dec 9, 2024
53ff296
modify the request url of search (#1479)
yang1666204 Dec 9, 2024
cb24992
[fix] release 3.0.3 (#1480)
KassieZ Dec 9, 2024
88010f3
[update] Blog (#1481)
KassieZ Dec 9, 2024
96ddd58
add build check (#1483)
yang1666204 Dec 9, 2024
e2c4201
Update es.md (#1430)
jgq2008303393 Dec 9, 2024
3df0d0d
[auth]change default log level of ranger to warn (#1433)
zddr Dec 9, 2024
13936a4
[doc](update) rewrite update overview doc (#1393)
zhannngchen Dec 9, 2024
a02c784
[update] Update auth docs (#1485)
KassieZ Dec 9, 2024
5ddf2ef
[doc](delete) update the examples of batch delete to reference links …
zhannngchen Dec 10, 2024
fcc63fd
[doc] fix the wrong link in the tpch document (#1486)
tiger3q Dec 10, 2024
e2ae0ff
[doc]Support for IP types in Java UDF (#1444)
Mryange Dec 10, 2024
ee9f8ba
fix name SQL conflict error in point query docs (#1439)
xiaokang Dec 10, 2024
52e1d92
[opt](cte) indicate not support recursive cte (#1388)
morrySnow Dec 10, 2024
00d1873
update subquery document (#1422)
starocean999 Dec 10, 2024
6b30608
Fixed the issue where the search box request took too long (#1495)
yang1666204 Dec 10, 2024
a3bd665
[update] Add file_cache_statistics (#1496)
KassieZ Dec 11, 2024
2f79bbb
[Doc](benchmark) update 2.1 benchmark doc. (#1493)
feifeifeimoon Dec 11, 2024
d43cc31
[community] Add special thanks of community (#1502)
KassieZ Dec 11, 2024
a34d7b6
Update mysql-compatibility.md (#1498)
wm1581066 Dec 11, 2024
5f78d78
[update] Fix typo of MySQL Compatibility (#1504)
KassieZ Dec 11, 2024
fe75658
[Enhancement]:Optimize navbar (#1497)
yang1666204 Dec 11, 2024
107bc40
Remove common docs (#1501)
yang1666204 Dec 11, 2024
2983496
[opt](sql cache) Reorganize the chapter on SQL Cache (#1392)
morrySnow Dec 11, 2024
bd18304
[fix] Fix deadlink of complex data type (#1507)
KassieZ Dec 12, 2024
9b69d94
[fix] add the window-function doc and multi-dimensional-analytics doc…
feiniaofeiafei Dec 12, 2024
2fbf40f
fix:download page (#1514)
yang1666204 Dec 12, 2024
fcb4a82
[feat]:searchbar in mobile (#1517)
yang1666204 Dec 13, 2024
fa45f8c
[doc](update) update concurrent control (#1487)
zhannngchen Dec 16, 2024
43a077a
[doc](update) modify update-of-unique-model.md (#1438)
zhannngchen Dec 16, 2024
92aa6dc
Restore announcementBar (#1525)
yang1666204 Dec 16, 2024
66849d8
[refactor](k8s)K8s coupled refactor arch (#1500)
intelligentfu8 Dec 16, 2024
b2bb98c
[docs](load) restructure broker load docs (#1372)
kaijchen Dec 17, 2024
3a930f8
update workload document
wangbo Nov 25, 2024
[doc](dbt)add dbt doc example (#1387)
# Versions 

- [ ] dev
- [ ] 3.0
- [ ] 2.1
- [ ] 2.0

# Languages

- [x] Chinese
- [x] English
catpineapple authored Nov 25, 2024
commit 99a2a9e1eea5814dd5e64037555fcada8ff75c38
256 changes: 239 additions & 17 deletions common_docs_zh/ecosystem/dbt-doris-adapter.md
@@ -1,7 +1,7 @@
---
{
"title": "DBT Doris Adapter",
"language": "zh-CN"
}
---

@@ -60,16 +60,16 @@ dbt init
```
An interactive prompt appears; enter the following settings to initialize a dbt project:

| Name     | Default | Description |
|----------|---------|-------------|
| project  |         | Project name |
| database |         | Enter the corresponding number to select the adapter (select doris) |
| host     |         | Doris host |
| port     | 9030    | Doris MySQL Protocol Port |
| schema   |         | In dbt-doris, equivalent to database (the database name) |
| username |         | Doris username |
| password |         | Doris password |
| threads  | 1       | Parallelism in dbt-doris (a value that does not match the cluster's capacity increases the risk of dbt run failures) |
| Name     | Default | Description |
|----------|---------|-------------|
| project  |         | Project name |
| database |         | Enter the corresponding number to select the adapter |
| host     |         | Doris host |
| port     | 9030    | Doris MySQL Protocol Port |
| schema   |         | In dbt-doris, equivalent to database (the database name) |
| username |         | Doris username |
| password |         | Doris password |
| threads  | 1       | Parallelism in dbt-doris (a value that does not match the cluster's capacity increases the risk of dbt run failures) |


### Running the dbt-doris adapter
@@ -86,15 +86,15 @@ dbt run
You can log in to Doris to check the data and the CREATE TABLE statements of my_first_dbt_model and my_second_dbt_model.

### dbt-doris adapter materializations
dbt-doris supports the following three materializations:

1. view

2. table

3. incremental

**View**

With `view` as the materialization, the model is rebuilt as a view through a create view as statement every time it runs. (By default, dbt materializes models as views.)
```
@@ -249,9 +249,9 @@ models:

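[`seed`](https://docs.getdbt.com/faqs/seeds/build-one-seed) is the functional module for loading data files such as CSV. It loads files into the database so they can take part in model building, with the following caveats:

1. Seeds should not be used to load raw data (for example, large CSV files exported from a production database).

2. Because seeds are version controlled, they are best suited to files that contain business-specific logic, such as a list of country/region codes or employee user IDs.

3. For large files, loading CSV with dbt's seed feature performs poorly. Consider loading such CSV files into Doris with Stream Load or a similar method instead.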

@@ -276,4 +276,226 @@ seeds:
ip: varchar(15)
name: varchar(20)
cost: DecimalV3(19,10)
```
```
## Usage Examples

### View model example

```sql
{{ config(materialized='view') }}

select
    u.user_id,
    max(o.create_time) as create_time,
    sum(o.cost) as balance
from {{ ref('sell_order') }} as o
left join {{ ref('sell_user') }} as u
    on u.account_id = o.account_id
group by u.user_id
order by u.user_id
```

### Table model example

```sql
{{ config(materialized='table') }}

select
    u.user_id,
    max(o.create_time) as create_time,
    sum(o.cost) as balance
from {{ ref('sell_order') }} as o
left join {{ ref('sell_user') }} as u
    on u.account_id = o.account_id
group by u.user_id
order by u.user_id
```

### Incremental model example (duplicate mode)

The table is created in duplicate mode: no data aggregation takes place, and unique_key does not need to be specified.

```sql
{{ config(
    materialized='incremental',
    replication_num=1
) }}

with source_data as (
    select
        *
    from {{ ref('sell_order2') }}
)

select * from source_data
```

### Incremental model example (unique mode)

The table is created in unique mode: rows are aggregated by key, so unique_key must be specified.

```sql
{{ config(
    materialized='incremental',
    unique_key=['account_id', 'create_time']
) }}

with source_data as (
    select
        *
    from {{ ref('sell_order2') }}
)

select * from source_data
```
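
To make the difference between the two modes concrete, here is a minimal Doris SQL sketch of how a unique-key table treats repeated keys. The table name `sell_order_unique_demo`, its columns and types, and the single-replica setting are assumptions for illustration only; this is not the DDL that dbt-doris generates.

```sql
-- Hypothetical demo table; names, types and properties are assumed.
CREATE TABLE sell_order_unique_demo (
    account_id  VARCHAR(32),
    create_time DATETIME,
    cost        DECIMAL(19, 10)
)
UNIQUE KEY(account_id, create_time)
DISTRIBUTED BY HASH(account_id) BUCKETS 1
PROPERTIES ("replication_num" = "1");

-- Two writes with the same (account_id, create_time) key.
INSERT INTO sell_order_unique_demo VALUES ('a001', '2024-06-01 10:00:00', 10.5);
INSERT INTO sell_order_unique_demo VALUES ('a001', '2024-06-01 10:00:00', 12.0);

-- In the unique model, rows sharing a key are merged, so one row per key remains.
SELECT * FROM sell_order_unique_demo;
```

In duplicate mode the same two inserts would be stored as two separate rows, which is why that mode needs no unique_key.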

### Incremental model full refresh example

```sql
{{ config(
    materialized='incremental',
    full_refresh=true
) }}

select * from
    {{ source('dbt_source', 'sell_user') }}
```

### Bucketing rule example

Here buckets can be set to auto or a positive integer, meaning automatic bucketing or a fixed number of buckets, respectively.

```sql
{{ config(
    materialized='incremental',
    unique_key=['account_id', 'create_time'],
    distributed_by=['account_id'],
    buckets='auto'
) }}

with source_data as (
    select
        *
    from {{ ref('sell_order') }}
)

select
    *
from source_data

{% if is_incremental() %}
where
    create_time > (select max(create_time) from {{ this }})
{% endif %}
```
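
For comparison, a fixed-bucket variant of the same model might look like the sketch below. The bucket count of 8 is an arbitrary illustrative value, and passing it as a bare integer rather than a quoted string is an assumption; the text above only states that a positive integer is accepted.

```sql
{{ config(
    materialized='incremental',
    unique_key=['account_id', 'create_time'],
    distributed_by=['account_id'],
    buckets=8
) }}

-- Same source and incremental filter as the auto-bucket example;
-- only the buckets setting differs (8 is an assumed value).
select
    *
from {{ ref('sell_order') }}

{% if is_incremental() %}
where
    create_time > (select max(create_time) from {{ this }})
{% endif %}
```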

### Replica count example

```sql
{{ config(
    materialized='table',
    replication_num=1
) }}

with source_data as (
    select
        *
    from {{ ref('sell_order2') }}
)

select * from source_data
```

### Dynamic partition example

```sql
{# properties maps to the PROPERTIES clause of the CREATE TABLE statement; the dynamic partition settings go here. #}
{{ config(
    materialized='incremental',
    partition_by='create_time',
    partition_type='range',
    properties={
        "dynamic_partition.time_unit": "DAY",
        "dynamic_partition.end": "8",
        "dynamic_partition.prefix": "p",
        "dynamic_partition.buckets": "4",
        "dynamic_partition.create_history_partition": "true",
        "dynamic_partition.history_partition_num": "3"
    }
) }}

with source_data as (
    select
        *
    from {{ ref('sell_order2') }}
)

select
    *
from source_data

{% if is_incremental() %}
where
    create_time = DATE_SUB(CURDATE(), INTERVAL 1 DAY)
{% endif %}
```
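
As a rough guide to what these properties mean on the Doris side, the sketch below shows a hand-written CREATE TABLE with the same dynamic partition settings. The table name, columns, key, bucket column, the explicit `dynamic_partition.enable` entry and `replication_num` are all assumptions added for illustration; the DDL actually emitted by dbt-doris may differ.

```sql
-- Hypothetical hand-written equivalent; not the adapter's generated DDL.
CREATE TABLE sell_order_dynamic_demo (
    account_id  VARCHAR(32),
    create_time DATE,
    cost        DECIMAL(19, 10)
)
DUPLICATE KEY(account_id, create_time)
PARTITION BY RANGE(create_time) ()
DISTRIBUTED BY HASH(account_id) BUCKETS 4
PROPERTIES (
    "dynamic_partition.enable" = "true",                    -- assumed, added for clarity
    "dynamic_partition.time_unit" = "DAY",                  -- one partition per day
    "dynamic_partition.end" = "8",                          -- pre-create partitions 8 days ahead
    "dynamic_partition.prefix" = "p",                       -- partition names start with p
    "dynamic_partition.buckets" = "4",
    "dynamic_partition.create_history_partition" = "true",
    "dynamic_partition.history_partition_num" = "3",        -- also create the last 3 days
    "replication_num" = "1"                                 -- assumed single-replica setup
);
```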

### Regular partition example

```sql
{# partition_by_init lists the history partitions created with the partitioned table; in the current Doris version, history partitions must be specified manually. #}
{{ config(
    materialized='incremental',
    partition_by='create_time',
    partition_type='range',
    partition_by_init=[
        "PARTITION `p20240601` VALUES [(\"2024-06-01\"), (\"2024-06-02\"))",
        "PARTITION `p20240602` VALUES [(\"2024-06-02\"), (\"2024-06-03\"))"
    ]
) }}

with source_data as (
    select
        *
    from {{ ref('sell_order2') }}
)

select
    *
from source_data

{% if is_incremental() %}
where
    -- If the my_date variable is provided (via dbt run --vars '{"my_date": "\"2024-06-03\""}'), that value is used.
    -- If it is not provided (plain dbt run), the day before the current date is used.
    -- Using Doris's CURDATE() function directly is the recommended incremental filter,
    -- and it is also the path most often taken in production.
    create_time = {{ var('my_date', 'DATE_SUB(CURDATE(), INTERVAL 1 DAY)') }}

{% endif %}
```
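
The entries in partition_by_init use Doris's left-closed, right-open range syntax. A minimal sketch of the same partitions in plain Doris SQL follows, together with the statement that would add the next day's partition by hand. The table name, columns, key, bucket settings and `replication_num` are assumptions for illustration, not output of the adapter.

```sql
-- Hypothetical table with the two history partitions from the example above.
CREATE TABLE sell_order_range_demo (
    account_id  VARCHAR(32),
    create_time DATE,
    cost        DECIMAL(19, 10)
)
DUPLICATE KEY(account_id, create_time)
PARTITION BY RANGE(create_time) (
    PARTITION `p20240601` VALUES [("2024-06-01"), ("2024-06-02")),
    PARTITION `p20240602` VALUES [("2024-06-02"), ("2024-06-03"))
)
DISTRIBUTED BY HASH(account_id) BUCKETS 4
PROPERTIES ("replication_num" = "1");

-- Without dynamic partitioning, a later day needs its partition added manually
-- before rows for that day can be loaded.
ALTER TABLE sell_order_range_demo
    ADD PARTITION `p20240603` VALUES [("2024-06-03"), ("2024-06-04"));
```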

### Batch date parameter example

```sql
{{ config(
    materialized='incremental',
    partition_by='create_time',
    partition_type='range',
    ...
) }}

with source_data as (
    select
        *
    from {{ ref('sell_order2') }}
)

select
    *
from source_data

{% if is_incremental() %}
where
    -- If the my_date variable is provided (via dbt run --vars '{"my_date": "\"2024-06-03\""}'), that value is used.
    -- If it is not provided (plain dbt run), the day before the current date is used.
    -- Using Doris's CURDATE() function directly is the recommended incremental filter,
    -- and it is also the path most often taken in production.
    create_time = {{ var('my_date', 'DATE_SUB(CURDATE(), INTERVAL 1 DAY)') }}

{% endif %}
```
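
To show what the two paths amount to, the sketch below gives the filter as it would read after the `var` call is substituted. It is simplified: the bare table name `sell_order2` stands in for the relation that `ref` resolves to, and the CTE is omitted, so the SQL dbt actually compiles will look different.

```sql
-- Path 1: dbt run --vars '{"my_date": "\"2024-06-03\""}'
-- The variable value, including its quotes, is substituted into the filter.
select *
from sell_order2
where create_time = "2024-06-03";

-- Path 2: plain dbt run, no my_date variable
-- The default expression passed to var() is substituted instead.
select *
from sell_order2
where create_time = DATE_SUB(CURDATE(), INTERVAL 1 DAY);
```

The escaped inner quotes in the `--vars` argument matter: without them the date would be substituted as a bare token rather than as a string literal.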