Skip to content

Commit 337b5de

Browse files
authored
Merge branch 'master' into feature/added-whitespace-in-json-to-string-formatter
2 parents 1265453 + 8e6c361 commit 337b5de

File tree

88 files changed

+4027
-598
lines changed

Some content is hidden

Large Commits have some content hidden by default. Use the searchbox below for content that may be hidden.

88 files changed

+4027
-598
lines changed

.github/workflows/flink_cdc_migration_test.yml

Lines changed: 1 addition & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -39,8 +39,7 @@ jobs:
3939
runs-on: ubuntu-latest
4040
strategy:
4141
matrix:
42-
# '1.20.0' is excluded since FLINK-36105 has not been merged.
43-
flink-version: [ '1.18.1', '1.19.1' ]
42+
flink-version: [ '1.18.1', '1.19.1', '1.20.0' ]
4443

4544
steps:
4645
- uses: actions/checkout@v4

docs/content.zh/docs/connectors/flink-sources/mysql-cdc.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -581,7 +581,7 @@ MySQL CDC Source 使用主键列将表划分为多个分片(chunk)。 默认
581581
[100, +∞)
582582
```
583583

584-
对于其他主键列类型, MySQL CDC Source 将以下形式执行语句: `SELECT MAX(STR_ID) AS chunk_high FROM (SELECT * FROM TestTable WHERE STR_ID > 'uuid-001' limit 25)` 来获得每个区块的低值和高值,
584+
对于其他主键列类型, MySQL CDC Source 将以下形式执行语句: `SELECT MAX(STR_ID) AS chunk_high FROM (SELECT * FROM TestTable WHERE STR_ID > 'uuid-001' ORDER BY STR_ID ASC LIMIT 25)` 来获得每个区块的低值和高值,
585585
分割块集如下所示:
586586

587587
```

docs/content.zh/docs/connectors/flink-sources/vitess-cdc.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -49,7 +49,7 @@ more released versions will be available in the Maven central warehouse.
4949
Setup Vitess server
5050
----------------
5151

52-
You can follow the Local Install via [Docker guide](https://vitess.io/docs/get-started/local-docker/), or the Vitess Operator for [Kubernetes guide](https://vitess.io/docs/get-started/operator/) to install Vitess. No special setup is needed to support Vitess connector.
52+
You can follow the Local Install via [Docker guide](https://vitess.io/docs/get-started/vttestserver-docker-image/), or the Vitess Operator for [Kubernetes guide](https://vitess.io/docs/get-started/operator/) to install Vitess. No special setup is needed to support Vitess connector.
5353

5454
### Checklist
5555
* Make sure that the VTGate host and its gRPC port (default is 15991) is accessible from the machine where the Vitess connector is installed
Lines changed: 275 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,275 @@
1+
---
2+
title: "Elasticsearch"
3+
weight: 7
4+
type: docs
5+
aliases:
6+
- /connectors/pipeline-connectors/elasticsearch
7+
---
8+
<!--
9+
Licensed to the Apache Software Foundation (ASF) under one
10+
or more contributor license agreements. See the NOTICE file
11+
distributed with this work for additional information
12+
regarding copyright ownership. The ASF licenses this file
13+
to you under the Apache License, Version 2.0 (the
14+
"License"); you may not use this file except in compliance
15+
with the License. You may obtain a copy of the License at
16+
17+
http://www.apache.org/licenses/LICENSE-2.0
18+
19+
Unless required by applicable law or agreed to in writing,
20+
software distributed under the License is distributed on an
21+
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
22+
KIND, either express or implied. See the License for the
23+
specific language governing permissions and limitations
24+
under the License.
25+
-->
26+
27+
# Elasticsearch Pipeline Connector
28+
29+
Elasticsearch Pipeline 连接器可以用作 Pipeline 的 Data Sink, 将数据写入 Elasticsearch。 本文档介绍如何设置 Elasticsearch Pipeline 连接器。
30+
31+
32+
How to create Pipeline
33+
----------------
34+
35+
从 MySQL 读取数据同步到 Elasticsearch 的 Pipeline 可以定义如下:
36+
37+
```yaml
38+
source:
39+
type: mysql
40+
name: MySQL Source
41+
hostname: 127.0.0.1
42+
port: 3306
43+
username: admin
44+
password: pass
45+
tables: adb.\.*, bdb.user_table_[0-9]+, [app|web].order_\.*
46+
server-id: 5401-5404
47+
48+
sink:
49+
type: elasticsearch
50+
name: Elasticsearch Sink
51+
hosts: http://127.0.0.1:9092,http://127.0.0.1:9093
52+
53+
route:
54+
- source-table: adb.\.*
55+
sink-table: default_index
56+
description: sync adb.\.* table to default_index
57+
58+
pipeline:
59+
name: MySQL to Elasticsearch Pipeline
60+
parallelism: 2
61+
```
62+
63+
Pipeline Connector Options
64+
----------------
65+
<div class="highlight">
66+
<table class="colwidths-auto docutils">
67+
<thead>
68+
<tr>
69+
<th class="text-left" style="width: 25%">Option</th>
70+
<th class="text-left" style="width: 8%">Required</th>
71+
<th class="text-left" style="width: 7%">Default</th>
72+
<th class="text-left" style="width: 10%">Type</th>
73+
<th class="text-left" style="width: 50%">Description</th>
74+
</tr>
75+
</thead>
76+
<tbody>
77+
<tr>
78+
<td>type</td>
79+
<td>required</td>
80+
<td style="word-wrap: break-word;">(none)</td>
81+
<td>String</td>
82+
<td>指定要使用的连接器, 这里需要设置成 <code>'elasticsearch'</code>.</td>
83+
</tr>
84+
<tr>
85+
<td>name</td>
86+
<td>optional</td>
87+
<td style="word-wrap: break-word;">(none)</td>
88+
<td>String</td>
89+
<td>Sink 的名称。</td>
90+
</tr>
91+
<tr>
92+
<td>hosts</td>
93+
<td>required</td>
94+
<td style="word-wrap: break-word;">(none)</td>
95+
<td>String</td>
96+
<td>要连接到的一台或多台 Elasticsearch 主机,例如: 'http://host_name:9092,http://host_name:9093'.</td>
97+
</tr>
98+
<tr>
99+
<td>version</td>
100+
<td>optional</td>
101+
<td style="word-wrap: break-word;">7</td>
102+
<td>Integer</td>
103+
<td>指定要使用的连接器,有效值为:
104+
<ul>
105+
<li>6: 连接到 Elasticsearch 6.x 的集群。</li>
106+
<li>7: 连接到 Elasticsearch 7.x 的集群。</li>
107+
<li>8: 连接到 Elasticsearch 8.x 的集群。</li>
108+
</ul>
109+
</td>
110+
</tr>
111+
<tr>
112+
<td>username</td>
113+
<td>optional</td>
114+
<td style="word-wrap: break-word;">(none)</td>
115+
<td>String</td>
116+
<td>用于连接 Elasticsearch 实例认证的用户名。</td>
117+
</tr>
118+
<tr>
119+
<td>password</td>
120+
<td>optional</td>
121+
<td style="word-wrap: break-word;">(none)</td>
122+
<td>String</td>
123+
<td>用于连接 Elasticsearch 实例认证的密码。</td>
124+
</tr>
125+
<tr>
126+
<td>batch.size.max</td>
127+
<td>optional</td>
128+
<td style="word-wrap: break-word;">500</td>
129+
<td>Integer</td>
130+
<td>每个批量请求的最大缓冲操作数。 可以设置为'0'来禁用它。</td>
131+
</tr>
132+
<tr>
133+
<td>inflight.requests.max</td>
134+
<td>optional</td>
135+
<td style="word-wrap: break-word;">5</td>
136+
<td>Integer</td>
137+
<td>连接器将尝试执行的最大并发请求数。</td>
138+
</tr>
139+
<tr>
140+
<td>buffered.requests.max</td>
141+
<td>optional</td>
142+
<td style="word-wrap: break-word;">1000</td>
143+
<td>Integer</td>
144+
<td>每个批量请求的内存缓冲区中保留的最大请求数。</td>
145+
</tr>
146+
<tr>
147+
<td>batch.size.max.bytes</td>
148+
<td>optional</td>
149+
<td style="word-wrap: break-word;">5242880</td>
150+
<td>Long</td>
151+
<td>每个批量请求的缓冲操作在内存中的最大值(以byte为单位)。</td>
152+
</tr>
153+
<tr>
154+
<td>buffer.time.max.ms</td>
155+
<td>optional</td>
156+
<td style="word-wrap: break-word;">5000</td>
157+
<td>Long</td>
158+
<td>每个批量请求的缓冲 flush 操作的间隔(以ms为单位)。</td>
159+
</tr>
160+
<tr>
161+
<td>record.size.max.bytes</td>
162+
<td>optional</td>
163+
<td style="word-wrap: break-word;">10485760</td>
164+
<td>Long</td>
165+
<td>单个记录的最大大小(以byte为单位)。</td>
166+
</tr>
167+
</tbody>
168+
</table>
169+
</div>
170+
171+
Usage Notes
172+
--------
173+
174+
* 写入 Elasticsearch 的 index 默认为与上游表同名字符串,可以通过 pipeline 的 route 功能进行修改。
175+
176+
* 如果写入 Elasticsearch 的 index 不存在,不会被默认创建。
177+
178+
Data Type Mapping
179+
----------------
180+
Elasticsearch 将文档存储在 JSON 字符串中,数据类型之间的映射关系如下表所示:
181+
<div class="wy-table-responsive">
182+
<table class="colwidths-auto docutils">
183+
<thead>
184+
<tr>
185+
<th class="text-left">CDC type</th>
186+
<th class="text-left">JSON type</th>
187+
<th class="text-left" style="width:60%;">NOTE</th>
188+
</tr>
189+
</thead>
190+
<tbody>
191+
<tr>
192+
<td>TINYINT</td>
193+
<td>NUMBER</td>
194+
<td></td>
195+
</tr>
196+
<tr>
197+
<td>SMALLINT</td>
198+
<td>NUMBER</td>
199+
<td></td>
200+
</tr>
201+
<tr>
202+
<td>INT</td>
203+
<td>NUMBER</td>
204+
<td></td>
205+
</tr>
206+
<tr>
207+
<td>BIGINT</td>
208+
<td>NUMBER</td>
209+
<td></td>
210+
</tr>
211+
<tr>
212+
<td>FLOAT</td>
213+
<td>NUMBER</td>
214+
<td></td>
215+
</tr>
216+
<tr>
217+
<td>DOUBLE</td>
218+
<td>NUMBER</td>
219+
<td></td>
220+
</tr>
221+
<tr>
222+
<td>DECIMAL(p, s)</td>
223+
<td>STRING</td>
224+
<td></td>
225+
</tr>
226+
<tr>
227+
<td>BOOLEAN</td>
228+
<td>BOOLEAN</td>
229+
<td></td>
230+
</tr>
231+
<tr>
232+
<td>DATE</td>
233+
<td>STRING</td>
234+
<td>with format: date (yyyy-MM-dd), example: 2024-10-21</td>
235+
</tr>
236+
<tr>
237+
<td>TIMESTAMP</td>
238+
<td>STRING</td>
239+
<td>with format: date-time (yyyy-MM-dd HH:mm:ss.SSSSSS, with UTC time zone), example: 2024-10-21 14:10:56.000000</td>
240+
</tr>
241+
<tr>
242+
<td>TIMESTAMP_LTZ</td>
243+
<td>STRING</td>
244+
<td>with format: date-time (yyyy-MM-dd HH:mm:ss.SSSSSS, with UTC time zone), example: 2024-10-21 14:10:56.000000</td>
245+
</tr>
246+
<tr>
247+
<td>CHAR(n)</td>
248+
<td>STRING</td>
249+
<td></td>
250+
</tr>
251+
<tr>
252+
<td>VARCHAR(n)</td>
253+
<td>STRING</td>
254+
<td></td>
255+
</tr>
256+
<tr>
257+
<td>ARRAY</td>
258+
<td>ARRAY</td>
259+
<td></td>
260+
</tr>
261+
<tr>
262+
<td>MAP</td>
263+
<td>STRING</td>
264+
<td></td>
265+
</tr>
266+
<tr>
267+
<td>ROW</td>
268+
<td>STRING</td>
269+
<td></td>
270+
</tr>
271+
</tbody>
272+
</table>
273+
</div>
274+
275+
{{< top >}}

docs/content.zh/docs/connectors/pipeline-connectors/mysql.md

Lines changed: 4 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -393,7 +393,10 @@ source:
393393
DOUBLE UNSIGNED ZEROFILL<br>
394394
DOUBLE PRECISION<br>
395395
DOUBLE PRECISION UNSIGNED<br>
396-
DOUBLE PRECISION UNSIGNED ZEROFILL
396+
DOUBLE PRECISION UNSIGNED ZEROFILL<br>
397+
FLOAT(p, s)<br>
398+
REAL(p, s)<br>
399+
DOUBLE(p, s)
397400
</td>
398401
<td>DOUBLE</td>
399402
<td></td>

docs/content.zh/docs/core-concept/transform.md

Lines changed: 19 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -126,7 +126,8 @@ Flink CDC uses [Calcite](https://calcite.apache.org/) to parse expressions and [
126126
| LOWER(string) | lower(string) | Returns string in lowercase. |
127127
| TRIM(string1) | trim('BOTH',string1) | Returns a string that removes whitespaces at both sides. |
128128
| REGEXP_REPLACE(string1, string2, string3) | regexpReplace(string1, string2, string3) | Returns a string from STRING1 with all the substrings that match a regular expression STRING2 consecutively being replaced with STRING3. E.g., 'foobar'.regexpReplace('oo\|ar', '') returns "fb". |
129-
| SUBSTRING(string FROM integer1 [ FOR integer2 ]) | substring(string,integer1,integer2) | Returns a substring of STRING starting from position INT1 with length INT2 (to the end by default). |
129+
| SUBSTR(string, integer1[, integer2]) | substr(string,integer1,integer2) | Returns a substring of STRING starting from position integer1 with length integer2 (to the end by default). |
130+
| SUBSTRING(string FROM integer1 [ FOR integer2 ]) | substring(string,integer1,integer2) | Returns a substring of STRING starting from position integer1 with length integer2 (to the end by default). |
130131
| CONCAT(string1, string2,…) | concat(string1, string2,…) | Returns a string that concatenates string1, string2, …. E.g., CONCAT('AA', 'BB', 'CC') returns 'AABBCC'. |
131132

132133
## Temporal Functions
@@ -153,6 +154,23 @@ Flink CDC uses [Calcite](https://calcite.apache.org/) to parse expressions and [
153154
| COALESCE(value1 [, value2]*) | coalesce(Object... objects) | Returns the first argument that is not NULL.If all arguments are NULL, it returns NULL as well. The return type is the least restrictive, common type of all of its arguments. The return type is nullable if all arguments are nullable as well. |
154155
| IF(condition, true_value, false_value) | condition ? true_value : false_value | Returns the true_value if condition is met, otherwise false_value. E.g., IF(5 > 3, 5, 3) returns 5. |
155156

157+
## Casting Functions
158+
159+
You can use `CAST( <EXPR> AS <T> )` syntax to convert any valid expression `<EXPR>` to a specific type `<T>`. Possible conversion paths are:
160+
161+
| Source Type | Target Type | Notes |
162+
|-------------------------------------|-------------|--------------------------------------------------------------------------------------------|
163+
| ANY | STRING | All types can be cast to STRING. |
164+
| NUMERIC, STRING | BOOLEAN | Any non-zero numerics will be evaluated to `TRUE`. |
165+
| NUMERIC | BYTE | Value must be in the range of Byte (-128 ~ 127). |
166+
| NUMERIC | SHORT | Value must be in the range of Short (-32768 ~ 32767). |
167+
| NUMERIC | INTEGER | Value must be in the range of Integer (-2147483648 ~ 2147483647). |
168+
| NUMERIC | LONG | Value must be in the range of Long (-9223372036854775808 ~ 9223372036854775807). |
169+
| NUMERIC | FLOAT | Value must be in the range of Float (1.40239846e-45f ~ 3.40282347e+38f). |
170+
| NUMERIC | DOUBLE | Value must be in the range of Double (4.94065645841246544e-324 ~ 1.79769313486231570e+308) |
171+
| NUMERIC | DECIMAL | Value must be in the range of BigDecimal(10, 0). |
172+
| STRING, TIMESTAMP_TZ, TIMESTAMP_LTZ | TIMESTAMP | String type value must be a valid `ISO_LOCAL_DATE_TIME` string. |
173+
156174
# Example
157175
## Add computed columns
158176
Evaluation expressions can be used to generate new columns. For example, if we want to append two computed columns based on the table `web_order` in the database `mydb`, we may define a transform rule as follows:

0 commit comments

Comments
 (0)