Commit f776ee5

Add elasticsearch lookup join document
1 parent 54eeaad commit f776ee5

2 files changed (+125, -0 lines changed)


docs/content.zh/docs/connectors/table/elasticsearch.md

Lines changed: 60 additions & 0 deletions
@@ -222,6 +222,51 @@ CREATE TABLE myUserTable (
 Uses the built-in <code>'json'</code> format by default. For more details, see the <a href="{{< ref "docs/connectors/table/formats/overview" >}}">JSON Format</a> page.
 </td>
 </tr>
+<tr>
+<td><h5>lookup.cache</h5></td>
+<td>optional</td>
+<td style="word-wrap: break-word;">NONE</td>
+<td><p>Enum</p>Possible values: NONE, PARTIAL</td>
+<td>The cache strategy for the lookup table. Currently supports NONE (no caching) and PARTIAL (rows are cached only when they are looked up in the external database).</td>
+</tr>
+<tr>
+<td><h5>lookup.partial-cache.max-rows</h5></td>
+<td>optional</td>
+<td style="word-wrap: break-word;">(none)</td>
+<td>Long</td>
+<td>The maximum number of rows in the lookup cache; once this value is exceeded, the oldest rows are expired. "lookup.cache" must be set to "PARTIAL" to use this option.</td>
+</tr>
+<tr>
+<td><h5>lookup.partial-cache.expire-after-write</h5></td>
+<td>optional</td>
+<td style="word-wrap: break-word;">(none)</td>
+<td>Duration</td>
+<td>The maximum time a row is kept in the lookup cache after it is written into the cache.
+"lookup.cache" must be set to "PARTIAL" to use this option.</td>
+</tr>
+<tr>
+<td><h5>lookup.partial-cache.expire-after-access</h5></td>
+<td>optional</td>
+<td style="word-wrap: break-word;">(none)</td>
+<td>Duration</td>
+<td>The maximum time a row is kept in the lookup cache after it was last accessed.
+"lookup.cache" must be set to "PARTIAL" to use this option.</td>
+</tr>
+<tr>
+<td><h5>lookup.partial-cache.cache-missing-key</h5></td>
+<td>optional</td>
+<td style="word-wrap: break-word;">true</td>
+<td>Boolean</td>
+<td>Whether to cache keys that do not match any rows in the lookup table. Defaults to true.
+"lookup.cache" must be set to "PARTIAL" to use this option.</td>
+</tr>
+<tr>
+<td><h5>lookup.max-retries</h5></td>
+<td>optional</td>
+<td style="word-wrap: break-word;">3</td>
+<td>Integer</td>
+<td>The maximum number of retries if a lookup request to the database fails.</td>
+</tr>
 </tbody>
 </table>

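@@ -257,6 +302,21 @@ Elasticsearch sink supports both static and dynamic indexes.

 **Note:** When using a dynamic index generated from the current system time, for a changelog stream there is no guarantee that records with the same primary key produce the same index name. Therefore, a dynamic index based on the system time can only support append-only streams.

+### Lookup Cache
+
+The Elasticsearch connector can be used in temporal joins as a lookup source (a.k.a. dimension table). Currently, only synchronous lookup mode is supported.
+
+By default, the lookup cache is not enabled. You can enable it by setting `lookup.cache` to `PARTIAL`.
+
+The main purpose of the lookup cache is to improve the performance of temporal joins with the Elasticsearch connector. When it is disabled, every lookup request is sent to the external database.
+When the lookup cache is enabled, each process (i.e. TaskManager) maintains its own cache. Flink looks up the cache first, sends a request to the external database only on a cache miss, and then updates the cache with the returned rows.
+When the cache reaches the maximum number of rows set by `lookup.partial-cache.max-rows`, the oldest rows are expired; a row also expires once it exceeds the maximum time to live set by `lookup.partial-cache.expire-after-write` or `lookup.partial-cache.expire-after-access`.
+Cached rows might not be the latest. Setting the expiration options to smaller values gives fresher data but may increase the number of requests sent to the database, so this is a trade-off between throughput and correctness.
+
+By default, Flink caches empty query results for a lookup key. You can disable this by setting `lookup.partial-cache.cache-missing-key` to false.
+
+<a name="idempotent-writes"></a>
+
 Data Type Mapping
 ----------------
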
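The read-through flow described in the Lookup Cache section (check the cache, query the external database only on a miss, then store the result) can be modelled in a few lines. The Python sketch below is illustrative only: `ReadThroughLookup`, `fetch`, and the retry loop are hypothetical names chosen for this example, not Flink's or the connector's API. Reading `lookup.max-retries` as "one initial attempt plus up to that many retries" is also an assumption.

```python
from typing import Callable, Dict, Hashable, List


class ReadThroughLookup:
    """Hypothetical model of the documented lookup flow, not Flink's code.

    Look in the cache first; on a miss, query the external store (retrying
    on failure), then put the returned rows into the cache.
    """

    def __init__(
        self,
        fetch: Callable[[Hashable], List[dict]],
        cache_missing_key: bool = True,  # like lookup.partial-cache.cache-missing-key
        max_retries: int = 3,            # like lookup.max-retries
    ) -> None:
        if max_retries < 0:
            raise ValueError("max_retries must be >= 0")
        self._fetch = fetch
        self._cache_missing_key = cache_missing_key
        self._max_retries = max_retries
        self._cache: Dict[Hashable, List[dict]] = {}
        self.requests = 0  # how many times the external store was queried

    def lookup(self, key: Hashable) -> List[dict]:
        if key in self._cache:
            return self._cache[key]  # cache hit: no request is sent
        rows = self._fetch_with_retries(key)
        # An empty result is cached only when cache_missing_key is enabled.
        if rows or self._cache_missing_key:
            self._cache[key] = rows
        return rows

    def _fetch_with_retries(self, key: Hashable) -> List[dict]:
        # Assumption: one initial attempt plus up to max_retries retries.
        attempt = 0
        while True:
            self.requests += 1
            try:
                return self._fetch(key)
            except ConnectionError:
                if attempt >= self._max_retries:
                    raise  # retries exhausted: surface the failure
                attempt += 1


if __name__ == "__main__":
    store = {"u1": [{"name": "alice"}]}  # stands in for Elasticsearch
    lookup = ReadThroughLookup(lambda key: store.get(key, []))
    lookup.lookup("u1")
    lookup.lookup("u1")  # served from the cache
    lookup.lookup("u9")  # missing key: the empty result is cached
    lookup.lookup("u9")
    print(lookup.requests)  # 2
```

Four lookups produce only two requests, because both the repeated `u1` and the empty result for `u9` are answered from the cache; with `cache_missing_key=False`, every lookup of `u9` would reach the store again.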
docs/content/docs/connectors/table/elasticsearch.md

Lines changed: 65 additions & 0 deletions
@@ -244,6 +244,58 @@ Connector Options
 By default uses built-in <code>'json'</code> format. Please refer to <a href="{{< ref "docs/connectors/table/formats/overview" >}}">JSON Format</a> page for more details.
 </td>
 </tr>
+<tr>
+<td><h5>lookup.cache</h5></td>
+<td>optional</td>
+<td>yes</td>
+<td style="word-wrap: break-word;">NONE</td>
+<td><p>Enum</p>Possible values: NONE, PARTIAL</td>
+<td>The cache strategy for the lookup table. Currently supports NONE (no caching) and PARTIAL (caching entries fetched from the external database during lookups).</td>
+</tr>
+<tr>
+<td><h5>lookup.partial-cache.max-rows</h5></td>
+<td>optional</td>
+<td>yes</td>
+<td style="word-wrap: break-word;">(none)</td>
+<td>Long</td>
+<td>The maximum number of rows in the lookup cache; once this value is exceeded, the oldest rows are expired.
+"lookup.cache" must be set to "PARTIAL" to use this option.</td>
+</tr>
+<tr>
+<td><h5>lookup.partial-cache.expire-after-write</h5></td>
+<td>optional</td>
+<td>yes</td>
+<td style="word-wrap: break-word;">(none)</td>
+<td>Duration</td>
+<td>The maximum time to live for each row in the lookup cache after it is written into the cache.
+"lookup.cache" must be set to "PARTIAL" to use this option.</td>
+</tr>
+<tr>
+<td><h5>lookup.partial-cache.expire-after-access</h5></td>
+<td>optional</td>
+<td>yes</td>
+<td style="word-wrap: break-word;">(none)</td>
+<td>Duration</td>
+<td>The maximum time to live for each row in the lookup cache after it was last accessed in the cache.
+"lookup.cache" must be set to "PARTIAL" to use this option.</td>
+</tr>
+<tr>
+<td><h5>lookup.partial-cache.cache-missing-key</h5></td>
+<td>optional</td>
+<td>yes</td>
+<td style="word-wrap: break-word;">true</td>
+<td>Boolean</td>
+<td>Whether to store an empty value into the cache if the lookup key doesn't match any rows in the table.
+"lookup.cache" must be set to "PARTIAL" to use this option.</td>
+</tr>
+<tr>
+<td><h5>lookup.max-retries</h5></td>
+<td>optional</td>
+<td>yes</td>
+<td style="word-wrap: break-word;">3</td>
+<td>Integer</td>
+<td>The maximum number of retries if a lookup request to the database fails.</td>
+</tr>
 </tbody>
 </table>

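To make the option dependencies concrete, here is a hypothetical Python checker that fills in the documented defaults (`lookup.cache` = NONE, cache-missing-key = true, `lookup.max-retries` = 3) and rejects `lookup.partial-cache.*` options when `lookup.cache` is not `PARTIAL`. The function name and the decision to raise an error are assumptions for illustration: the documentation only states that `PARTIAL` is required, not whether Flink rejects or silently ignores such options otherwise.

```python
from typing import Dict

# Option keys and defaults as documented for the lookup source (values kept as strings).
DEFAULTS: Dict[str, str] = {
    "lookup.cache": "NONE",
    "lookup.partial-cache.cache-missing-key": "true",
    "lookup.max-retries": "3",
}

# Options documented as requiring lookup.cache = PARTIAL.
PARTIAL_ONLY = (
    "lookup.partial-cache.max-rows",
    "lookup.partial-cache.expire-after-write",
    "lookup.partial-cache.expire-after-access",
    "lookup.partial-cache.cache-missing-key",
)


def resolve_lookup_options(user_options: Dict[str, str]) -> Dict[str, str]:
    """Hypothetical checker: apply defaults, then reject partial-cache
    options that were set explicitly while lookup.cache is not PARTIAL."""
    resolved = {**DEFAULTS, **user_options}
    mode = resolved["lookup.cache"]
    if mode not in ("NONE", "PARTIAL"):
        raise ValueError(f"lookup.cache must be NONE or PARTIAL, got {mode!r}")
    if mode != "PARTIAL":
        # Only explicitly set options count; the cache-missing-key default does not.
        misplaced = sorted(k for k in PARTIAL_ONLY if k in user_options)
        if misplaced:
            raise ValueError(f"{', '.join(misplaced)} require lookup.cache = 'PARTIAL'")
    return resolved


if __name__ == "__main__":
    opts = resolve_lookup_options(
        {"lookup.cache": "PARTIAL", "lookup.partial-cache.max-rows": "10000"}
    )
    print(opts["lookup.max-retries"])  # 3
```

Checking only the keys the user actually set keeps the `true` default of the cache-missing-key option from triggering an error when caching is off.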
@@ -280,6 +332,19 @@ When formatting the system time as a string, the time zone configured in the ses
 **NOTE:** When using the dynamic index generated by the current system time, for changelog stream, there is no guarantee that the records with the same primary key can generate the same index name.
 Therefore, the dynamic index based on the system time can only support append only stream.

+### Lookup Cache
+
+The Elasticsearch connector can be used in temporal joins as a lookup source (a.k.a. dimension table). Currently, only synchronous lookup mode is supported.
+
+By default, the lookup cache is not enabled. You can enable it by setting `lookup.cache` to `PARTIAL`.
+
+The lookup cache is used to improve the performance of temporal joins with the Elasticsearch connector. When it is disabled, all requests are sent to the external database.
+When the lookup cache is enabled, each process (i.e. TaskManager) holds a cache. Flink looks up the cache first, sends requests to the external database only on a cache miss, and updates the cache with the returned rows.
+The oldest rows in the cache are expired when the cache reaches the maximum number of cached rows set by `lookup.partial-cache.max-rows`, and a row also expires when it exceeds the maximum time to live specified by `lookup.partial-cache.expire-after-write` or `lookup.partial-cache.expire-after-access`.
+The cached rows might not be the latest. Users can set the expiration options to smaller values to get fresher data, but this may increase the number of requests sent to the database, so this is a trade-off between throughput and correctness.
+
+By default, Flink caches empty query results for a lookup key. You can toggle this behavior by setting `lookup.partial-cache.cache-missing-key` to false.
+
 Data Type Mapping
 ----------------

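The expiration rules above can be sketched as a small in-memory model. In this hypothetical Python class, `max_rows` bounds the number of cached keys and evicts the least recently used entry first (our interpretation of "oldest rows"), while the two time-to-live settings are checked lazily when an entry is read. It is not Flink's cache implementation, and durations are plain seconds rather than Flink `Duration` values.

```python
import time
from collections import OrderedDict
from typing import Any, Callable, Hashable, List, Optional, Tuple


class PartialCacheModel:
    """Toy model of PARTIAL lookup cache expiration; hypothetical, not Flink's code.

    Values are lists of rows; an empty list stands for a cached missing key,
    so None always means "not in the cache".
    """

    def __init__(
        self,
        max_rows: Optional[int] = None,               # like lookup.partial-cache.max-rows (counts keys here)
        expire_after_write: Optional[float] = None,   # seconds, like ...expire-after-write
        expire_after_access: Optional[float] = None,  # seconds, like ...expire-after-access
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._max_rows = max_rows
        self._ttl_write = expire_after_write
        self._ttl_access = expire_after_access
        self._clock = clock
        # key -> (rows, written_at, last_accessed_at); order is least to most recently used
        self._entries: "OrderedDict[Hashable, Tuple[List[Any], float, float]]" = OrderedDict()

    def _expired(self, written_at: float, accessed_at: float, now: float) -> bool:
        # An entry counts as expired once its age reaches either configured TTL.
        if self._ttl_write is not None and now - written_at >= self._ttl_write:
            return True
        if self._ttl_access is not None and now - accessed_at >= self._ttl_access:
            return True
        return False

    def get(self, key: Hashable) -> Optional[List[Any]]:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is None:
            return None
        rows, written_at, accessed_at = entry
        if self._expired(written_at, accessed_at, now):
            del self._entries[key]  # expired entries are dropped on read
            return None
        # Record the access time and mark the key as most recently used.
        self._entries[key] = (rows, written_at, now)
        self._entries.move_to_end(key)
        return rows

    def put(self, key: Hashable, rows: List[Any]) -> None:
        now = self._clock()
        self._entries[key] = (rows, now, now)
        self._entries.move_to_end(key)
        # Evict least recently used entries once the size limit is exceeded.
        if self._max_rows is not None:
            while len(self._entries) > self._max_rows:
                self._entries.popitem(last=False)

    def __len__(self) -> int:
        return len(self._entries)


if __name__ == "__main__":
    now = [0.0]  # fake clock for a deterministic demo
    cache = PartialCacheModel(max_rows=2, expire_after_write=10.0, clock=lambda: now[0])
    cache.put("a", [{"id": "a"}])
    cache.put("b", [{"id": "b"}])
    cache.get("a")                 # "a" becomes most recently used
    cache.put("c", [{"id": "c"}])  # size limit exceeded: "b" is evicted
    print(cache.get("b"))          # None
    now[0] = 10.0
    print(cache.get("a"))          # None: written at t=0, 10 s TTL reached
```

Injecting the clock makes the time-to-live behavior deterministic; lowering either TTL in this model shows the trade-off described above, since entries disappear sooner and more lookups fall through to the database.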