[Doc][Improve]add transform v2 doc & remove transform v1 doc (apache#…
EricJoy2048 authored Dec 30, 2022
1 parent b19cebb commit 07da889
Showing 33 changed files with 492 additions and 903 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -59,7 +59,7 @@ The default engine use by SeaTunnel is [SeaTunnel Engine](seatunnel-engine/READM

- Sink Connectors supported [check out](https://seatunnel.apache.org/docs/category/sink-v2)

- Transform supported [check out](https://seatunnel.apache.org/docs/transform/common-options/)
- Transform supported [check out](docs/en/transform-v2)

### Here's a list of our connectors with their health status. [connector status](docs/en/Connector-v2-release-state.md)

4 changes: 2 additions & 2 deletions docs/en/about.md
@@ -24,7 +24,7 @@ SeaTunnel focuses on data integration and data synchronization, and is mainly de
## Features of SeaTunnel

- Rich and extensible Connector: SeaTunnel provides a Connector API that does not depend on a specific execution engine. Connectors (Source, Transform, Sink) developed based on this API can run on many different engines, such as the currently supported SeaTunnel Engine, Flink, and Spark.
- Connector plug-in: The plug-in design allows users to easily develop their own Connector and integrate it into the SeaTunnel project. Currently, SeaTunnel supports more than 70 Connectors, and the number is surging. See the list of [currently supported connectors](Connector-v2-release-state.md)
- Connector plug-in: The plug-in design allows users to easily develop their own Connector and integrate it into the SeaTunnel project. Currently, SeaTunnel supports more than 100 Connectors, and the number is surging. See the list of [currently supported connectors](Connector-v2-release-state.md)
- Batch-stream integration: Connectors developed based on the SeaTunnel Connector API are perfectly compatible with offline synchronization, real-time synchronization, full synchronization, incremental synchronization, and other scenarios. It greatly reduces the difficulty of managing data integration tasks.
- Supports a distributed snapshot algorithm to ensure data consistency.
- Multi-engine support: SeaTunnel uses SeaTunnel Engine for data synchronization by default. At the same time, SeaTunnel also supports the use of Flink or Spark as the execution engine of the Connector to adapt to the existing technical components of the enterprise. SeaTunnel supports multiple versions of Spark and Flink.
@@ -51,7 +51,7 @@ The default engine use by SeaTunnel is [SeaTunnel Engine](seatunnel-engine/about

- **Source Connectors** SeaTunnel supports reading data from various relational databases, graph databases, NoSQL databases, document databases, and in-memory databases; various distributed file systems such as HDFS; and a variety of cloud storage, such as S3 and OSS. We also support reading data from many common SaaS services. You can access the detailed list [here](connector-v2/source). If you want, you can develop your own source connector and easily integrate it into SeaTunnel.

- **Transform Connector**
- **Transform Connector** If the schema differs between the source and the sink, you can use a Transform Connector to change the schema read from the source so that it matches the sink schema, as shown in the sketch after this list.

- **Sink Connector** SeaTunnel supports writing data to various relational databases, graph databases, NoSQL databases, document databases, and in-memory databases; various distributed file systems such as HDFS; and a variety of cloud storage, such as S3 and OSS. We also support writing data to many common SaaS services. You can access the detailed list [here](connector-v2/sink). If you want, you can develop your own sink connector and easily integrate it into SeaTunnel.
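
For instance, here is a minimal config sketch of that flow (the `FakeSource`, `Filter`, and `Console` plugins and the field names are illustrative, borrowed from examples elsewhere in these docs):

```hocon
source {
  FakeSource {
    result_table_name = "fake"
    row.num = 100
    schema = {
      fields {
        name = "string"
        age = "int"
        card = "int"
      }
    }
  }
}
transform {
  # Keep only the fields the sink schema expects
  Filter {
    source_table_name = "fake"
    result_table_name = "fake1"
    fields = [name, card]
  }
}
sink {
  Console {
    source_table_name = "fake1"
  }
}
```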

52 changes: 44 additions & 8 deletions docs/en/concept/config.md
@@ -20,19 +20,28 @@ The Config file will be similar to the one below.

```hocon
env {
  execution.parallelism = 1
  job.mode = "BATCH"
}
source {
  FakeSource {
    result_table_name = "fake"
-   field_name = "name,age"
    row.num = 100
+   schema = {
+     fields {
+       name = "string"
+       age = "int"
+       card = "int"
+     }
+   }
  }
}
transform {
-  sql {
-    sql = "select name,age from fake"
+  Filter {
+    source_table_name = "fake"
+    result_table_name = "fake1"
+    fields = [name, card]
  }
}
@@ -41,9 +50,10 @@ sink {
    host = "clickhouse:8123"
    database = "default"
    table = "seatunnel_console"
-   fields = ["name"]
+   fields = ["name", "card"]
    username = "default"
    password = ""
+   source_table_name = "fake1"
  }
}
```
@@ -74,13 +84,39 @@ course, this uses the word 'may', which means that we can also directly treat th
directly from source to sink. Like below.

```hocon
-transform {
-  // no thing on here
+env {
+  job.mode = "BATCH"
+}
+source {
+  FakeSource {
+    result_table_name = "fake"
+    row.num = 100
+    schema = {
+      fields {
+        name = "string"
+        age = "int"
+        card = "int"
+      }
+    }
+  }
+}
+sink {
+  Clickhouse {
+    host = "clickhouse:8123"
+    database = "default"
+    table = "seatunnel_console"
+    fields = ["name", "age", "card"]
+    username = "default"
+    password = ""
+    source_table_name = "fake"
+  }
}
```

Like source, transform has specific parameters that belong to each module.
For the supported transforms, check [Transform of SeaTunnel](../transform)
For the supported transforms, check [Transform V2 of SeaTunnel](../transform-v2)

### sink

6 changes: 0 additions & 6 deletions docs/en/connector-v2/sink/Console.md
@@ -54,12 +54,6 @@ source {
}
}
-transform {
-  sql {
-    sql = "select name, age from fake"
-  }
-}
sink {
Console {
3 changes: 0 additions & 3 deletions docs/en/connector-v2/sink/Hive.md
@@ -122,9 +122,6 @@ source {
}
}
-transform {
-}
sink {
# choose stdout output plugin to output data to console
4 changes: 0 additions & 4 deletions docs/en/connector-v2/sink/Socket.md
@@ -69,10 +69,6 @@ source {
}
}
-transform {
-  sql = "select name, age from fake"
-}
sink {
Socket {
host = "localhost"
19 changes: 11 additions & 8 deletions docs/en/connector-v2/sink/common-options.md
@@ -1,4 +1,4 @@
# Common Options
# Sink Common Options

> Common parameters of sink connectors
@@ -32,24 +32,27 @@ source {
}

transform {
-  sql {
+  Filter {
    source_table_name = "fake"
-   sql = "select name from fake"
+   fields = [name]
    result_table_name = "fake_name"
  }
-  sql {
+  Filter {
    source_table_name = "fake"
-   sql = "select age from fake"
+   fields = [age]
    result_table_name = "fake_age"
  }
}

sink {
-  console {
-    parallelism = 3
+  Console {
    source_table_name = "fake_name"
  }
+  Console {
+    source_table_name = "fake_age"
+  }
}
```

> If `source_table_name` is not specified, the console outputs the data of the last transform, and if it is set to `fake_name`, it will output the data of `fake_name`
> If the job has only one source, one (or zero) transform, and one sink, you do not need to specify `source_table_name` and `result_table_name` for the connectors.
> If any of source, transform, or sink has more than one instance, you must specify `source_table_name` and `result_table_name` for each connector in the job.
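
For example, a job with exactly one source and one sink can omit both options; a minimal sketch mirroring the quick-start examples:

```hocon
env {
  job.mode = "BATCH"
}
source {
  # No result_table_name needed: the single sink
  # implicitly consumes this source's output
  FakeSource {
    row.num = 16
    schema = {
      fields {
        name = "string"
        age = "int"
      }
    }
  }
}
sink {
  Console {}
}
```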
3 changes: 0 additions & 3 deletions docs/en/connector-v2/source/Socket.md
@@ -62,9 +62,6 @@ source {
}
}
-transform {
-}
sink {
Console {}
}
2 changes: 1 addition & 1 deletion docs/en/connector-v2/source/common-options.md
@@ -1,4 +1,4 @@
# Common Options
# Source Common Options

> Common parameters of source connectors
2 changes: 1 addition & 1 deletion docs/en/seatunnel-engine/deployment.md
@@ -175,7 +175,7 @@ mkdir -p $SEATUNNEL_HOME/logs
nohup seatunnel-cluster.sh &
```

The logs will be written to `$SEATUNNEL_HOME/logs/seatunnel-server.log`
The logs will be written to `$SEATUNNEL_HOME/logs/seatunnel-engine-server.log`

## 8. Install SeaTunnel Engine Client

4 changes: 0 additions & 4 deletions docs/en/start-v2/locally/quick-start-flink.md
@@ -40,10 +40,6 @@ source {
}
}
-transform {
-}
sink {
Console {}
}
6 changes: 1 addition & 5 deletions docs/en/start-v2/locally/quick-start-seatunnel-engine.md
@@ -32,10 +32,6 @@ source {
}
}
-transform {
-}
sink {
Console {}
}
@@ -82,7 +78,7 @@ row=16 : SGZCr, 94186144

## What's More

For now, you have taken a quick look at SeaTunnel; you can see [connector](/docs/category/connector-v2) to find all
For now, you have taken a quick look at SeaTunnel; you can see [connector](../../connector-v2/source/FakeSource.md) to find all
the sources and sinks SeaTunnel supports. Or see [SeaTunnel Engine](../../seatunnel-engine/about.md) if you want to know more about SeaTunnel Engine.

SeaTunnel also supports running jobs in Spark/Flink. You can see [Quick Start With Spark](quick-start-spark.md) or [Quick Start With Flink](quick-start-flink.md).
4 changes: 0 additions & 4 deletions docs/en/start-v2/locally/quick-start-spark.md
@@ -41,10 +41,6 @@ source {
}
}
-transform {
-}
sink {
Console {}
}
23 changes: 23 additions & 0 deletions docs/en/transform-v2/common-options.md
@@ -0,0 +1,23 @@
# Transform Common Options

> Common parameters of transform plugins

| name | type | required | default value |
|-------------------| ------ | -------- | ------------- |
| result_table_name | string | no | - |
| source_table_name | string | no | - |

### source_table_name [string]

When `source_table_name` is not specified, the current plugin processes the data set `(dataset)` output by the previous plugin in the configuration file;

When `source_table_name` is specified, the current plugin processes the data set corresponding to this parameter.

### result_table_name [string]

When `result_table_name` is not specified, the data processed by this plugin will not be registered as a data set `(dataset)` that other plugins can directly access, nor as a temporary table `(table)`;

When `result_table_name` is specified, the data processed by this plugin will be registered as a data set `(dataset)` that other plugins can directly access, also called a temporary table `(table)`. The data set registered here can be directly accessed by other plugins by specifying `source_table_name`.

## Examples
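
A minimal sketch (the `Filter` transform and the field names are illustrative) of how the two options chain plugins together: the transform reads the table a source registered as `fake` and registers its own output as `fake1` for a downstream sink:

```hocon
transform {
  Filter {
    # Consume the data set registered by the source
    source_table_name = "fake"
    # Register this transform's output for downstream plugins
    result_table_name = "fake1"
    fields = [name, card]
  }
}
```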

66 changes: 66 additions & 0 deletions docs/en/transform-v2/copy.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,66 @@
# Copy

> Copy transform plugin
## Description

Copy a field to a new field.

## Options

| name | type | required | default value |
|---------------| ------ | -------- |---------------|
| src_field | string | yes | |
| dest_field | string | yes | |

### src_field [string]

The name of the source field you want to copy

### dest_field [string]

The name of the destination field

### common options [string]

Transform plugin common parameters, please refer to [Transform Plugin](common-options.md) for details

## Example

The data read from source is a table like this:

| name | age | card |
|----------|-----|------|
| Joy Ding | 20 | 123 |
| May Ding | 20 | 123 |
| Kin Dom | 20 | 123 |
| Joy Dom | 20 | 123 |

If we want to copy the field `name` to a new field `name1`, we can add a `Copy` transform like this:

```hocon
transform {
  Copy {
    source_table_name = "fake"
    result_table_name = "fake1"
    src_field = "name"
    dest_field = "name1"
  }
}
```

Then the data in result table `fake1` will look like this:

| name | age | card | name1 |
|----------|-----|------|----------|
| Joy Ding | 20 | 123 | Joy Ding |
| May Ding | 20 | 123 | May Ding |
| Kin Dom | 20 | 123 | Kin Dom |
| Joy Dom | 20 | 123 | Joy Dom |


## Changelog

### new version

- Add Copy Transform Connector
