[Doc] Add document for datax and sample codes (apache#6389)
Add documentation for DataX in the extension section.
Add documentation for samples in the best-practices section.
morningman authored Aug 11, 2021
1 parent 10f410f commit 1a5b031
Showing 9 changed files with 305 additions and 135 deletions.
122 changes: 1 addition & 121 deletions README.md
@@ -41,127 +41,7 @@ The simplicity (of developing, deploying and using) and meeting many data servin

## 4. Compile and install

Currently, only the Docker environment and Linux OS (such as Ubuntu and CentOS) are supported.

### 4.1 Compile in Docker environment (Recommended)

We offer a Docker image as a Doris build environment. You can compile Doris from source in it and run the output binaries in other Linux environments.

First, install and start the Docker service.

Then build Doris with the following steps:

#### Step 1: Pull the docker image with Doris building environment

```
$ docker pull apachedoris/doris-dev:build-env-1.3
```

You can check it by listing images, for example:

```
$ docker images
REPOSITORY              TAG             IMAGE ID       CREATED      SIZE
apachedoris/doris-dev   build-env-1.3   c9665fbee395   5 days ago   3.55GB
```
> NOTE: Depending on the version of the source you compile, you may need a different image:
>
> | image version | commit id | release version |
> |---|---|---|
> | apachedoris/doris-dev:build-env | before [ff0dd0d](https://github.com/apache/incubator-doris/commit/ff0dd0d2daa588f18b6db56f947e813a56d8ec81) | 0.8.x, 0.9.x |
> | apachedoris/doris-dev:build-env-1.1 | [ff0dd0d](https://github.com/apache/incubator-doris/commit/ff0dd0d2daa588f18b6db56f947e813a56d8ec81) or later | 0.10.x or 0.11.x |
> | apache/incubator-doris:build-env-1.2 | [4ef5a8c](https://github.com/apache/incubator-doris/commit/4ef5a8c8560351d7fff7ff8fd51c4c7a75e006a8) | 0.12.x - 0.14.0 |
> | apache/incubator-doris:build-env-1.3 | [ad67dd3](https://github.com/apache/incubator-doris/commit/ad67dd34a04c1ca960cff38e5b335b30fc7d559f) | later versions |



#### Step 2: Run the Docker image

You can run the image directly:

```
$ docker run -it apachedoris/doris-dev:build-env-1.3
```

Or, if you want to compile source code located on your local host, mount the local directory into the container by running:

```
$ docker run -it -v /your/local/path/incubator-doris-DORIS-x.x.x-release/:/root/incubator-doris-DORIS-x.x.x-release/ apachedoris/doris-dev:build-env-1.3
```

#### Step 3: Download Doris source

You should now be attached to the Docker container.

Inside the container, you can download the Doris source either as a release package or with git clone.

(If you have already downloaded the source on your local host and mounted it into the container in Step 2, you can skip this step.)

```
$ wget https://dist.apache.org/repos/dist/dev/incubator/doris/xxx.tar.gz
# or
$ git clone https://github.com/apache/incubator-doris.git
```

#### Step 4: Build Doris

Enter the Doris source directory and build Doris.

```
$ sh build.sh
```

After a successful build, the binaries are installed in the `output/` directory.

### 4.2 For Linux OS

#### Prerequisites

Install the following software:

```
GCC 10.2.1+, Oracle JDK 1.8+, Python 2.7+, Apache Maven 3.5+, CMake 3.19.2+, Flex 2.6.0+
```

Then add them to the `PATH` environment variable and set `JAVA_HOME`.
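Before building, it can help to confirm that the installed tools meet the minimum versions listed above. The following is a minimal sketch, assuming GNU coreutils (for `sort -V`); the helper names `version_ge` and `check_gcc` are hypothetical and not part of Doris.

```shell
# Hypothetical helpers, not shipped with Doris.

# version_ge A B: succeeds when version A >= version B, using GNU `sort -V` ordering.
version_ge() {
    [ "$(printf '%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# check_gcc: compares the gcc found in PATH against the 10.2.1 minimum.
check_gcc() {
    if ! command -v gcc >/dev/null 2>&1; then
        echo "gcc not found in PATH" >&2
        return 1
    fi
    # -dumpfullversion prints the full version (e.g. 10.2.1) on GCC 7+;
    # older GCC rejects it, so fall back to -dumpversion.
    gcc_version="$(gcc -dumpfullversion 2>/dev/null || gcc -dumpversion)"
    if version_ge "$gcc_version" 10.2.1; then
        echo "gcc $gcc_version meets the 10.2.1 minimum"
    else
        echo "gcc $gcc_version is older than 10.2.1" >&2
        return 1
    fi
}
```

After sourcing the snippet, run `check_gcc`. The same `version_ge` helper can be reused for the other tools, e.g. `version_ge "$(cmake --version | head -n 1 | awk '{print $3}')" 3.19.2`.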

If your GCC version is lower than 10.2.1, you can run:

```
sudo yum install -y devtoolset-10-gcc*
```

If devtoolset-10 is not found in your current repo, note that Oracle has rebuilt the devtoolset-10 packages. You can create the repo file `CentOS-SCLo-scl.ol.repo` in `/etc/yum.repos.d/`:

```
[ol7_software_collections]
name=Software Collection packages for Oracle Linux 7 ($basearch)
baseurl=http://yum.oracle.com/repo/OracleLinux/OL7/SoftwareCollections/$basearch/
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-oracle
gpgcheck=1
enabled=1
```

Then run:

```
sudo wget http://public-yum.oracle.com/RPM-GPG-KEY-oracle-ol7 -O /etc/pki/rpm-gpg/RPM-GPG-KEY-oracle
sudo rpm --import /etc/pki/rpm-gpg/RPM-GPG-KEY-*
sudo yum install -y devtoolset-10-gcc*
```
Don't forget to add the GCC path (e.g. `/opt/rh/devtoolset-10/root/usr/bin`) to the `PATH` environment variable.
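One way to do this is to prepend the devtoolset directory in a shell profile such as `~/.bashrc`. A minimal sketch, assuming the default devtoolset-10 location mentioned above; `prepend_path` is a hypothetical helper that avoids adding the directory twice when the profile is sourced again.

```shell
# Hypothetical helper: put a directory at the front of PATH unless it is already present.
prepend_path() {
    case ":$PATH:" in
        *":$1:"*) ;;                  # already in PATH, nothing to do
        *) PATH="$1:$PATH" ;;
    esac
    export PATH
}

# Default devtoolset-10 location; adjust it if your installation differs.
prepend_path /opt/rh/devtoolset-10/root/usr/bin
```

In a new shell, `gcc --version` should then report the devtoolset compiler.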

#### Compile and install

Run the following script. It compiles the third-party libraries and then builds all of Doris.

```
sh build.sh
```

After a successful build, the binaries are installed in the `output/` directory.
See [Compilation](https://github.com/apache/incubator-doris/blob/master/docs/en/installing/compilation.md) for details.

## 5. License Notice

5 changes: 4 additions & 1 deletion docs/.vuepress/sidebar/en.js
@@ -205,7 +205,9 @@ module.exports = [
title: "Best Practices",
directoryPath: "best-practices/",
children: [
"fe-load-balance"
"fe-load-balance",
"systemd",
"samples"
],
},
{
@@ -219,6 +221,7 @@ module.exports = [
"plugin-development-manual",
"spark-doris-connector",
"flink-doris-connector",
"datax",
{
title: "UDF",
directoryPath: "udf/",
5 changes: 4 additions & 1 deletion docs/.vuepress/sidebar/zh-CN.js
@@ -207,7 +207,9 @@ module.exports = [
title: "最佳实践", // "Best Practices"
directoryPath: "best-practices/",
children: [
"fe-load-balance"
"fe-load-balance",
"systemd",
"samples"
],
},
{
@@ -221,6 +223,7 @@ module.exports = [
"plugin-development-manual",
"spark-doris-connector",
"flink-doris-connector",
"datax",
{
title: "UDF",
directoryPath: "udf/",
56 changes: 56 additions & 0 deletions docs/en/best-practices/samples.md
@@ -0,0 +1,56 @@
---
{
"title": "Samples",
"language": "en"
}
---

<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->

# Samples

Doris provides a wide range of usage samples that can help users quickly get started with Doris's features.

## Description

The sample codes are stored in the [`samples/`](https://github.com/apache/incubator-doris/tree/master/samples) directory of the Doris code base.

```
├── connect
├── doris-demo
├── insert
└── mini_load
```

* `connect/`

This directory mainly contains code examples of connecting to Doris from various programming languages.

* `doris-demo/`

This directory contains code examples for multiple Doris features, mainly in the form of Maven projects, such as spark-connector and flink-connector usage examples, integration with the Spring framework, and Stream Load examples.

* `insert/`

This directory contains code examples of importing data by calling Doris's Insert command from Python or shell scripts.

* `mini_load/`

This directory contains a code example of importing data by calling Mini Load from Python. However, because Mini Load has been replaced by Stream Load, Stream Load is recommended for data import.
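Since Stream Load is the recommended import path, here is a minimal sketch of a Stream Load call with `curl`. It assumes the FE HTTP port is the default 8030, the data file is comma-separated, and it uses hypothetical names (`example_db`, `example_tbl`, the `stream_load` and `stream_load_url` functions); consult the Stream Load documentation for the full set of supported headers.

```shell
# Connection settings; these defaults are placeholders, override them via the environment.
FE_HOST="${FE_HOST:-127.0.0.1}"
FE_HTTP_PORT="${FE_HTTP_PORT:-8030}"
DORIS_USER="${DORIS_USER:-root}"
DORIS_PASSWORD="${DORIS_PASSWORD:-}"

# Builds the Stream Load endpoint: http://<fe_host>:<http_port>/api/<db>/<table>/_stream_load
stream_load_url() {
    printf 'http://%s:%s/api/%s/%s/_stream_load\n' "$FE_HOST" "$FE_HTTP_PORT" "$1" "$2"
}

# stream_load FILE DB TABLE LABEL
# --location-trusted lets curl resend the credentials when the FE redirects the request to a BE.
stream_load() {
    curl --location-trusted -u "$DORIS_USER:$DORIS_PASSWORD" \
        -H "label:$4" \
        -H "column_separator:," \
        -T "$1" \
        "$(stream_load_url "$2" "$3")"
}
```

Usage: `stream_load data.csv example_db example_tbl example_label_1`. The label guards against duplicate imports: a label that has already been used successfully in the same database is rejected instead of loading the data twice.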
31 changes: 31 additions & 0 deletions docs/en/best-practices/systemd.md
@@ -0,0 +1,31 @@
---
{
"title": "Systemd",
"language": "en"
}
---

<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->

# Systemd

The Doris code base provides Systemd configuration files that help users start and stop Doris services on Linux.

Please go to the [code base](https://github.com/apache/incubator-doris/tree/master/tools/systemd) to view the configuration files.
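A minimal sketch of installing the provided unit files, assuming they are `*.service` files under `tools/systemd`; check that directory for the actual file names, and for any install paths hard-coded inside the units, before use. `install_units` is a hypothetical helper, and its destination defaults to `/etc/systemd/system`.

```shell
# install_units SRC_DIR [DEST_DIR]: copy every *.service file from SRC_DIR into DEST_DIR.
install_units() {
    src_dir="$1"
    dest_dir="${2:-/etc/systemd/system}"
    found=0
    for unit in "$src_dir"/*.service; do
        [ -e "$unit" ] || continue    # the glob matched nothing
        cp "$unit" "$dest_dir"/ || return 1
        echo "installed $(basename "$unit")"
        found=1
    done
    if [ "$found" -eq 0 ]; then
        echo "no .service files found in $src_dir" >&2
        return 1
    fi
}
```

After copying (as root when targeting `/etc/systemd/system`), reload systemd with `sudo systemctl daemon-reload` and start a service with `sudo systemctl enable --now <unit-name>`, where `<unit-name>` is one of the copied files.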
43 changes: 31 additions & 12 deletions extension/DataX/README → docs/en/extending-doris/datax.md
@@ -1,3 +1,10 @@
---
{
"title": "DataX doriswriter",
"language": "en"
}
---

<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
@@ -17,7 +24,21 @@ specific language governing permissions and limitations
under the License.
-->

## DataX Extension Directory
# DataX doriswriter

The [DataX](https://github.com/alibaba/DataX) doriswriter plug-in is used to synchronize data from other data sources into Doris through DataX.

The plug-in imports data using Doris's Stream Load feature and must be used together with the DataX service.

## About DataX

DataX is the open-source version of Alibaba Cloud DataWorks Data Integration, an offline data synchronization tool/platform widely used within Alibaba Group. DataX provides efficient data synchronization between various heterogeneous data sources, including MySQL, Oracle, SQL Server, PostgreSQL, HDFS, Hive, ADS, HBase, TableStore (OTS), MaxCompute (ODPS), Hologres, DRDS, etc.

More details can be found at: `https://github.com/alibaba/DataX/`

## Usage

The code of the DataX doriswriter plug-in can be found [here](https://github.com/apache/incubator-doris/tree/master/extension/DataX).

This directory is the doriswriter plug-in development environment of Alibaba DataX.

@@ -30,26 +51,26 @@ Because the doriswriter plug-in depends on some modules in the DataX code base,
This is the code directory of doriswriter; this part of the code belongs in the Doris code base.

The help doc can be found in `doriswriter/doc`.

2. `init_env.sh`

The script mainly performs the following steps:

1. Clone the DataX code base locally with git.
2. Symlink the `doriswriter/` directory to `DataX/doriswriter`.
3. Add `<module>doriswriter</module>` to the original `DataX/pom.xml`.
4. Change the httpclient version from 4.5 to 4.5.13 in `DataX/core/pom.xml`.

> httpclient v4.5 cannot handle HTTP 307 redirects correctly.

After that, developers can work in `DataX/`. Because `DataX/doriswriter` is a symlink to `doriswriter/`, changes made there are reflected in the `doriswriter/` directory, which makes it easy for developers to commit code.
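Steps 2 and 3 of the script can be approximated as follows. This is a rough sketch for illustration, not the actual `init_env.sh`; it assumes GNU `sed` and a `DataX/pom.xml` with a single `</modules>` closing tag, and the function names are hypothetical. Step 1 (cloning DataX) and step 4 (the httpclient bump) are left out.

```shell
# add_doriswriter_module POM: register doriswriter as a Maven module (step 3), at most once.
add_doriswriter_module() {
    pom="$1"
    if grep -q '<module>doriswriter</module>' "$pom"; then
        return 0    # already registered, keep the step idempotent
    fi
    # Insert the module entry right before </modules>, reusing that line's indentation (GNU sed).
    sed -i 's#^\([[:space:]]*\)</modules>#\1    <module>doriswriter</module>\n\1</modules>#' "$pom"
}

# link_doriswriter EXT_DIR: symlink EXT_DIR/doriswriter as EXT_DIR/DataX/doriswriter (step 2).
# EXT_DIR is the extension/DataX directory of the Doris code base, with DataX already cloned into it.
link_doriswriter() {
    ext_dir="$(cd "$1" && pwd)" || return 1
    ln -sfn "$ext_dir/doriswriter" "$ext_dir/DataX/doriswriter"
}
```

Because the module check runs first, calling `add_doriswriter_module` repeatedly leaves `pom.xml` with a single doriswriter entry.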

### How to build

1. Run `init_env.sh`
2. Modify the doriswriter code in `DataX/doriswriter`.
2. Modify the doriswriter code in `DataX/doriswriter` if needed.
3. Build doriswriter

1. Build doriswriter alone:

`mvn clean install -pl plugin-rdbms-util,doriswriter -DskipTests`
@@ -61,11 +82,9 @@ Because the doriswriter plug-in depends on some modules in the DataX code base,
The output will be in `target/datax/datax/`.

> hdfsreader, hdfswriter and oscarwriter need some extra JAR packages. If you don't need these components, you can comment out the corresponding modules in `DataX/pom.xml`.

4. Commit the doriswriter code in `doriswriter`.
### About DataX
4. Commit the doriswriter code in `doriswriter` if needed.

DataX is an open source version of Alibaba Cloud DataWorks data integration, an offline data synchronization tool/platform widely used in Alibaba Group. DataX implements efficient data synchronization functions between various heterogeneous data sources including MySQL, Oracle, SqlServer, Postgre, HDFS, Hive, ADS, HBase, TableStore (OTS), MaxCompute (ODPS), Hologres, DRDS, etc.
### Example

More details can be found at: `https://github.com/alibaba/DataX/`
For instructions on using the doriswriter plug-in, please refer to [doriswriter.md](https://github.com/apache/incubator-doris/blob/master/extension/DataX/doriswriter/doc/doriswriter.md).
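Once a job file has been written following that guide, DataX runs it through its own Python launcher, `bin/datax.py`. A minimal sketch, assuming the DataX build output described in the build steps; the default `DATAX_HOME` path (relative to the `extension/DataX` directory) and the `run_datax_job` helper are hypothetical.

```shell
# Location of the built DataX distribution; override DATAX_HOME to match your setup.
DATAX_HOME="${DATAX_HOME:-DataX/target/datax/datax}"

# run_datax_job JOB_JSON: launch a DataX job whose writer is doriswriter.
run_datax_job() {
    job_file="$1"
    if [ ! -f "$job_file" ]; then
        echo "job file not found: $job_file" >&2
        return 1
    fi
    if [ ! -f "$DATAX_HOME/bin/datax.py" ]; then
        echo "datax.py not found under $DATAX_HOME/bin; build DataX first" >&2
        return 1
    fi
    # DataX's launcher takes the job file path as its argument.
    python "$DATAX_HOME/bin/datax.py" "$job_file"
}
```

Checking both paths up front gives a clear error instead of a Python traceback when the build output or the job file is missing.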
57 changes: 57 additions & 0 deletions docs/zh-CN/best-practices/samples.md
@@ -0,0 +1,57 @@
---
{
"title": "Samples",
"language": "zh-CN"
}
---

<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->

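# Samples

The Doris code base provides a wide range of usage samples that can help Doris users quickly get started with Doris's features.

## Description

All sample code is stored in the [`samples/`](https://github.com/apache/incubator-doris/tree/master/samples) directory of the Doris code base.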

```
.
├── connect
├── doris-demo
├── insert
└── mini_load
```

* `connect/`

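This directory mainly contains code examples of connecting to Doris from various programming languages.

* `doris-demo/`

This directory contains code examples of multiple Doris features, mainly in the form of Maven projects, such as spark-connector and flink-connector usage examples, examples of integration with the Spring framework, Stream Load import examples, and so on.

* `insert/`

This directory contains some code examples of importing data by calling Doris's Insert command from Python or shell scripts.

* `mini_load/`

This directory contains a code example of importing data by calling Mini Load from Python. However, because Mini Load has been replaced by Stream Load, Stream Load is recommended for data import.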