Merged
docs/en/use_case/dolphinscheduler_task_demo.md (13 changes: 7 additions, 6 deletions)
@@ -4,7 +4,7 @@
In the closed loop of machine learning applications from development to deployment, data processing, feature engineering, and model training often consume a great deal of time and manpower. To make AI applications easier to develop and deploy, we developed the DolphinScheduler OpenMLDB Task, which integrates feature engineering into DolphinScheduler workflows to build an end-to-end MLOps workflow. This article briefly introduces and demonstrates how to use the DolphinScheduler OpenMLDB Task.

```{seealso}
-See [DolphinScheduler OpenMLDB Task Official Documentation](https://dolphinscheduler.apache.org/#/en-us/docs/3.1.2/guide/task/openmldb) for full details.
+See [DolphinScheduler OpenMLDB Task Official Documentation](https://dolphinscheduler.apache.org/en-us/docs/3.1.5/guide/task/openmldb) for full details.
```

## Scenarios and Functions
@@ -77,19 +77,20 @@ If the online prediction test fails, check the log `/work/predict.log`.

**Start DolphinScheduler**

-You can download the DolphinScheduler dev package we prepared: [dolphinscheduler-bin download link](http://openmldb.ai/download/dolphinschduler-task/apache-dolphinscheduler-dev-SNAPSHOT-bin.tar.gz).
+You can download the DolphinScheduler package from the [official site](https://dolphinscheduler.apache.org/zh-cn/download/3.1.5) or from the mirror site we prepared: [dolphinscheduler-bin download link](http://openmldb.ai/download/dolphinschduler-task/apache-dolphinscheduler-dev-3.1.5-bin.tar.gz).

-Start DolphinScheduler in standalone mode as follows. For more information, see the [Official Documentation](https://dolphinscheduler.apache.org/#/en-us/docs/3.1.2/guide/installation/standalone).
+Start DolphinScheduler in standalone mode as follows. For more information, see the [Official Documentation](https://dolphinscheduler.apache.org/en-us/docs/3.1.5/guide/installation/standalone).
```
-curl -SLO http://openmldb.ai/download/dolphinschduler-task/apache-dolphinscheduler-dev-SNAPSHOT-bin.tar.gz
+curl -SLO https://dlcdn.apache.org/dolphinscheduler/3.1.5/apache-dolphinscheduler-3.1.5-bin.tar.gz
+# mirror: curl -SLO http://openmldb.ai/download/dolphinschduler-task/apache-dolphinscheduler-dev-3.1.5-bin.tar.gz
tar -xvzf apache-dolphinscheduler-*-bin.tar.gz
cd apache-dolphinscheduler-*-bin
sed -i s#/opt/soft/python#/usr/bin/python3#g bin/env/dolphinscheduler_env.sh
sh ./bin/dolphinscheduler-daemon.sh start standalone-server
```
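The steps above list two download sources (official and mirror). If one is unavailable, they can be tried in order. The following is a minimal POSIX shell sketch; `fetch_first` is a hypothetical helper, not part of DolphinScheduler or OpenMLDB, and it only wraps the same `curl -SLO` call, adding `-f` so that HTTP errors count as failures.

```shell
# Hypothetical helper: try each URL in order and stop at the first
# successful download. Returns 1 if every URL fails.
fetch_first() {
  for url in "$@"; do
    # -f: fail on HTTP errors; -S: show errors; -L: follow redirects;
    # -O: save under the remote file name, as in the steps above.
    if curl -fSLO "$url"; then
      echo "downloaded: $url"
      return 0
    fi
    echo "failed: $url, trying next source" >&2
  done
  return 1
}

# Example (downloads from the network):
# fetch_first \
#   https://dlcdn.apache.org/dolphinscheduler/3.1.5/apache-dolphinscheduler-3.1.5-bin.tar.gz \
#   http://openmldb.ai/download/dolphinschduler-task/apache-dolphinscheduler-dev-3.1.5-bin.tar.gz
```

The later `tar -xvzf apache-dolphinscheduler-*-bin.tar.gz` glob matches either archive name, so the remaining steps do not depend on which source succeeded.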

```{hint}
-The OpenMLDB Task in newer official DolphinScheduler releases (e.g. 3.1.2) has problems and does not work. Please use the package we provide. If you need a newer version of DolphinScheduler, ask us for a fixed version.
+The OpenMLDB Task in older versions (< 3.1.3) has problems and does not work. Please use version 3.1.3 or later. If you need an older version of DolphinScheduler, ask us for a fixed version.

In newer versions of DolphinScheduler, `bin/env/dolphinscheduler_env.sh` may have changed, so you need to append `PYTHON_HOME` to it by running `echo "export PYTHON_HOME=/usr/bin/python3" >> bin/env/dolphinscheduler_env.sh`.

@@ -211,4 +212,4 @@ Restart the DolphinScheduler server (the metadata will be cleaned, you need to re
./bin/dolphinscheduler-daemon.sh start standalone-server
```

-To persist the metadata, see [Pseudo-Cluster Deployment](https://dolphinscheduler.apache.org/#/en-us/docs/3.1.2/guide/installation/pseudo-cluster) to configure a database.
+To persist the metadata, see [Pseudo-Cluster Deployment](https://dolphinscheduler.apache.org/en-us/docs/3.1.5/guide/installation/pseudo-cluster) to configure a database.
@@ -4,7 +4,7 @@
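In the closed loop of machine learning from development to production, data processing, feature development, and model training often consume a great deal of time and manpower. To make it easier to build AI models and bring applications online, and to simplify the engineering of machine learning modeling, we developed the DolphinScheduler OpenMLDB Task. It brings feature platform capabilities into DolphinScheduler workflows, linking feature engineering with scheduling to build an end-to-end MLOps workflow and help developers focus on exploring business value. This article briefly introduces and demonstrates how to use the DolphinScheduler OpenMLDB Task.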

```{seealso}
-For details on the OpenMLDB Task, see the [DolphinScheduler OpenMLDB Task official documentation](https://dolphinscheduler.apache.org/#/zh-cn/docs/3.1.2/guide/task/openmldb).
+For details on the OpenMLDB Task, see the [DolphinScheduler OpenMLDB Task official documentation](https://dolphinscheduler.apache.org/zh-cn/docs/3.1.5/guide/task/openmldb).
```

## Scenarios and Functions
@@ -74,25 +74,27 @@ python3 predict_server.py --no-init > predict.log 2>&1 &

**Download and Run DolphinScheduler**

-We provide a downloadable build of DolphinScheduler that supports the OpenMLDB Task: [dolphinscheduler-bin](http://openmldb.ai/download/dolphinschduler-task/apache-dolphinscheduler-dev-SNAPSHOT-bin.tar.gz)
+The OpenMLDB Task is supported in DolphinScheduler 3.1.3 and later. This article uses 3.1.5, which you can download from the [official site](https://dolphinscheduler.apache.org/zh-cn/download/3.1.5) or from the mirror site.

-Start DolphinScheduler standalone as follows. For more information, see the [official documentation](https://dolphinscheduler.apache.org/#/zh-cn/docs/3.1.2/guide/installation/standalone).
+Start DolphinScheduler standalone as follows. For more information, see the [official documentation](https://dolphinscheduler.apache.org/zh-cn/docs/3.1.5/guide/installation/standalone).

```
-curl -SLO http://openmldb.ai/download/dolphinschduler-task/apache-dolphinscheduler-dev-SNAPSHOT-bin.tar.gz
+# Official
+curl -SLO https://dlcdn.apache.org/dolphinscheduler/3.1.5/apache-dolphinscheduler-3.1.5-bin.tar.gz
+# Mirror: curl -SLO http://openmldb.ai/download/dolphinschduler-task/apache-dolphinscheduler-dev-3.1.5-bin.tar.gz
tar -xvzf apache-dolphinscheduler-*-bin.tar.gz
cd apache-dolphinscheduler-*-bin
sed -i s#/opt/soft/python#/usr/bin/python3#g bin/env/dolphinscheduler_env.sh
./bin/dolphinscheduler-daemon.sh start standalone-server
```
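Because the OpenMLDB Task requires DolphinScheduler 3.1.3 or later, the unpacked release can be checked before it is started. This is a minimal sketch, assuming GNU `sort -V` is available and that the archive unpacks to a directory named like `apache-dolphinscheduler-<version>-bin` (or `apache-dolphinscheduler-dev-<version>-bin` for the mirror package); all function names are hypothetical.

```shell
# version_ge A B: succeed if version A >= version B (relies on GNU `sort -V`).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# ds_version_from_dir DIR: extract "3.1.5" from names such as
# apache-dolphinscheduler-3.1.5-bin or apache-dolphinscheduler-dev-3.1.5-bin.
ds_version_from_dir() {
  v=$(basename "$1")
  v=${v#apache-dolphinscheduler-}   # drop the product prefix
  v=${v#dev-}                       # drop the mirror package's "dev-" tag, if any
  v=${v%-bin}                       # drop the "-bin" suffix
  printf '%s\n' "$v"
}

# check_openmldb_task_support DIR: fail if the release predates 3.1.3.
check_openmldb_task_support() {
  v=$(ds_version_from_dir "$1")
  if version_ge "$v" 3.1.3; then
    echo "DolphinScheduler $v: OpenMLDB Task supported"
  else
    echo "DolphinScheduler $v: older than 3.1.3, OpenMLDB Task may not work" >&2
    return 1
  fi
}

# Example, run next to the unpacked directory:
# check_openmldb_task_support apache-dolphinscheduler-*-bin
```

`sort -V` compares each dot-separated field numerically, so `3.1.10` correctly ranks above `3.1.3`, which a plain string comparison would get wrong.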

```{hint}
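-The OpenMLDB Task in the current official DolphinScheduler releases has problems and cannot be used directly; please use the build we provide. If you need a newer version of DolphinScheduler, contact us for the matching fixed version of the OpenMLDB Task.
+In official DolphinScheduler releases, the OpenMLDB Task in older versions (< 3.1.3) has problems and cannot be used directly. If you use an older version, contact us for the matching fixed version of the OpenMLDB Task. Versions 3.1.3 and later fix this problem, so the official releases can be used.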

-In newer versions of DolphinScheduler, `bin/env/dolphinscheduler_env.sh` may change, so you need to append the `PYTHON_HOME` setting, for example with `echo "export PYTHON_HOME=/usr/bin/python3" >> bin/env/dolphinscheduler_env.sh`.
+In other versions of DolphinScheduler, `bin/env/dolphinscheduler_env.sh` may differ. If `PYTHON_HOME` is not present in `bin/env/dolphinscheduler_env.sh`, append it, for example with `echo "export PYTHON_HOME=/usr/bin/python3" >> bin/env/dolphinscheduler_env.sh`.
```
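The conditional append described in this hint can be made idempotent, so that rerunning the setup never adds a second `PYTHON_HOME` line. This is a minimal sketch; `ensure_python_home` is a hypothetical helper, and it assumes that an existing definition appears as a line of the form `PYTHON_HOME=...` or `export PYTHON_HOME=...`.

```shell
# Hypothetical helper: append PYTHON_HOME to the env file only if the file
# does not already assign it. Arguments default to the paths used above.
ensure_python_home() {
  env_file=${1:-bin/env/dolphinscheduler_env.sh}
  python_home=${2:-/usr/bin/python3}
  # Do not create the file: a missing env file usually means the wrong directory.
  if [ ! -f "$env_file" ]; then
    echo "not found: $env_file" >&2
    return 1
  fi
  # Match an existing assignment, with or without a leading `export`;
  # a mere reference such as $PYTHON_HOME/bin does not count.
  if grep -Eq '^[[:space:]]*(export[[:space:]]+)?PYTHON_HOME=' "$env_file"; then
    echo "PYTHON_HOME already defined in $env_file"
  else
    echo "export PYTHON_HOME=$python_home" >> "$env_file"
    echo "appended PYTHON_HOME=$python_home to $env_file"
  fi
}

# Example, from the unpacked DolphinScheduler directory:
# ensure_python_home
```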

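-Open http://localhost:12345/dolphinscheduler/ui in a browser to log in to the UI (for cross-host access, use the public IP). The default username and password are admin/dolphinscheduler123.
+Open http://localhost:12345/dolphinscheduler/ui in a browser to log in to the UI (the default configuration already allows cross-host access, but make sure the IP is reachable). The default username and password are admin/dolphinscheduler123.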
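For cross-host access, one way to confirm that the UI port is reachable before opening a browser is an HTTP probe with `curl`. This is a minimal sketch; `ui_reachable` is a hypothetical helper, and treating any non-error HTTP response as "reachable" is an assumption, not a DolphinScheduler health check.

```shell
# Hypothetical helper: succeed if the DolphinScheduler UI answers over HTTP.
# HOST defaults to localhost and PORT to 12345, as in the URL above.
ui_reachable() {
  host=${1:-localhost}
  port=${2:-12345}
  # -s: quiet; -f: fail on HTTP status >= 400; -L: follow redirects;
  # --max-time: give up instead of hanging when the IP is not reachable.
  curl -sfL -o /dev/null --max-time 5 "http://$host:$port/dolphinscheduler/ui"
}

# Example, from another host (replace the address with the server's IP):
# ui_reachable 192.0.2.10 && echo "UI reachable" || echo "check network and firewall"
```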

```{note}
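The DolphinScheduler worker server needs the OpenMLDB Python SDK. In DolphinScheduler standalone, the worker is the local machine, so you only need to install the OpenMLDB Python SDK locally. It is already installed in our OpenMLDB image. In other environments, install the OpenMLDB SDK with `pip3 install openmldb`.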
@@ -206,4 +208,4 @@ curl -X POST 127.0.0.1:8881/predict -d '{"ip": 114904,
./bin/dolphinscheduler-daemon.sh start standalone-server
```

-To retain the metadata, see [Pseudo-Cluster Deployment](https://dolphinscheduler.apache.org/#/zh-cn/docs/3.1.2/guide/installation/pseudo-cluster) to configure a database.
+To retain the metadata, see [Pseudo-Cluster Deployment](https://dolphinscheduler.apache.org/zh-cn/docs/3.1.5/guide/installation/pseudo-cluster) to configure a database.
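The note that the DolphinScheduler worker needs the OpenMLDB Python SDK can be turned into a check that installs the SDK only when it is missing. This is a minimal sketch, assuming the SDK's top-level Python module is named `openmldb` (the package installed by `pip3 install openmldb`); the helper names are hypothetical.

```shell
# Hypothetical helper: succeed if python3 can import the given module.
has_py_module() {
  python3 -c "import $1" >/dev/null 2>&1
}

# Install the OpenMLDB Python SDK only when it is missing on this worker host.
ensure_openmldb_sdk() {
  if has_py_module openmldb; then
    echo "OpenMLDB Python SDK already installed"
  else
    pip3 install openmldb
  fi
}

# Example, on the machine that runs the DolphinScheduler worker
# (in standalone mode, the local machine):
# ensure_openmldb_sdk
```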