34 changes: 34 additions & 0 deletions doc/k8s/sqlflow-all-in-one.yaml
@@ -0,0 +1,34 @@
apiVersion: apps/v1 # for versions before 1.9.0 use apps/v1beta2
kind: Deployment
metadata:
  name: sqlflow-all-in-one
spec:
  selector:
    matchLabels:
      app: sqlflow-all-in-one
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app: sqlflow-all-in-one
    spec:
      volumes:
      - name: shared-data
        emptyDir: {}
      containers:
      - image: sqlflow/sqlflow:latest
        name: sqlflow-all-in-one
        imagePullPolicy: Always
        env:
        - name: SQLFLOW_MYSQL_HOST
          value: "127.0.0.1"
        - name: SQLFLOW_MYSQL_PORT
          value: "3306"
        command: ["bash"]
        args: ["start.sh"]
        ports:
        - containerPort: 8888
          hostPort: 8888
          name: sqlflow

34 changes: 34 additions & 0 deletions doc/k8s/sqlflow-jhub.yaml
@@ -0,0 +1,34 @@
apiVersion: apps/v1 # for versions before 1.9.0 use apps/v1beta2
kind: Deployment
metadata:
  name: sqlflow-jhub
spec:
  selector:
    matchLabels:
      app: sqlflow-jhub
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app: sqlflow-jhub
    spec:
      volumes:
      - name: shared-data
        emptyDir: {}
      containers:
      - image: yancey1989/sqlflowhub
        name: sqlflow-jhub
        imagePullPolicy: Always
        command: ["jupyterhub"]
        args: ["--config", "/etc/jhub/jupyterhub_config.py"]
        env:
        - name: SQLFLOW_DATASOURCE
          value: "mysql://root:root@tcp(10.102.193.217:3306)/?maxAllowedPacket=0"
        - name: SQLFLOW_SERVER
          value: "10.103.140.131:50051"
        ports:
        - containerPort: 8000
          hostPort: 8000
          name: sqlflow-jhub

47 changes: 23 additions & 24 deletions doc/k8s/sqlflow-mysql.yaml
@@ -5,41 +5,40 @@ metadata:
spec:
selector:
matchLabels:
app: sqlflow
app: sqlflow-mysql
strategy:
type: Recreate
template:
metadata:
labels:
app: sqlflow
app: sqlflow-mysql
spec:
volumes:
- name: shared-data
emptyDir: {}
containers:
- image: sqlflow/sqlflow:latest
- image: sqlflow/sqlflow
name: mysql
imagePullPolicy: Always
env:
- name: SQLFLOW_MYSQL_HOST
value: "0.0.0.0"
- name: SQLFLOW_MYSQL_PORT
value: "3306"
command: ["bash"]
args: ["start.sh", "mysql"]
volumeMounts:
- name: shared-data
mountPath: /var/run/mysqld/
- image: sqlflow/sqlflow:latest
name: sqlflow-server
volumeMounts:
- name: shared-data
mountPath: /var/run/mysqld/
imagePullPolicy: Always
command: ["bash"]
args: ["start.sh", "sqlflow-server"]
- image: sqlflow/sqlflow:latest
name: notebook
imagePullPolicy: Always
command: ["bash"]
args: ["start.sh", "sqlflow-notebook"]
ports:
- containerPort: 8888
hostPort: 8888
name: sqlflow
- containerPort: 3306
name: mysql
---

apiVersion: v1
kind: Service
metadata:
  name: sqlflow-mysql
  labels:
    app: sqlflow-mysql
spec:
  ports:
  - port: 3306
    protocol: TCP
  selector:
    app: sqlflow-mysql
40 changes: 40 additions & 0 deletions doc/k8s/sqlflow-server.yaml
@@ -0,0 +1,40 @@
apiVersion: apps/v1 # for versions before 1.9.0 use apps/v1beta2
kind: Deployment
metadata:
  name: sqlflow-server
spec:
  replicas: 3
  selector:
    matchLabels:
      app: sqlflow-server
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app: sqlflow-server
    spec:
      containers:
      - image: sqlflow/sqlflow
        name: sqlflow-server
        imagePullPolicy: Always
        command: ["sqlflowserver"]
        args: ["--enable-session"]
        ports:
        - containerPort: 50051
          name: sqlflow-server

---

apiVersion: v1
kind: Service
metadata:
  name: sqlflow-server
  labels:
    app: sqlflow-server
spec:
  ports:
  - port: 50051
    protocol: TCP
  selector:
    app: sqlflow-server
10 changes: 10 additions & 0 deletions doc/k8s/sqlflowhub/Dockerfile
@@ -0,0 +1,10 @@
FROM sqlflow/sqlflow

RUN mkdir -p /etc/jhub && \
bash -c "source /miniconda/bin/activate sqlflow-dev && \
conda install -c conda-forge jupyterhub && \
conda install notebook"

COPY jupyterhub_config.py /etc/jhub/jupyterhub_config.py

CMD ["jupyterhub", "--config", "/etc/jhub/jupyterhub_config.py"]
22 changes: 22 additions & 0 deletions doc/k8s/sqlflowhub/jupyterhub_config.py
@@ -0,0 +1,22 @@
# Copyright 2019 The SQLFlow Authors. All rights reserved.
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Configuration file for jupyterhub.
import os

# Shut down the server after no activity for an hour.
c.NotebookApp.shutdown_no_activity_timeout = 60 * 60

# Pass the MySQL and SQLFlow server addresses, read from the Kubernetes
# Service environment variables, to every spawned Notebook server.
c.LocalProcessSpawner.environment = {
    "SQLFLOW_DATASOURCE": "mysql://root:root@tcp(%s:%s)/?maxAllowedPacket=0" % (
        os.getenv("SQLFLOW_MYSQL_SERVICE_HOST", ""),
        os.getenv("SQLFLOW_MYSQL_SERVICE_PORT", "3306")),
    "SQLFLOW_SERVER": "%s:%s" % (
        os.getenv("SQLFLOW_SERVER_SERVICE_HOST", ""),
        os.getenv("SQLFLOW_SERVER_SERVICE_PORT", "")),
}
c.LocalProcessSpawner.cmd = ["/miniconda/envs/sqlflow-dev/bin/jupyterhub-singleuser"]
113 changes: 92 additions & 21 deletions doc/run_on_kubernetes.md
@@ -4,8 +4,12 @@ This is a tutorial on how to run SQLFlow on Kubernetes, and this tutorial will d
- A MySQL server instance with some example data loaded,
- The SQLFlow gRPC server,
- The Jupyter Notebook server with the SQLFlow magic command installed, and
- JupyterHub, which can serve Notebook servers for multiple users.

Then you can run SQLFlow queries in a Jupyter notebook in your web browser.
There are two sections in this tutorial:

- [Deploy the All-in-One SQLFlow](#deploy-the-all-in-one-sqlflow) quickly deploys SQLFlow on Kubernetes in a single Pod.
- [Deploy the SQLFlow Hub](#deploy-the-sqlflow-hub) deploys an SQLFlow cluster and a JupyterHub server that launches Notebook server instances for multiple users.

## Prerequisites

@@ -17,16 +21,14 @@ to interact with the Kubernetes cluster.
1. Make sure the Kubernetes nodes can pull the official SQLFlow Docker image [sqlflow/sqlflow:latest] or your [custom
Docker image](/doc/build.md).

## Deploy the SQLFlow Components
## Deploy the All-in-One SQLFlow

1. Deploy the SQLFlow Pod on Kubernetes
``` bash
> kubectl create -f k8s/sqlflow-all-in-one.yaml
```
The above command starts a Pod with three containers, all running the `sqlflow/sqlflow:latest` Docker image:
one container runs the MySQL server instance, and the other two run the SQLFlow gRPC server and the
Jupyter Notebook server. You can also use your custom Docker image by editing the `image` fields of the YAML
file [k8s/sqlflow-mysql.yaml](/doc/k8s/sqlflow-mysql.yaml):
The above command deploys a single Pod in which a MySQL server instance, the SQLFlow gRPC server, and the Jupyter Notebook server all run. You can also use
your custom Docker image by editing the `image` field of the YAML file [k8s/sqlflow-all-in-one.yaml](/doc/k8s/sqlflow-all-in-one.yaml):
``` yaml
spec:
...
@@ -35,24 +37,15 @@ Docker image](/doc/build.md).
```

1. Test your SQLFlow setup
You can find a Pod on Kubernetes with the prefix `sqlflow-mysql-*`:
You can find a Pod on Kubernetes whose name is `sqlflow-all-in-one-<POD-ID>`:
``` bash
> kubectl get pods
NAME READY STATUS RESTARTS AGE
NAME READY STATUS RESTARTS AGE
sqlflow-mysql-77f8674899-dv269 2/2 Running 0 75m
```
The logs of the two containers look similar to:
``` bash
> kubectl logs sqlflow-mysql-77f8674899-dv269 mysql
* Starting MySQL database server mysqld
...done.
> kubectl logs sqlflow-mysql-77f8674899-dv269 sqlflow
Connect to the datasource mysql://root:root@tcp(10.100.73.238:3306)/?maxAllowedPacket=0
2019/05/30 09:57:55 Server Started at :50051
sqlflow-all-in-one-9b57566c9-8xkpk 1/1 Running 0 60s
```
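If you prefer to script this check instead of reading the `READY` column by eye, here is a minimal Python sketch. It assumes the plain-text table layout shown above (NAME in the first column, READY in the second); the function name `unready_pods` is hypothetical and not part of SQLFlow or kubectl.

```python
from typing import List


def unready_pods(kubectl_output: str) -> List[str]:
    """Return the names of Pods whose READY column is not "N/N".

    Assumes the default `kubectl get pods` table: the first column is NAME
    and the second is READY, e.g. "1/1" (ready containers / total).
    """
    unready = []
    for line in kubectl_output.strip().splitlines()[1:]:  # skip header row
        fields = line.split()
        if len(fields) < 2 or "/" not in fields[1]:
            continue
        ready, total = fields[1].split("/", 1)
        if ready != total:
            unready.append(fields[0])
    return unready


# Example using the sample output above.
sample = """\
NAME                                 READY   STATUS    RESTARTS   AGE
sqlflow-all-in-one-9b57566c9-8xkpk   1/1     Running   0          60s
"""
print(unready_pods(sample))  # prints []
```

An empty list means every container in every listed Pod reports ready; you could capture the `kubectl get pods` output with `subprocess` and pass it in.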

## Running your Query in SQLFlow
### Running your Query in SQLFlow

1. Get the node IP of the SQLFlow Pod on minikube with the following command:
``` bash
@@ -63,8 +56,86 @@ Docker image](/doc/build.md).
using `kubectl get pods -o wide`:
``` bash
> kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
sqlflow-mysql-77f8674899-dv269 2/2 Running 0 9s 172.17.0.4 minikube <none> <none>
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
sqlflow-all-in-one-9b57566c9-8xkpk 1/1 Running 0 24s 172.17.0.9 minikube <none> <none>
```

1. Open a web browser and go to `<node-ip>:8888`; you can find the [SQLFlow example](/example/jupyter/example.ipynb) in the Jupyter notebook file list.
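Because the all-in-one Deployment publishes port 8888 through `hostPort`, the notebook is served on the IP of the node that runs the Pod, which Kubernetes records in each Pod's `status.hostIP` field. The sketch below assumes you saved the output of `kubectl get pods -l app=sqlflow-all-in-one -o json`; it extracts that IP and builds the notebook URL. `pod_host_ips` and `notebook_url` are hypothetical helper names.

```python
import json
from typing import List


def pod_host_ips(pods_json: str) -> List[str]:
    """Return status.hostIP for every Pod in `kubectl get pods -o json` output.

    Pods that are not yet scheduled have no hostIP and are skipped.
    """
    pods = json.loads(pods_json)
    ips = []
    for item in pods.get("items", []):
        host_ip = item.get("status", {}).get("hostIP")
        if host_ip:
            ips.append(host_ip)
    return ips


def notebook_url(host_ip: str, port: int = 8888) -> str:
    # 8888 is the hostPort declared in k8s/sqlflow-all-in-one.yaml.
    return "http://%s:%d" % (host_ip, port)


# Illustrative input only: a trimmed-down Pod list with a made-up node IP.
example = json.dumps({"items": [{"status": {"hostIP": "192.168.99.100"}}]})
for ip in pod_host_ips(example):
    print(notebook_url(ip))  # prints http://192.168.99.100:8888
```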

## Deploy the SQLFlow Hub

This section deploys the SQLFlow Hub, which uses JupyterHub to serve Jupyter notebooks to multiple users
and makes it easy to scale the SQLFlow gRPC server up or down according to the workload.
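Since the SQLFlow gRPC server runs as a Deployment with several replicas, it can help to count how many replicas are actually running after you deploy or rescale it. The following Python sketch does that from plain `kubectl get pods` output. It relies on the naming convention Kubernetes uses for Deployment-managed Pods (`<deployment>-<pod-template-hash>-<suffix>`); the helper name `running_replicas` is hypothetical.

```python
from collections import Counter
from typing import Dict


def running_replicas(kubectl_output: str) -> Dict[str, int]:
    """Count Running Pods per Deployment from `kubectl get pods` output.

    Deployment Pods are named <deployment>-<pod-template-hash>-<suffix>,
    so dropping the last two dash-separated parts yields the Deployment name.
    """
    counts: Counter = Counter()
    for line in kubectl_output.strip().splitlines()[1:]:  # skip header row
        fields = line.split()
        if len(fields) >= 3 and fields[2] == "Running":
            counts[fields[0].rsplit("-", 2)[0]] += 1
    return dict(counts)


# A Pod listing in the same shape as the one shown in the steps of this section.
pods = """\
NAME                              READY   STATUS    RESTARTS   AGE
sqlflow-jhub-78f96dcf88-sbvt6     1/1     Running   0          4m13s
sqlflow-mysql-55db79fd98-nhfjp    1/1     Running   0          4h8m
sqlflow-server-7444b4466d-frbcn   1/1     Running   0          4h
sqlflow-server-7444b4466d-h5w9c   1/1     Running   0          4h
sqlflow-server-7444b4466d-kndwx   1/1     Running   0          4h
"""
print(running_replicas(pods)["sqlflow-server"])  # prints 3
```

Comparing this count with the `replicas` field of the Deployment tells you whether a scale-up or scale-down has finished.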

1. Build the SQLFlow Hub Docker image and push it to a registry that the Kubernetes nodes can access (and point the `image` field of [k8s/sqlflow-jhub.yaml](/doc/k8s/sqlflow-jhub.yaml) at the pushed image):
``` bash
$ cd k8s/sqlflowhub
$ docker build -t <your-repo>/sqlflowhub .
$ docker push <your-repo>/sqlflowhub
```

1. Deploy MySQL, the SQLFlow gRPC server, and JupyterHub step by step:

``` bash
kubectl create -f k8s/sqlflow-mysql.yaml
kubectl create -f k8s/sqlflow-server.yaml
kubectl create -f k8s/sqlflow-jhub.yaml
```

**NOTE**: If you use a custom MySQL Docker image, you must grant remote hosts access to the MySQL server, for example with:
``` text
GRANT ALL PRIVILEGES ON *.* TO 'root'@'' IDENTIFIED BY 'root' WITH GRANT OPTION;
```

1. Check the SQLFlow Pods; you should find:
- A MySQL Pod named `sqlflow-mysql-*`.
- A JupyterHub Pod named `sqlflow-jhub-*`.
- Three SQLFlow gRPC server Pods named `sqlflow-server-*`. You can scale the number of replicas up or down by modifying the `replicas` field of the YAML file [k8s/sqlflow-server.yaml](/doc/k8s/sqlflow-server.yaml).
``` bash
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
sqlflow-jhub-78f96dcf88-sbvt6 1/1 Running 0 4m13s
sqlflow-mysql-55db79fd98-nhfjp 1/1 Running 0 4h8m
sqlflow-server-7444b4466d-frbcn 1/1 Running 0 4h
sqlflow-server-7444b4466d-h5w9c 1/1 Running 0 4h
sqlflow-server-7444b4466d-kndwx 1/1 Running 0 4h
```

1. Check the SQLFlow Services; the Notebook servers connect to MySQL and the SQLFlow server through their ClusterIPs and ports:
- A MySQL Service named `sqlflow-mysql`, and
- An SQLFlow server Service named `sqlflow-server`

``` bash
$ kubectl get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 29d
sqlflow-mysql ClusterIP 10.102.193.217 <none> 3306/TCP 6h4m
sqlflow-server ClusterIP 10.102.65.39 <none> 50051/TCP 5h56m
```
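The `jupyterhub_config.py` in this change relies on the environment variables Kubernetes injects for each Service: a Service named `sqlflow-mysql` appears inside Pods as `SQLFLOW_MYSQL_SERVICE_HOST` and `SQLFLOW_MYSQL_SERVICE_PORT`. Kubernetes only injects these into Pods created after the Service exists, so the JupyterHub Pod should be created after the two Services above. Below is a minimal sketch of how those variables become the `SQLFLOW_DATASOURCE` and `SQLFLOW_SERVER` values handed to each Notebook server; the helper names are hypothetical, and the `root:root` credentials are copied from `jupyterhub_config.py`.

```python
import os
from typing import Dict, Mapping, Optional


def service_address(service: str, env: Mapping[str, str], default_port: str = "") -> str:
    """Build "host:port" for a Service from Kubernetes service env vars.

    Kubernetes upper-cases the Service name and turns dashes into
    underscores, e.g. sqlflow-mysql -> SQLFLOW_MYSQL_SERVICE_HOST.
    """
    prefix = service.upper().replace("-", "_")
    host = env.get(prefix + "_SERVICE_HOST", "")
    port = env.get(prefix + "_SERVICE_PORT", default_port)
    return "%s:%s" % (host, port)


def sqlflow_env(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Mirror the environment that jupyterhub_config.py hands to Notebook servers."""
    env = os.environ if env is None else env
    return {
        "SQLFLOW_DATASOURCE": "mysql://root:root@tcp(%s)/?maxAllowedPacket=0"
        % service_address("sqlflow-mysql", env, "3306"),
        "SQLFLOW_SERVER": service_address("sqlflow-server", env),
    }


# Values taken from the `kubectl get svc` output above.
cluster_env = {
    "SQLFLOW_MYSQL_SERVICE_HOST": "10.102.193.217",
    "SQLFLOW_MYSQL_SERVICE_PORT": "3306",
    "SQLFLOW_SERVER_SERVICE_HOST": "10.102.65.39",
    "SQLFLOW_SERVER_SERVICE_PORT": "50051",
}
print(sqlflow_env(cluster_env))
```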

### Log in to JupyterHub

JupyterHub uses the PAMAuthenticator as its default authentication method. [PAM](https://en.wikipedia.org/wiki/Linux_PAM) authenticates system users by their username and password. You can find more information in [authenticators-users-basics](https://jupyterhub.readthedocs.io/en/stable/getting-started/authenticators-users-basics.html), and other authentication methods [here](https://github.com/jupyterhub/jupyterhub/wiki/Authenticators).
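With a default PAM setup, logins are checked against the system's user accounts, and JupyterHub's local process spawner runs each Notebook server as that account, so the user must exist inside the `sqlflow-jhub` Pod. A quick way to confirm this is Python's standard `pwd` module; this is a minimal, Unix-only sketch, and `system_user_exists` is a hypothetical helper name.

```python
import pwd


def system_user_exists(username: str) -> bool:
    """Return True if `username` has an entry in the system user database."""
    try:
        pwd.getpwnam(username)
    except KeyError:  # raised when the account does not exist
        return False
    return True


# For example, after running `adduser sqlflow` inside the sqlflow-jhub Pod:
print(system_user_exists("sqlflow"))
```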

Next, follow these steps to create a system user and log in to JupyterHub:

1. List the Pods and exec into the `sqlflow-jhub` Pod:
``` bash
$ kubectl get po
NAME READY STATUS RESTARTS AGE
sqlflow-jhub-78f96dcf88-gp8dg 1/1 Running 0 26m
sqlflow-mysql-55db79fd98-nhfjp 1/1 Running 0 51m
...
$ kubectl exec -it sqlflow-jhub-78f96dcf88-gp8dg -- bash
```

1. Create a user and set a password with the `adduser` command:
Collaborator
To avoid this additional user creation step, you may wanna config dummyauthenticator by https://github.com/jupyterhub/kubespawner/blob/master/jupyterhub_config.py#L32 .

Please remember to install it using pip install jupyterhub-dummyauthenticator
ref

Collaborator Author
I'm not sure we can use the DummyAuthenticator to demonstrate the SQLFlow on k8s, seems PAMAuthenticator is more similar to the production environment.

``` bash
$ adduser sqlflow -q --gecos "" --home /home/sqlflow
Enter new UNIX password:
Retype new UNIX password:
passwd: password updated successfully
```

1. Open a web browser and go to `<node-ip>:8888`; you can find the [SQLFlow example](/example/jupyter/example.ipynb) in the Jupyter notebook file list.
1. Open your browser, go to `<node-ip>:8000`, and log in with the username and password created in the previous step. Once you pass authentication, JupyterHub launches a Notebook server for your account, where you can run your SQLFlow queries.
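Before opening the browser, you can check that JupyterHub answers on the published port. The sketch below uses only the Python standard library: it builds the login URL (JupyterHub serves its login page under `/hub/login`, and 8000 is the `hostPort` in `k8s/sqlflow-jhub.yaml`) and reports whether anything responds there. The helper names are hypothetical.

```python
import urllib.error
import urllib.request


def jupyterhub_login_url(node_ip: str, port: int = 8000) -> str:
    # 8000 is the hostPort declared in k8s/sqlflow-jhub.yaml.
    return "http://%s:%d/hub/login" % (node_ip, port)


def is_reachable(url: str, timeout: float = 5.0) -> bool:
    """Return True if an HTTP GET on `url` gets any HTTP response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, just with an error status code.
        return True
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, DNS failure, or a malformed URL.
        return False


# Example (replace the address with your node IP):
#   is_reachable(jupyterhub_login_url("<node-ip>"))
```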
2 changes: 1 addition & 1 deletion scripts/image_build.sh
@@ -34,7 +34,7 @@ mysql-connector-python \
impyla \
pyodps \
jupyter \
sqlflow==0.1.0 \
sqlflow==0.2.0 \
pre-commit \
tornado==4.5.3 \
${PIP_ADD_PACKAGES}
4 changes: 4 additions & 0 deletions scripts/start.sh
@@ -32,6 +32,10 @@ function setup_mysql() {
for f in /docker-entrypoint-initdb.d/*; do
cat $f | mysql -uroot -proot --host ${SQLFLOW_MYSQL_HOST} --port ${SQLFLOW_MYSQL_PORT}
done
# Grant all privileges to all remote hosts so that the SQLFlow server can be scaled to more than one replica.
# NOTE: this authorization is not safe; do not use it in a production environment.
mysql -uroot -proot -e "GRANT ALL PRIVILEGES ON *.* TO 'root'@'' IDENTIFIED BY 'root' WITH GRANT OPTION;"

}

function setup_sqlflow_server() {