Commit 95cc608

typhoonzero authored and wangkuiyi committed
add auth and session design (#501)
* add auth and session design * update * update designs * Wording * follow comments * update
1 parent 2e6d511 commit 95cc608

3 files changed

+131
-0
lines changed

doc/auth_design.md

Lines changed: 131 additions & 0 deletions
@@ -0,0 +1,131 @@
# Design: SQLFlow Authentication and Authorization

## Concepts

Authentication identifies the user. Authorization grants a user privileges,
such as access to certain system functionality.

SQLFlow bridges SQL engines and machine learning systems. To execute a job,
the SQLFlow server needs permission to access databases and to submit machine
learning jobs to clusters like Kubernetes.

When we deploy the SQLFlow server as a Kubernetes service with horizontal auto-scaling enabled, many clients
might connect to each SQLFlow server instance. For authentication and authorization, we must securely store a mapping
from the user's ID to the user's credentials for accessing the database and the
cluster. With authentication and authorization in place, we can implement *sessions*: each SQL statement in a SQL program might be handled by a different SQLFlow server instance in the Kubernetes service, without the user noticing.

Authorization is not much of a challenge because we can rely on the
SQL engines and training clusters, which deny requests if the user
has no access. In this document, we focus on the authentication of SQLFlow users.

## Design

[JupyterHub](https://jupyterhub.readthedocs.io/en/stable/) is the central web
page where users work. JupyterHub supports many well-known authentication
and authorization [methods](https://github.com/jupyterhub/jupyterhub/wiki/Authenticators),
so it will be easy to adapt the full solution to cloud or on-premise
environments.

### Session

A server-side "session" is needed to store the credentials each client uses to
access the database and submit jobs. The session can be defined as:

```go
import "database/sql"

type Session struct {
	Token          int64  // User token granted after login
	ClientEndpoint string // ip:port from the client
	DBConnStr      string // mysql://AK:SK@127.0.0.1:3306
	// Cached connection to the database for the current session;
	// can point to a global connection map.
	DBConn *sql.DB
	K8SAK  string // AK or username for accessing Kubernetes
	K8SSK  string // SK or secret for accessing Kubernetes
}
```

**Note:** SQLFlow deals with three kinds of services:

- The SQLFlow RPC service itself
- A database service that stores the training data, e.g. MaxCompute
- A training cluster that runs the SQLFlow training job, e.g. Kubernetes

The token acts as the unique ID of the session. The session object
should expire after some time and then be deleted from server memory.

The database connection string also contains credential information,
in a format like `mysql://AK:SK@127.0.0.1:3306`.
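
As a minimal sketch of how the credentials could be recovered from a connection string in this URL-like format, the example below uses Go's standard `net/url` package. The `DBCredentials` type and the `ParseDBConnStr` function are hypothetical names introduced only for illustration; they are not part of SQLFlow.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
)

// DBCredentials is a hypothetical helper type that holds the parts of a
// connection string such as mysql://AK:SK@127.0.0.1:3306.
type DBCredentials struct {
	Driver    string // URL scheme, e.g. "mysql"
	AccessKey string // AK, the user part of the URL
	SecretKey string // SK, the password part of the URL
	Address   string // host:port of the database
}

// ParseDBConnStr splits a connection string into its parts. Errors are kept
// generic on purpose so that the secret never ends up in an error message.
func ParseDBConnStr(connStr string) (DBCredentials, error) {
	u, err := url.Parse(connStr)
	if err != nil {
		return DBCredentials{}, errors.New("invalid connection string")
	}
	if u.Scheme == "" || u.Host == "" || u.User == nil {
		return DBCredentials{}, errors.New("connection string must look like driver://AK:SK@host:port")
	}
	sk, _ := u.User.Password()
	return DBCredentials{
		Driver:    u.Scheme,
		AccessKey: u.User.Username(),
		SecretKey: sk,
		Address:   u.Host,
	}, nil
}

func main() {
	c, err := ParseDBConnStr("mysql://AK:SK@127.0.0.1:3306")
	if err != nil {
		panic(err)
	}
	// Print everything except the secret key.
	fmt.Println(c.Driver, c.AccessKey, c.Address)
}
```

Note that this URL form follows the example in this document; an actual database driver may expect a different DSN syntax, in which case a conversion step would be needed.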

To submit jobs to clusters like Kubernetes, we also need to store credentials
for accessing the Kubernetes API server, so `K8SAK` and `K8SSK` are also stored in
the session.

We want SQLFlow servers to be stateless so that we can
deploy them on any cluster that does automatic fail-over and auto-scaling. To
that end, we store session data in a reliable storage service like
[etcd](https://github.com/etcd-io/etcd).
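
To store a session in an external key-value service, it first has to be turned into a key and a byte value. The sketch below shows one possible encoding, using JSON from the Go standard library. The `StoredSession` type, the `/sqlflow/sessions/` key prefix, and the `Encode`/`Decode` helpers are assumptions made for this example. The cached `DBConn` handle is left out because a live connection cannot be serialized; we assume each server instance would re-open it from `DBConnStr`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// StoredSession is a hypothetical, serializable view of Session.
// It omits DBConn, which only makes sense inside one server process.
type StoredSession struct {
	Token          int64  `json:"token"`
	ClientEndpoint string `json:"client_endpoint"`
	DBConnStr      string `json:"db_conn_str"`
	K8SAK          string `json:"k8s_ak"`
	K8SSK          string `json:"k8s_sk"`
}

// sessionKeyPrefix is an assumed key layout, not something the design fixes.
const sessionKeyPrefix = "/sqlflow/sessions/"

// SessionKey builds the storage key from the session token.
func SessionKey(token int64) string {
	return fmt.Sprintf("%s%d", sessionKeyPrefix, token)
}

// Encode returns the key and JSON value under which a session would be stored.
func Encode(s StoredSession) (string, []byte, error) {
	value, err := json.Marshal(s)
	if err != nil {
		return "", nil, err
	}
	return SessionKey(s.Token), value, nil
}

// Decode restores a session from the stored JSON value.
func Decode(value []byte) (StoredSession, error) {
	var s StoredSession
	err := json.Unmarshal(value, &s)
	return s, err
}

func main() {
	key, value, err := Encode(StoredSession{
		Token:          42,
		ClientEndpoint: "10.0.0.7:50051",
		DBConnStr:      "mysql://AK:SK@127.0.0.1:3306",
	})
	if err != nil {
		panic(err)
	}
	restored, err := Decode(value)
	if err != nil {
		panic(err)
	}
	fmt.Println(key, restored.ClientEndpoint)
}
```

The sketch writes credentials as plain JSON. A real deployment would likely need to encrypt the value or restrict access to the storage service, since the session contains secrets.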

### Authentication of SQLFlow Server

The figure below shows the overall workflow for authorization and
authentication.

<img src="figures/sqlflow_auth.png">

Users access the JupyterHub web page with their own username and password.
The user's identity is verified by the [SSO](https://en.wikipedia.org/wiki/Single_sign-on)
service or **any** other authentication method. JupyterHub is then
responsible for fetching the current user's "AK/SK" pairs (typically securely encoded strings)
for accessing the databases and the Kubernetes cluster. The "AK/SK" for the database and
for the Kubernetes cluster may differ. The mapping from the user's ID to the user's
"AK/SK" is stored in the "Mapping Service", which is an HTTPS RESTful service.

JupyterHub then spawns a Jupyter Notebook instance for each user and writes
the user's login token and "AK/SK" to secure storage for that Notebook
instance, so that the SQLFlow magic command plugin can read them.

Then we create a session on the SQLFlow server for the current user by calling the RPC below:

```proto
service SQLFlow {
    rpc CreateSession (Session) returns (Response);
}

message Session {
    string token = 1;
    string client_endpoint = 2;
    string db_conn_str = 3;
    string k8s_ak = 4;
    string k8s_sk = 5;
}
```

When the SQLFlow server receives this RPC call, it should store the session
in the etcd cluster with a session expiry time, using etcd leases; see:
https://help.compose.com/docs/etcd-using-etcd3-features#section-leases

After that, the login procedure is finished. Any SQL statement typed in
the current user's Notebook instance is then translated into jobs that use
the current user's AK/SK and are submitted to the user's namespace on Kubernetes.

If a user is already logged in and the Jupyter Notebook instance is still
alive, the user can run any job directly from the Notebook.

If the Notebook instance crashes, JupyterHub should be able to re-create
the instance and set the user's token and "AK/SK".

If a user's session has expired, JupyterHub should be able to refresh the
login, fetch the "AK/SK" again, and then re-create the Notebook instance.

## Conclusion

To make the SQLFlow server production-ready, one SQLFlow server instance must
be able to serve multiple clients, so authentication and session management
should be implemented.

For different environments, such as cloud or on-premise, we may need to implement
different "authenticators" for JupyterHub.

doc/figures/sqlflow_auth.graffle

11.2 KB
Binary file not shown.

doc/figures/sqlflow_auth.png

56.4 KB
