| 1 | +# Design: SQLFlow Authentication and Authorization |
| 2 | + |
| 3 | +## Concepts |
| 4 | + |
| 5 | +Authentication identifies the |
| 6 | +user. Authorization |
| 7 | +grants privileges to a user, such as access to certain system |
| 8 | +functionality. |
| 9 | + |
| 10 | +SQLFlow bridges SQL engines and |
| 11 | +machine learning systems. To execute a job, |
| 12 | +the SQLFlow server needs permissions to access databases and to submit machine learning jobs to |
| 13 | +clusters like Kubernetes. |
| 14 | + |
| 15 | +When we deploy the SQLFlow server as a Kubernetes service with horizontal auto-scaling enabled, many clients |
| 16 | +might connect to each SQLFlow server instance. For authentication and authorization, we must securely store a mapping |
| 17 | +from the user's ID to the user's credentials for accessing the database and the |
| 18 | +cluster. With authentication and authorization in place, we can implement *sessions*: the SQL statements in a SQL program might be handled by different SQLFlow server instances in the Kubernetes service, but the user wouldn't notice. |
| 19 | + |
| 20 | +Authorization is not much of a challenge because we can rely on |
| 21 | +SQL engines and training clusters, which deny requests if the user |
| 22 | +has no access. In this document, we focus on the authentication of SQLFlow users. |
| 23 | + |
| 24 | +## Design |
| 25 | + |
| 26 | +[JupyterHub](https://jupyterhub.readthedocs.io/en/stable/) is the central web |
| 27 | +page that users work on. JupyterHub supports many well-known authorization |
| 28 | +and authentication [methods](https://github.com/jupyterhub/jupyterhub/wiki/Authenticators), |
| 29 | +so it is easy to adapt the full solution to cloud |
| 30 | +or on-premise environments. |
| 31 | + |
| 32 | +### Session |
| 33 | + |
| 34 | +A server-side "session" is needed to store the credentials each client uses to access |
| 35 | +the database and submit jobs. The session can be defined as: |
| 36 | + |
| 37 | +```go |
| 38 | +type Session struct { |
| 39 | +    Token          int64  // User token granted after login |
| 40 | +    ClientEndpoint string // ip:port from the client |
| 41 | +    DBConnStr      string // mysql://AK:SK@127.0.0.1:3306 |
| 42 | +    // cached connection to database for current session, can point to a global connection map |
| 43 | +    DBConn *sql.DB |
| 44 | +    K8SAK  string // AK or username for accessing kubernetes |
| 45 | +    K8SSK  string // SK or secret for accessing kubernetes |
| 46 | +} |
| 47 | +``` |
| 48 | + |
| 49 | +**Note:** SQLFlow deals with three kinds of services: |
| 50 | + |
| 51 | +- The SQLFlow RPC service itself |
| 52 | +- A database service that stores the training data, e.g. MaxCompute |
| 53 | +- A training cluster that runs the SQLFlow training job, e.g. Kubernetes |
| 54 | + |
| 55 | +The token acts as the unique ID of the session. The session object |
| 56 | +should expire after some time and be deleted from the server's memory. |
| 57 | + |
| 58 | +The database connection string also contains credential information, |
| 59 | +in a format like `mysql://AK:SK@127.0.0.1:3306`. |
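
To illustrate how the credentials are embedded in such a connection string, the sketch below splits it with Go's standard `net/url` package. `DBCredentials` and `ParseDBConnStr` are hypothetical names introduced here, and the sketch assumes the connection string is always URL-shaped like the example above; it is not how SQLFlow necessarily parses it.

```go
package main

import (
	"fmt"
	"net/url"
)

// DBCredentials is a hypothetical holder for the parts of a connection string.
type DBCredentials struct {
	Driver string // e.g. "mysql"
	AK     string // access key or username
	SK     string // secret key or password
	Host   string // host:port of the database
}

// ParseDBConnStr splits a string like mysql://AK:SK@127.0.0.1:3306.
func ParseDBConnStr(connStr string) (DBCredentials, error) {
	u, err := url.Parse(connStr)
	if err != nil {
		return DBCredentials{}, err
	}
	if u.Scheme == "" || u.User == nil || u.Host == "" {
		return DBCredentials{}, fmt.Errorf("connection string must look like driver://AK:SK@host:port")
	}
	sk, _ := u.User.Password() // empty if no secret was given
	return DBCredentials{
		Driver: u.Scheme,
		AK:     u.User.Username(),
		SK:     sk,
		Host:   u.Host,
	}, nil
}

func main() {
	creds, err := ParseDBConnStr("mysql://AK:SK@127.0.0.1:3306")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// Print everything except the secret.
	fmt.Println(creds.Driver, creds.AK, creds.Host)
}
```

Because the secret travels inside the string, anything that logs or stores `DBConnStr` should treat it as sensitive.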
| 60 | + |
| 61 | +To submit jobs to clusters like Kubernetes, we also need to store credentials |
| 62 | +for accessing the Kubernetes API server, so `K8SAK` and `K8SSK` are also stored in |
| 63 | +the session. |
| 64 | + |
| 65 | +We want to make sure that SQLFlow servers are stateless so that we can |
| 66 | +deploy them on any cluster that supports automatic failover and auto-scaling. To |
| 67 | +that end, we store session data in a reliable storage service like |
| 68 | +[etcd](https://github.com/etcd-io/etcd). |
| 69 | + |
| 70 | +### Authentication of SQLFlow Server |
| 71 | + |
| 72 | +The figure below shows the overall workflow for authorization and |
| 73 | +authentication. |
| 74 | + |
| 75 | +<img src="figures/sqlflow_auth.png"> |
| 76 | + |
| 77 | +Users access the JupyterHub web page using their own username and password. |
| 78 | +The user's identity is verified by the [SSO](https://en.wikipedia.org/wiki/Single_sign-on) |
| 79 | +service or **any** other authentication method. JupyterHub |
| 80 | +is then responsible for fetching the current user's "AK/SK" pairs (typically securely encoded strings) |
| 81 | +for accessing databases and the Kubernetes cluster. The "AK/SK" for accessing the database and |
| 82 | +the one for the Kubernetes cluster may not be the same. The mapping from the user's ID to the user's |
| 83 | +"AK/SK" is stored in the "Mapping Service", which is an HTTPS RESTful service. |
| 84 | + |
| 85 | +JupyterHub then spawns a Jupyter |
| 86 | +Notebook instance for each user and saves the user's login token and "AK/SK" for |
| 87 | +that Notebook instance to some secure storage so that the SQLFlow magic command plugin |
| 88 | +can read them. |
| 89 | + |
| 90 | +Then we create a session on the SQLFlow server for the current user by calling the RPC below: |
| 91 | + |
| 92 | +```proto |
| 93 | +service SQLFlow { |
| 94 | +    rpc CreateSession (Session) returns (Response); |
| 95 | +} |
| 96 | + |
| 97 | +message Session { |
| 98 | +    string token = 1; |
| 99 | +    string client_endpoint = 2; |
| 100 | +    string db_conn_str = 3; |
| 101 | +    string k8s_ak = 4; |
| 102 | +    string k8s_sk = 5; |
| 103 | +} |
| 104 | +``` |
| 105 | + |
| 106 | +When the SQLFlow server receives this RPC call, it should store the session |
| 107 | +in the etcd cluster with a session expiration time; see: |
| 108 | +https://help.compose.com/docs/etcd-using-etcd3-features#section-leases |
| 109 | + |
| 110 | +After that, the login procedure is finished. From then on, any SQL statement typed in |
| 111 | +the current user's Notebook instance is translated to jobs using |
| 112 | +the current user's AK/SK and submitted to the user's namespace on Kubernetes. |
| 113 | + |
| 114 | +If a user is already logged in and the Jupyter Notebook instance is still |
| 115 | +alive, the user can use the Notebook directly to run any job. |
| 116 | + |
| 117 | +If the Notebook instance crashes, JupyterHub should be able to re-create |
| 118 | +the instance and set the user's token and "AK/SK" again. |
| 119 | + |
| 120 | +If a user's session has expired, JupyterHub should be able to refresh the |
| 121 | +login and fetch the "AK/SK" again, then re-create the Notebook instance. |
| 122 | + |
| 123 | + |
| 124 | +## Conclusion |
| 125 | + |
| 126 | +To make the SQLFlow server production ready, each SQLFlow server instance must be able |
| 127 | +to serve multiple clients, so authentication and session management should |
| 128 | +be implemented. |
| 129 | + |
| 130 | +For different environments, such as cloud or on-premise, we may need to implement |
| 131 | +different "authenticators" for JupyterHub. |