Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
131 changes: 131 additions & 0 deletions doc/auth_design.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,131 @@
# Design: SQLFlow Authentication and Authorization

## Concepts

Authentication is to identify the
user. Authorization
is to grant privileges to a user like accessing some system
functionalities.

SQLFlow bridges SQL engines and
machine learning systems. To execute a job,
the SQLFlow server needs permissions to access databases and to submit machine learning jobs to
clusters like Kubernetes.

When we deploy SQLFlow server as a Kubernetes service with horizontal auto-scaling enabled, many clients
might connect to each SQLFlow server instance. For authetication and authorization, we must securely store a mapping
from the user's ID to the user's credentials for accessing the database and the
cluster. With authentication and authorization, we will be able to implement *sessions*, which means that each SQL statement in a SQL program might be handled by different SQLFlow server instances in the Kubernetes service; however, the user wouldn't notice that.

Authorization is not a too much a challenge because we can rely on
SQL engines and training clusters, which denies requests if the user
have no access. In this document, we focus on authentication of SQLFlow users.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume that we should clarify the concept of the "client" in this section. A client of SQLFlow server might be the SQLFlow magic command, which is an extension to Jupyter Notebook server, or a Windows-native or macOS-native GUI program. It looks to me that we introduce an authentication server because we want to support both kinds of clients?

## Design

The [JupyterHub](https://jupyterhub.readthedocs.io/en/stable/) is the central web
page for users to work on. JupyterHub can support many well-known authorization
and authentication [methods](https://github.com/jupyterhub/jupyterhub/wiki/Authenticators)
so that it will be easy to adapt the full solution to cloud environments
or on-premise.

### Session

A server-side "session" is needed to store credentials for each client to access
the database and submitting jobs. The session can be defined as:

```go
type Session struct {
Token int64 // User token granted after login
ClientEndpoint string // ip:port from the client
DBConnStr string // mysql://AK:SK@127.0.0.1:3306
// cached connection to database for current session, can point to a global connecion map
DBConn *sql.DB
K8SAK string // AK or username for accessing kubernetes
K8SSK string // SK or secret for accessing kubernetes
}
Copy link
Collaborator

@weiguoz weiguoz Jun 10, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we consider the expired time to eliminate those zombie sessions?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should! Thanks!

```

**Note:** that SQLFlow should be dealing with three kinds of services:

- SQLFlow RPC service itself
- Database service that stores the training data, e.g. MaxCompute
- A training cluster that runs the SQLFlow training job, e.g. Kubernetes

The token will act as the unique id of the session. The session object
should be expired within some time and deleted on the server memory.

The Database connection string also contains credential information
follow the format like `mysql://AK:SK@127.0.0.1:3306`.

To submit to clusters like Kubrenetes, we also need to store credentials
to access Kubernetes API server, so `K8SAK, K8SSK` is also stored in
the session.

We want to make sure that SQLFlow servers are stateless so that we can
deploy it on any cluster that does auto fail-over and auto-scaling. In
that case, we store session data into a reliable storage service like
[etcd](https://github.com/etcd-io/etcd).

### Authentication of SQLFlow Server

The below figure demonstrates overall workflow for authorization and
authentication.

<img src="figures/sqlflow_auth.png">

Users can access the JupyterHub web page using their own username and password.
The user's identity will be verified by the [SSO](https://en.wikipedia.org/wiki/Single_sign-on)
service or **any** other authentication methods. Then the JupyterHub
is responsible to fetch current user's "AK/SK"s (typically securely encoded strings)
for accessing databases and the Kubernetes cluster. "AK/SK" for accessing database and
the Kubernetes cluster may not be the same. The mapping from the user's ID to the user's
"AK/SK" is stored in the "Mapping Service", which is an HTTPS RESTful service.

Then JupyterHub will spawn the Jupyter
Notebook instances for each user and set the user's login token and "AK/SK" for
the Notebook instance to some secure storage so that the SQLFlow magic command plugin
can read it.

Then we create a session on the SQLFlow server for the current user by calling below RPC:

```proto
service SQLFlow {
rpc CreateSession (Session) returns (Response);
}

message Session {
string token = 1;
string client_endpoint = 2;
string db_conn_str = 3;
string k8s_ak = 4;
string k8s_sk = 5;
}
```

When the SQLFlow server receives this RPC call, it should store the session
on the etcd cluster with a session expire time, see:
https://help.compose.com/docs/etcd-using-etcd3-features#section-leases

After that, the login procedure is finished. Then any SQL statement typed in
the current user's Notebook instance, it will be translated to jobs using
the current users' AK/SK and submit to the user's namespace on Kubernetes.

If one user is already logged in, and the Jupyter Notebook instance is still
alive, he or she can directly use the Notebook to run any job.

If the Notebook instance crashed, the JupyterHub should be able to re-create
the instance and set the user's token and "AK/SK".

If one user's session is expired, the JupyterHub should be able to refresh the
login and fetch the "AK/SK" again, then re-create the Notebook instance.


## Conclusion

To make SQLFlow server production ready, supporting serving multiple clients on one
SQLFlow server instance is necessary, Authentication and session management should
be implemented.

For different environments like on cloud or on-premise, we may need to implement
different "authenticators" for JupyterHub to adapt them.
Binary file added doc/figures/sqlflow_auth.graffle
Binary file not shown.
Binary file added doc/figures/sqlflow_auth.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.