-
Notifications
You must be signed in to change notification settings - Fork 92
Rearchitecture #195
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rearchitecture #195
Conversation
Slowly migrating off tornado.
- Pare back functionality to allow rebuilding incrementally. - Move from tornado to aiohttp - Introduce backend abstraction
- Add user cache - Move all auth handling to authenticator class
Fuly remove tornado from the codebase, rely on asyncio and aiohttp instead. Also updates tests (still many many tests to go).
Some configurables should be global (e.g. timeouts, etc...) while others are cluster specific (e.g. ``worker_cores``). We now split out cluster-specific things into a separate object, which can be used by the backend.
Now supports full start/connect/stop roundtrip.
Supports the full scale protocol. Things may change still as we implement more features, but things kinda work now.
Workers that die unexpectedly are automatically restarted by the gateway server.
Adds timeout support for cluster/worker startup, and heartbeat timeouts.
Helps catch failures between submission and heartbeats starting up.
Allows for long polling when querying for a cluster status.
Also a few general fixes
- Properly cleanup periodic tasks when shutting down the scheduler service - Bulk updates require updating the same columns for all rows in sqlalchemy, update code accordingly.
Improve logging of events.
Add an option to cleanup all active clusters when the application shutsdown. Default is True.
- Cleanup the backend base class definition - Delete a few unnecessary things
Also update setup.py [ci skip]
- Move everything into single process for ease of deployment - Use jsonrpc over tcp for configuration instead of a REST api - Support bulk operations in api
Switch communication direction so that the proxy polls the api server to maintain it's routing table.
|
Thanks @jcrist I agree on dropping kubernetes support in master while this work is in flight rather than a long running feature branch. When we tag up later today, let's talk about how my folks can help with kubernetes. |
- Fix bug in router implementation when serving on the root path - Integrate new proxy design with the gateway server - Drop `/gateway/` prefix, serving on root prefix by default now - General cleanups.
[ci skip]
Main gateway tests now pass. Also cleaned up and standardized some interface method names.
Previously any existing tasks in the reconcilation queue would be immediately cancelled. We now cancel all pending tasks, but tasks that are currently being worked on are allowed to complete. This makes shutdown cleaner.
Getting closer to having all old tests converted over. [ci skip]
Update backend tests for new functionality. Tests now work for - local - yarn - slurm - pbs backends.
A few tweaks to the test suite
- `asyncio.run` is python 3.7+ only, include implementation in `compat.py` for now. - `ssl_context` is removed in new aiohttp, use `ssl` instead - Drop kube and docs in travis yml for now [test-all]
|
Ok, I think this is good to merge now. I've been heads-down getting this refactor done (apologies for ignoring other things for a bit). Refactoring the world is never fun, and this PR had to touch almost everything. For users this should have no visible changes, but this does completely change the configuration/deployment story for admins. As such the docs will have to be updated. I've turned off doc builds on master for now, since we don't have the ability to publish multiple docs versions. I think when everything is up to date we can re-enable things. A few features that existed in the previous release have been dropped in this refactor. They can be added in subsequent PRs:
A summary of changes included in this PR: DependenciesWe switched from using AuthenticationAuthentication has been rewritten to remove the need for a backing database. External authenticator classes will likely need to be changed, but the changes should be minimal (function signature, switch to aiohttp handlers). Clusters now have a User groups have also been added to the authentication model, but we're not doing anything interesting with them yet. This could be used to provide view/modify access, shared clusters, etc... BackendsPreviously we abstracted on a We now extract both of these out into a new We also provide an implementation of the base Support for user-configurable cluster options has also been expanded to support per-user or per-group options. Database BackendThe current non-kubernetes backends all subclass from
ProxyA dask-gateway instance no longer has a For non-kubernetes backends, the
|
|
I don't expect anyone to be able to review a +6,608/−8,575 line PR, so I'm going to merge this for now. The master branch will be less featureful than the previous release until we re-add these features, but I'd rather work on the master branch than a feature branch. |
Fixes #186.
This rearranges our backend abstraction (as well as some of the control flow) to make it easier to make the kubernetes backend (and other backends) more native/performant.
[ ] User limits[ ] Kubernetes backend[ ] DocsFor the most part this is fully-functional already, and should just work. The only breaking changes here are on the administrator side - the user api remains the same.
I might drop the kubernetes support on master for a bit before we work things back up. The kubernetes backend will be split between the server and the a controller, I don't really want to rewrite the kubernetes backend to use the db only to remove it in a subsequent PR. I'm hesitant to continue this work on a feature branch since I don't want to deal with huge merge conflicts down the line, and this PR is already big enough. Either way tests and docs will be updated accordingly before merging.
I'll write up a design overview of what this PR brings tomorrow, it's late here.