-
Couldn't load subscription status.
- Fork 353
Feat: implement runtime CLI auth #3209
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: devel
Are you sure you want to change the base?
Conversation
Deploying with
|
| Status | Name | Latest Commit | Preview URL | Updated (UTC) |
|---|---|---|---|---|
| ✅ Deployment successful! View logs |
docs | 9a12719 | Commit Preview URL Branch Preview URL |
Oct 26 2025, 01:05 PM |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very cool PR, thank you! Just a few small comments inline and:
- we have a nice make command
update-cli-docswhich will update the cli docs with your new command automatically! If you don't run it, the linter will fail, because it checks if the docs are up to date - I would not put the code for generating the clients in this repo I think. If it is convenient for you during development, please keep it, but I think before release we should only have the clients in here, or is there a good reason to keep it here?
| self.parser = parser | ||
|
|
||
| subparsers = parser.add_subparsers( | ||
| title="Available subcommands", dest="profile_command", required=False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should be dest="runtime_command". I think this will show up in the help and also the docs when you have regenerated them.
| WorkspaceIdMismatch, | ||
| LocalWorkspaceIdNotSet, | ||
| ) | ||
| from dlt._workspace.runtime_clients import AUTH_BASE_URL |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think these URLs should be part of the runtime config sections in config.toml. The associated class is RuntimeConfiguration. This way you can point the runtime commands to various endpoints for development or staging environments. The defaults can be there localhost urls for now.
| try: | ||
| auth_service.authorise() | ||
| except LocalWorkspaceIdNotSet: | ||
| fmt.echo("No workspace id found in local config, using default remote workspace") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think in the future we want may want to ask the user here, there may be cases where the user accidentally connects a different workspace. I would also maybe not say anything about "default workspace". From the perspective of the user, there is just one workspace for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, for different workspace (another id already set locally) there's another exception with different handling. But makes sense, I'll be asking for confirmation and remove "default" for the time being
| class RuntimeCommand(SupportsCliCommand): | ||
| command = "runtime" | ||
| help_string = "Connect to Runtime and manage your remote Workspaces" | ||
| description = """""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would fill this out, this will also go into the generated docs which you can find at https://dlthub.com/docs/reference/command-line-interface
| fmt.echo("Local workspace id overwritten with remote workspace id") | ||
| else: | ||
| raise RuntimeError("Unable to synchronise remote and local workspaces") | ||
| fmt.echo("Authorised to workspace %s" % fmt.bold(auth_service.workspace_id)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Small not important nit-pick. We use us english internally and externally, where authorise is spelt authorize. Ultimately we should probably extract all of those cli strings into a strings file at some point and can get AI to proof-read. That is for later though :)
| "Workspace id in local config (%s) is not the same as remote workspace id (%s)" | ||
| % (e.local_workspace_id, e.remote_workspace_id) | ||
| ) | ||
| should_overwrite = fmt.prompt( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would use fmt.confirm here. We have nice test helpers as well as a non-interactive mode that only works with the confirm.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I was looking for something like that!
dlt/_workspace/auth.py
Outdated
| jwt_token: str | ||
|
|
||
|
|
||
| class AuthService: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI: Generally speaking we use a more functional programming style for all the cli helpers, you can see this very well in the deploy commands. I'm not sure if this ultimately is good or bad. I'd say you can choose to follow that pattern or choose to create these classes for all the runtime interaction stuff. Just keep in mind that these classes are very short lived and do not retain any state between cli commands, so they are not really that useful. If you want to introduce some kind of dependency injection between these services for better testability, that would be a reason to build it this way. Testing pure functions that only depend on the inputs is a quite nice pattern too though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I've noticed - I've implemented the entrypoints in functional style. Chose a class for this one to specifically encapsulate all the auth internals and auth state. It will be very useful in next commands (dlt runtime deploy and dlt runtime run) - there will be many calls to auth-protected API, so I think keeping this state even for one command is worth it. I could switch to functional style too, it would be a bit more verbose as to my taste, but I've also noticed some internals such as Provider and RunContext are implemented as classes, so would rather stick to class here if you don't mind?
|
Thanks @sh-rp ! I've fixed all problems you pointed out. Also made a little move, Regarding your PR-level comments,
Otherwise, waiting for @rudolfix 's CLI test helpers to jump on the tests and finalize this PR |
…08-runtime-cli-auth # Conflicts: # Makefile
4daed6a to
d09e84c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
overall ask: please place all runtime related code in
dlt/_workspace/runtime module. that includes cli and exceptions. will be easier to handle if we need to split this into module
|
|
||
| ## Generates python clients for runtime api and auth services | ||
| # Before running, adjust the URL or run `make up` in runtime first | ||
| generate-python-clients: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please generate client from openapi specs that we keep in this repo and can always use to recreate old clients.
- we have tools module where we can keep open api specs for different services ie
runtime_client/auth.spec.yamlor smth - I think it makes sense to keep list of major versions there. obviously we have just one now
- does not make sense to distribute specs with OSS code (as it is now)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as far as I remember, we’ve discussed offline that we store specs and generated code in OSS _workspace for the time being to avoid generating a separate package, which is what I’m doing here. happy to change the path of saving the spec, or refactor otherwise - let’s discuss?
dlt/_workspace/configuration.py
Outdated
|
|
||
| # TODO: connect workspace to runtime here | ||
| # TODO: optionally define scripts and other runtime settings | ||
| __section__: ClassVar[str] = "workspace_runtime" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mhmmm maybe let's keep base class section. I think you can remove this line...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure! just for me to understand better, what’s your heuristic, how many config fields deserve their own section, or how do you decide it otherwise?
and btw @rudolfix after switching I experienced that PluggableRunContext.reload_providers didn’t fully do the job. I’ve patched the implementation a bit in a separate commit 9a12719 , please take a look if it’s the right direction - it’s the only thing that helped
|
|
||
| @dataclass | ||
| class AuthInfo: | ||
| user_id: str |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it random user id that our runtime generates? or user id from SSO ie. github? should be the first one
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is indeed our user id generated by the backend
| @dataclass | ||
| class AuthInfo: | ||
| user_id: str | ||
| email: str |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there any reason to send PII data to the client?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it’s coming from the server in jwt payload, as long as we store the token on the fs, I thought it was ok to use it in memory. I’m using it here to show the user the email they logged in with (printed to stdout with asterisks). happy to get rid of it here, or remove from jwt payload on the server side too, let me know what to do better
|
|
||
| class RuntimeAuthService: | ||
| """ | ||
| Implements login, logout and auth check internals |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO authentication should be split from connecting the local workspace to remote one. We should have two public methods here
- authenticate: obtains valid jwt
- connect: connects local workspace to remote one which ends with a valid workspace_id
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
already split, renamed authorize to connect as you requested in another comment. for the rationale why more public methods are needed from my POV, please see main comment - happy to discuss
dlt/_workspace/runtime.py
Outdated
|
|
||
|
|
||
| def get_auth_client() -> AuthClient: | ||
| config = resolve_configuration(WorkspaceRuntimeConfiguration()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you'll need workspace run context to get configuration
|
|
||
| try: | ||
| auth_info = AuthInfo( | ||
| jwt_token=token.decode("utf-8"), email=payload["email"], user_id=payload["user_id"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why sub is not used for user_id? what is the identity the jwt is issued against?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think sub is also set with the same id, not sure, but it would be easy to change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I’d use user_id in sub and get rid of email, happy to refactor it on both client and server side
dlt/_workspace/runtime.py
Outdated
| self._delete_token() | ||
| self._remote_workspace_id = None | ||
|
|
||
| def authorize(self) -> str: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
from my POV this is workspace connect operation, We do not do any workspace authorization here. It is probably OK for now. authorization should be a part of connect process in the future here ie. view or connect permission will be requested on worskpace with given id. now you do not even send existing workspace id if it is there
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure thing, changed!
dlt/_workspace/runtime.py
Outdated
| from typing import Optional | ||
|
|
||
| from dataclasses import dataclass | ||
| import jwt |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we do not use jwt - but other library (jose-jwt) please take a look at dlthub repo. reason: not to force users to use 100mb size of cryptography lib.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks, great catch!
dlt/_workspace/runtime.py
Outdated
|
|
||
| from dataclasses import dataclass | ||
| import jwt | ||
| from git import Union |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should come from typing
|
Thanks for the detailed review @rudolfix ! To answer the general comment, do you mean single file module, or actually a package? Happy to introduce a package, but wouldn’t put everything into one module, please see the rationale below
|
Closes #3208
runtimecommand with 2 subcommands:loginandlogoutAuthServicethat implements auth-related business logic. This service is currently used in CLI representation-level handler functions that implement CLI interaction with the user. In the future, the service orauthorise()handler function can be reused in otherdlt runtimesubcommandstomlkitto ensure values are read from/written to the specific filesmakecommand. Generated SDK code is ignored during linting