Skip to content

Conversation

@ivasio
Copy link
Contributor

@ivasio ivasio commented Oct 15, 2025

Closes #3208

  • Added the runtime command with 2 subcommands: login and logout
  • Implemented AuthService that implements auth-related business logic. This service is currently used in CLI representation-level handler functions that implement CLI interaction with the user. In the future, the service or authorise() handler function can be reused in other dlt runtime subcommands
  • Secrets/configs reading and writing is implemented by constructing providers explicitly or even directly using tomlkit to ensure values are read from/written to the specific files
  • Runtime API SDK generation is added together with make command. Generated SDK code is ignored during linting

@cloudflare-workers-and-pages
Copy link

cloudflare-workers-and-pages bot commented Oct 15, 2025

Deploying with  Cloudflare Workers  Cloudflare Workers

The latest updates on your project. Learn more about integrating Git with Workers.

Status Name Latest Commit Preview URL Updated (UTC)
✅ Deployment successful!
View logs
docs 9a12719 Commit Preview URL

Branch Preview URL
Oct 26 2025, 01:05 PM

Copy link
Collaborator

@sh-rp sh-rp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very cool PR, thank you! Just a few small comments inline and:

  • we have a nice make command update-cli-docswhich will update the cli docs with your new command automatically! If you don't run it, the linter will fail, because it checks if the docs are up to date
  • I would not put the code for generating the clients in this repo I think. If it is convenient for you during development, please keep it, but I think before release we should only have the clients in here, or is there a good reason to keep it here?

self.parser = parser

subparsers = parser.add_subparsers(
title="Available subcommands", dest="profile_command", required=False
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should be dest="runtime_command". I think this will show up in the help and also the docs when you have regenerated them.

WorkspaceIdMismatch,
LocalWorkspaceIdNotSet,
)
from dlt._workspace.runtime_clients import AUTH_BASE_URL
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think these URLs should be part of the runtime config sections in config.toml. The associated class is RuntimeConfiguration. This way you can point the runtime commands to various endpoints for development or staging environments. The defaults can be there localhost urls for now.

try:
auth_service.authorise()
except LocalWorkspaceIdNotSet:
fmt.echo("No workspace id found in local config, using default remote workspace")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think in the future we want may want to ask the user here, there may be cases where the user accidentally connects a different workspace. I would also maybe not say anything about "default workspace". From the perspective of the user, there is just one workspace for now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, for different workspace (another id already set locally) there's another exception with different handling. But makes sense, I'll be asking for confirmation and remove "default" for the time being

class RuntimeCommand(SupportsCliCommand):
command = "runtime"
help_string = "Connect to Runtime and manage your remote Workspaces"
description = """"""
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would fill this out, this will also go into the generated docs which you can find at https://dlthub.com/docs/reference/command-line-interface

fmt.echo("Local workspace id overwritten with remote workspace id")
else:
raise RuntimeError("Unable to synchronise remote and local workspaces")
fmt.echo("Authorised to workspace %s" % fmt.bold(auth_service.workspace_id))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small not important nit-pick. We use us english internally and externally, where authorise is spelt authorize. Ultimately we should probably extract all of those cli strings into a strings file at some point and can get AI to proof-read. That is for later though :)

"Workspace id in local config (%s) is not the same as remote workspace id (%s)"
% (e.local_workspace_id, e.remote_workspace_id)
)
should_overwrite = fmt.prompt(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would use fmt.confirm here. We have nice test helpers as well as a non-interactive mode that only works with the confirm.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I was looking for something like that!

jwt_token: str


class AuthService:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI: Generally speaking we use a more functional programming style for all the cli helpers, you can see this very well in the deploy commands. I'm not sure if this ultimately is good or bad. I'd say you can choose to follow that pattern or choose to create these classes for all the runtime interaction stuff. Just keep in mind that these classes are very short lived and do not retain any state between cli commands, so they are not really that useful. If you want to introduce some kind of dependency injection between these services for better testability, that would be a reason to build it this way. Testing pure functions that only depend on the inputs is a quite nice pattern too though.

Copy link
Contributor Author

@ivasio ivasio Oct 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I've noticed - I've implemented the entrypoints in functional style. Chose a class for this one to specifically encapsulate all the auth internals and auth state. It will be very useful in next commands (dlt runtime deploy and dlt runtime run) - there will be many calls to auth-protected API, so I think keeping this state even for one command is worth it. I could switch to functional style too, it would be a bit more verbose as to my taste, but I've also noticed some internals such as Provider and RunContext are implemented as classes, so would rather stick to class here if you don't mind?

@ivasio
Copy link
Contributor Author

ivasio commented Oct 16, 2025

Thanks @sh-rp ! I've fixed all problems you pointed out. Also made a little move, auth.py -> runtime.py, because the client helpers that should be now there if I consider your comments, don't fit anywhere else. And AuthService is a ProfileAuthService if I'm specific, so sorry for the move after the first review, but I think it makes more sense now

Regarding your PR-level comments,

  • this code is under the .workspace feature flag, so it doesn't get rendered to Markdown docs by update-cli-docs. Not exactly sure why tests are failing, I'll rerun, but seems like general flakiness, cause I haven't added any new ones yet
  • there's only one make command and one helper script, I thought it was more useful to have code generation commands in the same repo where the generated code will live, as we agreed on the meeting. Easier to have the openapi URLs "external" to this codebase, rather then the output directory, or do you think otherwise?

Otherwise, waiting for @rudolfix 's CLI test helpers to jump on the tests and finalize this PR

@ivasio ivasio force-pushed the feat/3208-runtime-cli-auth branch from 4daed6a to d09e84c Compare October 22, 2025 12:50
@ivasio ivasio requested a review from sh-rp October 23, 2025 10:58
Copy link
Collaborator

@rudolfix rudolfix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

overall ask: please place all runtime related code in
dlt/_workspace/runtime module. that includes cli and exceptions. will be easier to handle if we need to split this into module


## Generates python clients for runtime api and auth services
# Before running, adjust the URL or run `make up` in runtime first
generate-python-clients:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please generate client from openapi specs that we keep in this repo and can always use to recreate old clients.

  • we have tools module where we can keep open api specs for different services ie runtime_client/auth.spec.yaml or smth
  • I think it makes sense to keep list of major versions there. obviously we have just one now
  • does not make sense to distribute specs with OSS code (as it is now)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as far as I remember, we’ve discussed offline that we store specs and generated code in OSS _workspace for the time being to avoid generating a separate package, which is what I’m doing here. happy to change the path of saving the spec, or refactor otherwise - let’s discuss?


# TODO: connect workspace to runtime here
# TODO: optionally define scripts and other runtime settings
__section__: ClassVar[str] = "workspace_runtime"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mhmmm maybe let's keep base class section. I think you can remove this line...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure! just for me to understand better, what’s your heuristic, how many config fields deserve their own section, or how do you decide it otherwise?
and btw @rudolfix after switching I experienced that PluggableRunContext.reload_providers didn’t fully do the job. I’ve patched the implementation a bit in a separate commit 9a12719 , please take a look if it’s the right direction - it’s the only thing that helped


@dataclass
class AuthInfo:
user_id: str
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it random user id that our runtime generates? or user id from SSO ie. github? should be the first one

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is indeed our user id generated by the backend

@dataclass
class AuthInfo:
user_id: str
email: str
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there any reason to send PII data to the client?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it’s coming from the server in jwt payload, as long as we store the token on the fs, I thought it was ok to use it in memory. I’m using it here to show the user the email they logged in with (printed to stdout with asterisks). happy to get rid of it here, or remove from jwt payload on the server side too, let me know what to do better


class RuntimeAuthService:
"""
Implements login, logout and auth check internals
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO authentication should be split from connecting the local workspace to remote one. We should have two public methods here

  • authenticate: obtains valid jwt
  • connect: connects local workspace to remote one which ends with a valid workspace_id

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

already split, renamed authorize to connect as you requested in another comment. for the rationale why more public methods are needed from my POV, please see main comment - happy to discuss



def get_auth_client() -> AuthClient:
config = resolve_configuration(WorkspaceRuntimeConfiguration())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you'll need workspace run context to get configuration


try:
auth_info = AuthInfo(
jwt_token=token.decode("utf-8"), email=payload["email"], user_id=payload["user_id"]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why sub is not used for user_id? what is the identity the jwt is issued against?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think sub is also set with the same id, not sure, but it would be easy to change.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’d use user_id in sub and get rid of email, happy to refactor it on both client and server side

self._delete_token()
self._remote_workspace_id = None

def authorize(self) -> str:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

from my POV this is workspace connect operation, We do not do any workspace authorization here. It is probably OK for now. authorization should be a part of connect process in the future here ie. view or connect permission will be requested on worskpace with given id. now you do not even send existing workspace id if it is there

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure thing, changed!

from typing import Optional

from dataclasses import dataclass
import jwt
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we do not use jwt - but other library (jose-jwt) please take a look at dlthub repo. reason: not to force users to use 100mb size of cryptography lib.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, great catch!


from dataclasses import dataclass
import jwt
from git import Union
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should come from typing

@ivasio
Copy link
Contributor Author

ivasio commented Oct 26, 2025

Thanks for the detailed review @rudolfix ! To answer the general comment, do you mean single file module, or actually a package? Happy to introduce a package, but wouldn’t put everything into one module, please see the rationale below

  • I tried to follow the current approach overall used in CLI and pointed out by Dave, where most logic lives in handler functions. It is used in the further work, please see the WIP branch
  • auth functionality is a bit separate though because it’s going to be used in a lot of other methods. so I separated only auth business logic into a separate class (business logic layer) to be able to use it directly in CLI handler functions (adapter layer), so ports/adapters-like approach. business logic layer methods that are called from adapter layer are public, all the other ones are protected
  • because auth process is interactive, I e.g. separated authorize into separate methods, and yes, implementation detail of overwriting the workspace id is exposed to adapter layer
  • This separation seems to make sense to me, please let’s discuss if you’d rather me inline this class into the handlers. It’s easy to do now as long as we reuse the interactive login as in the WIP branch, but if we are to invoke auth business logic separately going forward, the inlining would prohibit that

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Implement runtime CLI auth

3 participants