Implement Databricks OIDC as Token Source #1204

Open. Wants to merge 35 commits into base: main.

Commits (35):
0b11b82
WIP
hectorcast-db Mar 3, 2025
ba4d103
Merge branch 'main' into databricks-auth
hectorcast-db Mar 17, 2025
200f16e
Test
hectorcast-db Mar 18, 2025
452e56f
CLI test
hectorcast-db Mar 19, 2025
780ddc7
Other stuff
hectorcast-db Mar 20, 2025
9494d40
refresh support
hectorcast-db Mar 20, 2025
e94811a
Test
hectorcast-db Mar 20, 2025
bac4c39
PR Comments
hectorcast-db Mar 24, 2025
bbc8f0c
Merge branch 'main' into databricks-auth
hectorcast-db Mar 25, 2025
83619c2
Default values
hectorcast-db Mar 25, 2025
66d8057
Merge branch 'main' into databricks-auth2
hectorcast-db Apr 4, 2025
9b7f636
WIP
hectorcast-db Apr 4, 2025
878c349
WIP
hectorcast-db Apr 4, 2025
8738b46
Remove conf + test
hectorcast-db Apr 7, 2025
506a146
More tests
hectorcast-db Apr 7, 2025
b8d28dd
almost
hectorcast-db Apr 7, 2025
b2c2ce4
last
hectorcast-db Apr 7, 2025
0bf2d2f
fixes
hectorcast-db Apr 7, 2025
3d39805
Rename files
hectorcast-db Apr 7, 2025
35df156
Enabe
hectorcast-db Apr 7, 2025
c5652f3
Remove config from tests
hectorcast-db Apr 8, 2025
635752b
comments
hectorcast-db Apr 8, 2025
78db765
PR comments
hectorcast-db Apr 8, 2025
47f7a2d
Undo endpoint supply
hectorcast-db Apr 8, 2025
10e5615
Cleanup
hectorcast-db Apr 8, 2025
d5cb4cc
Merge branch 'main' into databricks-auth2
hectorcast-db Apr 9, 2025
2431d67
PR comments
hectorcast-db Apr 9, 2025
a6e1769
Name
hectorcast-db Apr 9, 2025
72d715e
Revert mistake
hectorcast-db Apr 9, 2025
7f32a67
Ensure credentials order
hectorcast-db Apr 9, 2025
b09f4e8
More comments
hectorcast-db Apr 9, 2025
3e040c9
Rename
hectorcast-db Apr 9, 2025
4f59d7d
Remove duplicate code
hectorcast-db Apr 10, 2025
fa9fd25
Fix comment
hectorcast-db Apr 10, 2025
a7c2ab0
Merge branch 'main' into databricks-auth2
hectorcast-db Apr 22, 2025
5 changes: 5 additions & 0 deletions NEXT_CHANGELOG.md
Expand Up @@ -3,6 +3,11 @@
## Release v0.64.0

### New Features and Improvements
* Introduce support for Databricks Workload Identity Federation in GitHub workflows ([#1177](https://github.com/databricks/databricks-sdk-go/pull/1177)).
See README.md for instructions.
* [Breaking] Users who run their workflows in GitHub Actions with cloud-native authentication, and who also have the `DATABRICKS_CLIENT_ID` and `DATABRICKS_HOST`
environment variables set, may see authentication start failing because of the order in which the SDK tries the different authentication methods.
In that case, set the `DATABRICKS_AUTH_TYPE` environment variable to match the previously used authentication method.
* Enabled asynchronous token refreshes by default ([#1208](https://github.com/databricks/databricks-sdk-go/pull/1208)).
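
The breaking change above comes down to "first applicable method wins". The following is a minimal sketch of that idea, not the SDK's actual credential chain: the `strategy` type, the `pick` and `defaultStrategies` functions, the `cloud-native` name, and the `CLOUD_CREDENTIAL` variable are all hypothetical and exist only for illustration. It shows how pinning the auth type skips an earlier method that would otherwise match.

```go
package main

import (
	"errors"
	"fmt"
)

// strategy is a hypothetical stand-in for one authentication method.
type strategy struct {
	name    string
	applies func(env map[string]string) bool
}

// has reports whether every listed variable is set to a non-empty value.
func has(keys ...string) func(map[string]string) bool {
	return func(env map[string]string) bool {
		for _, k := range keys {
			if env[k] == "" {
				return false
			}
		}
		return true
	}
}

// defaultStrategies returns an illustrative order only (assumption): the
// OIDC method is listed before a generic cloud-native method.
func defaultStrategies() []strategy {
	return []strategy{
		{"github-oidc", has("DATABRICKS_HOST", "DATABRICKS_CLIENT_ID")},
		{"cloud-native", has("DATABRICKS_HOST", "CLOUD_CREDENTIAL")},
	}
}

// pick returns the name of the first strategy that applies. When authType
// is non-empty, only the strategy with that name is considered.
func pick(strategies []strategy, env map[string]string, authType string) (string, error) {
	for _, s := range strategies {
		if authType != "" && s.name != authType {
			continue
		}
		if s.applies(env) {
			return s.name, nil
		}
	}
	return "", errors.New("no authentication method applies")
}

func main() {
	env := map[string]string{
		"DATABRICKS_HOST":      "https://example.test",
		"DATABRICKS_CLIENT_ID": "my-client-id",
		"CLOUD_CREDENTIAL":     "present",
	}
	first, _ := pick(defaultStrategies(), env, "")
	pinned, _ := pick(defaultStrategies(), env, "cloud-native")
	fmt.Println(first)  // github-oidc
	fmt.Println(pinned) // cloud-native
}
```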

### Bug Fixes
Expand Down
47 changes: 31 additions & 16 deletions README.md
Expand Up @@ -14,19 +14,35 @@ The Databricks SDK for Go includes functionality to accelerate development with

## Contents

- [Getting started](#getting-started)
- [Authentication](#authentication)
- [Code examples](#code-examples)
- [Long running operations](#long-running-operations)
- [Paginated responses](#paginated-responses)
- [GetByName utility methods](#getbyname-utility-methods)
- [Node type and Databricks Runtime selectors](#node-type-and-databricks-runtime-selectors)
- [Integration with `io` interfaces for DBFS](#integration-with-io-interfaces-for-dbfs)
- [User Agent Request Attribution](#user-agent-request-attribution)
- [Error Handling](#error-handling)
- [Logging](#logging)
- [Databricks SDK for Go](#databricks-sdk-for-go)
- [Contents](#contents)
- [Getting started](#getting-started)
- [Authentication](#authentication)
- [In this section](#in-this-section)
- [Default authentication flow](#default-authentication-flow)
- [Databricks native authentication](#databricks-native-authentication)
- [Azure native authentication](#azure-native-authentication)
- [Google Cloud Platform native authentication](#google-cloud-platform-native-authentication)
- [Overriding `.databrickscfg`](#overriding-databrickscfg)
- [Additional authentication configuration options](#additional-authentication-configuration-options)
- [Custom credentials provider](#custom-credentials-provider)
- [Code examples](#code-examples)
- [Long-running operations](#long-running-operations)
- [In this section](#in-this-section-1)
- [Command execution on clusters](#command-execution-on-clusters)
- [Cluster library management](#cluster-library-management)
- [Advanced usage](#advanced-usage)
- [Paginated responses](#paginated-responses)
- [`GetByName` utility methods](#getbyname-utility-methods)
- [Node type and Databricks Runtime selectors](#node-type-and-databricks-runtime-selectors)
- [Integration with `io` interfaces for DBFS](#integration-with-io-interfaces-for-dbfs)
- [Reading into and writing from buffers](#reading-into-and-writing-from-buffers)
- [`pflag.Value` for enums](#pflagvalue-for-enums)
- [User Agent Request Attribution](#user-agent-request-attribution)
- [Error handling](#error-handling)
- [Logging](#logging)
- [Testing](#testing)
- [Interface stability](#interface-stability)
- [Interface stability](#interface-stability)

## Getting started

Expand Down Expand Up @@ -158,18 +174,17 @@ Depending on the Databricks authentication method, the SDK uses the following in

### Databricks native authentication

By default, the Databricks SDK for Go initially tries Databricks token authentication (`AuthType: "pat"` in `*databricks.Config`). If the SDK is unsuccessful, it then tries Databricks basic (username/password) authentication (`AuthType: "basic"` in `*databricks.Config`).
By default, the Databricks SDK for Go initially tries Databricks token authentication (`AuthType: "pat"` in `*databricks.Config`). If the SDK is unsuccessful, it then tries Workload Identity Federation (WIF) based authentication (`AuthType: "github-oidc"` in `*databricks.Config`). Currently, only GitHub-provided JWT tokens are supported.

- For Databricks token authentication, you must provide `Host` and `Token`; or their environment variable or `.databrickscfg` file field equivalents.
- For Databricks basic authentication, you must provide `Host`, `Username`, and `Password` _(for AWS workspace-level operations)_; or `Host`, `AccountID`, `Username`, and `Password` _(for AWS, Azure, or GCP account-level operations)_; or their environment variable or `.databrickscfg` file field equivalents.
- For Databricks OIDC authentication, you must provide `Host` and `ClientID`, and optionally `TokenAudience`, either directly, through the corresponding environment variables, or in your `.databrickscfg` configuration file. For more information, see the [Databricks documentation](https://docs.databricks.com/aws/en/dev-tools/auth/oauth-federation#workload-identity-federation).

| `*databricks.Config` argument | Description | Environment variable / `.databrickscfg` file field |
| ----------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | -------------------------------------------------- |
| `Host` | _(String)_ The Databricks host URL for either the Databricks workspace endpoint or the Databricks accounts endpoint. | `DATABRICKS_HOST` / `host` |
| `AccountID` | _(String)_ The Databricks account ID for the Databricks accounts endpoint. Only has effect when `Host` is either `https://accounts.cloud.databricks.com/` _(AWS)_, `https://accounts.azuredatabricks.net/` _(Azure)_, or `https://accounts.gcp.databricks.com/` _(GCP)_. | `DATABRICKS_ACCOUNT_ID` / `account_id` |
| `Token` | _(String)_ The Databricks personal access token (PAT) _(AWS, Azure, and GCP)_ or Azure Active Directory (Azure AD) token _(Azure)_. | `DATABRICKS_TOKEN` / `token` |
| `Username` | _(String)_ The Databricks username part of basic authentication. Only possible when `Host` is `*.cloud.databricks.com` _(AWS)_. | `DATABRICKS_USERNAME` / `username` |
| `Password` | _(String)_ The Databricks password part of basic authentication. Only possible when `Host` is `*.cloud.databricks.com` _(AWS)_. | `DATABRICKS_PASSWORD` / `password` |
| `TokenAudience` | _(String)_ When using Workload Identity Federation, the audience to specify when fetching an ID token from the ID token supplier. | `DATABRICKS_TOKEN_AUDIENCE` / `token_audience` |
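
As a rough illustration of the OIDC requirements listed above, the sketch below reads the three settings from environment variables and rejects a configuration that lacks a host or client ID. It uses only the standard library. The `oidcSettings` type and the `fromEnv` function are hypothetical helpers, not part of the SDK, and the SDK's own resolution logic may differ.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// oidcSettings holds the values named in the OIDC bullet above.
// The type is hypothetical and not part of the SDK.
type oidcSettings struct {
	Host     string
	ClientID string
	Audience string // optional
}

// fromEnv reads the settings through lookup, which maps a variable name to
// its value (os.Getenv has this shape).
func fromEnv(lookup func(string) string) (oidcSettings, error) {
	s := oidcSettings{
		Host:     lookup("DATABRICKS_HOST"),
		ClientID: lookup("DATABRICKS_CLIENT_ID"),
		Audience: lookup("DATABRICKS_TOKEN_AUDIENCE"),
	}
	if s.Host == "" {
		return s, errors.New("DATABRICKS_HOST is required")
	}
	if s.ClientID == "" {
		return s, errors.New("DATABRICKS_CLIENT_ID is required")
	}
	return s, nil
}

func main() {
	s, err := fromEnv(os.Getenv)
	if err != nil {
		fmt.Println("incomplete OIDC configuration:", err)
		return
	}
	fmt.Printf("host=%s client_id=%s audience=%q\n", s.Host, s.ClientID, s.Audience)
}
```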

For example, to use Databricks token authentication:

Expand Down
38 changes: 8 additions & 30 deletions config/auth_azure_github_oidc.go
Expand Up @@ -8,7 +8,6 @@ import (

"github.com/databricks/databricks-sdk-go/config/credentials"
"github.com/databricks/databricks-sdk-go/httpclient"
"github.com/databricks/databricks-sdk-go/logger"
"golang.org/x/oauth2"
)

Expand All @@ -24,54 +23,33 @@ func (c AzureGithubOIDCCredentials) Name() string {
// Configure implements [CredentialsStrategy.Configure].
func (c AzureGithubOIDCCredentials) Configure(ctx context.Context, cfg *Config) (credentials.CredentialsProvider, error) {
// Sanity check that the config is configured for Azure Databricks.
if !cfg.IsAzure() || cfg.AzureClientID == "" || cfg.Host == "" || cfg.AzureTenantID == "" {
if !cfg.IsAzure() || cfg.AzureClientID == "" || cfg.Host == "" || cfg.AzureTenantID == "" || cfg.ActionsIDTokenRequestURL == "" || cfg.ActionsIDTokenRequestToken == "" {
return nil, nil
}
supplier := githubIDTokenSource{actionsIDTokenRequestURL: cfg.ActionsIDTokenRequestURL,
actionsIDTokenRequestToken: cfg.ActionsIDTokenRequestToken,
refreshClient: cfg.refreshClient,
}

idToken, err := requestIDToken(ctx, cfg)
idToken, err := supplier.IDToken(ctx, "api://AzureADTokenExchange")
if err != nil {
return nil, err
}
if idToken == "" {
if idToken.Value == "" {
return nil, nil
}

ts := &azureOIDCTokenSource{
aadEndpoint: fmt.Sprintf("%s%s/oauth2/token", cfg.Environment().AzureActiveDirectoryEndpoint(), cfg.AzureTenantID),
clientID: cfg.AzureClientID,
applicationID: cfg.Environment().AzureApplicationID,
idToken: idToken,
idToken: idToken.Value,
httpClient: cfg.refreshClient,
}

return credentials.NewOAuthCredentialsProvider(refreshableVisitor(ts), ts.Token), nil
}

// requestIDToken requests an ID token from the Github Action.
func requestIDToken(ctx context.Context, cfg *Config) (string, error) {
if cfg.ActionsIDTokenRequestURL == "" {
logger.Debugf(ctx, "Missing cfg.ActionsIDTokenRequestURL, likely not calling from a Github action")
return "", nil
}
if cfg.ActionsIDTokenRequestToken == "" {
logger.Debugf(ctx, "Missing cfg.ActionsIDTokenRequestToken, likely not calling from a Github action")
return "", nil
}

resp := struct { // anonymous struct to parse the response
Value string `json:"value"`
}{}
err := cfg.refreshClient.Do(ctx, "GET", fmt.Sprintf("%s&audience=api://AzureADTokenExchange", cfg.ActionsIDTokenRequestURL),
httpclient.WithRequestHeader("Authorization", fmt.Sprintf("Bearer %s", cfg.ActionsIDTokenRequestToken)),
httpclient.WithResponseUnmarshal(&resp),
)
if err != nil {
return "", fmt.Errorf("failed to request ID token from %s: %w", cfg.ActionsIDTokenRequestURL, err)
}

return resp.Value, nil
}

// azureOIDCTokenSource implements [oauth2.TokenSource] to obtain Azure auth
// tokens from an ID token.
type azureOIDCTokenSource struct {
Expand Down
91 changes: 91 additions & 0 deletions config/auth_databricks_oidc.go
@@ -0,0 +1,91 @@
package config

import (
"context"
"errors"
"net/url"

"github.com/databricks/databricks-sdk-go/config/experimental/auth"
"github.com/databricks/databricks-sdk-go/credentials/u2m"
"github.com/databricks/databricks-sdk-go/logger"
"golang.org/x/oauth2"
"golang.org/x/oauth2/clientcredentials"
)

// NewDatabricksOIDCTokenSource creates a new Databricks OIDC TokenSource.
func NewDatabricksOIDCTokenSource(cfg DatabricksOIDCTokenSourceConfig) auth.TokenSource {
return &databricksOIDCTokenSource{
cfg: cfg,
}
}

// DatabricksOIDCTokenSourceConfig is the configuration for a Databricks OIDC TokenSource.
type DatabricksOIDCTokenSourceConfig struct {
// ClientID is the client ID of the Databricks OIDC application. For
// Databricks Service Principal, this is the Application ID of the Service Principal.
ClientID string
// [Optional] AccountID is the account ID of the Databricks Account.
// This is only used for Account level tokens.
AccountID string
// Host is the host of the Databricks account or workspace.
Host string

It seems like this field is never used. Is there a reason why we have it as a field? If not, would it be possible to remove it?

// TokenEndpointProvider returns the token endpoint for the Databricks OIDC application.
TokenEndpointProvider func(ctx context.Context) (*u2m.OAuthAuthorizationServer, error)
// Audience is the audience of the Databricks OIDC application.
// This is only used for Workspace level tokens.
Audience string
// IdTokenSource returns the IDToken to be used for the token exchange.
IdTokenSource IDTokenSource
}

// databricksOIDCTokenSource is an auth.TokenSource which exchanges a token using
// Workload Identity Federation.
type databricksOIDCTokenSource struct {
cfg DatabricksOIDCTokenSourceConfig
}

// Token implements [TokenSource.Token].
func (w *databricksOIDCTokenSource) Token(ctx context.Context) (*oauth2.Token, error) {
if w.cfg.ClientID == "" {
logger.Debugf(ctx, "Missing ClientID")
return nil, errors.New("missing ClientID")
}
if w.cfg.Host == "" {
logger.Debugf(ctx, "Missing Host")
return nil, errors.New("missing Host")
}
endpoints, err := w.cfg.TokenEndpointProvider(ctx)
if err != nil {
return nil, err
}
audience := w.determineAudience(endpoints)
idToken, err := w.cfg.IdTokenSource.IDToken(ctx, audience)
if err != nil {
return nil, err
}

c := &clientcredentials.Config{
ClientID: w.cfg.ClientID,
AuthStyle: oauth2.AuthStyleInParams,
TokenURL: endpoints.TokenEndpoint,
Scopes: []string{"all-apis"},
EndpointParams: url.Values{
"subject_token_type": {"urn:ietf:params:oauth:token-type:jwt"},
"subject_token": {idToken.Value},
"grant_type": {"urn:ietf:params:oauth:grant-type:token-exchange"},
},
}
return c.Token(ctx)
}
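
To make the exchange above concrete, here is an approximation of the form-encoded POST that the `clientcredentials.Config` in `Token` would send: with `AuthStyleInParams` the client ID travels in the body, and the `grant_type` from `EndpointParams` replaces the default. This is a standard-library sketch under those assumptions, not the SDK's code. The `exchangeForm` and `newExchangeRequest` names and the example token URL are hypothetical.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
	"strings"
)

// exchangeForm builds the token-exchange parameters for a federated ID token.
func exchangeForm(clientID, idToken string) url.Values {
	return url.Values{
		"grant_type":         {"urn:ietf:params:oauth:grant-type:token-exchange"},
		"subject_token_type": {"urn:ietf:params:oauth:token-type:jwt"},
		"subject_token":      {idToken},
		"client_id":          {clientID},
		"scope":              {"all-apis"},
	}
}

// newExchangeRequest wraps the form in a POST to the token endpoint.
func newExchangeRequest(tokenURL, clientID, idToken string) (*http.Request, error) {
	body := strings.NewReader(exchangeForm(clientID, idToken).Encode())
	req, err := http.NewRequest(http.MethodPost, tokenURL, body)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/x-www-form-urlencoded")
	return req, nil
}

func main() {
	// The URL below is a placeholder, not a real Databricks endpoint.
	req, err := newExchangeRequest("https://example.test/oidc/v1/token", "my-client-id", "fake-jwt")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(req.Method, req.URL)
	fmt.Println(exchangeForm("my-client-id", "fake-jwt").Encode())
}
```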

func (w *databricksOIDCTokenSource) determineAudience(endpoints *u2m.OAuthAuthorizationServer) string {
if w.cfg.Audience != "" {
return w.cfg.Audience
}
// For Databricks Accounts, the account id is the default audience.
if w.cfg.AccountID != "" {
return w.cfg.AccountID
}
// For Databricks Workspaces, the auth endpoint is the default audience.
return endpoints.TokenEndpoint
}
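
The precedence in `determineAudience` (explicit audience, then account ID, then token endpoint) can be checked in isolation with a small standalone function. This is a sketch that mirrors the logic above with plain strings. The `audienceFor` name and the sample values are hypothetical.

```go
package main

import "fmt"

// audienceFor returns the explicit audience if set, otherwise the account ID
// if set, otherwise the token endpoint.
func audienceFor(explicit, accountID, tokenEndpoint string) string {
	if explicit != "" {
		return explicit
	}
	if accountID != "" {
		return accountID
	}
	return tokenEndpoint
}

func main() {
	endpoint := "https://example.test/oidc/v1/token" // placeholder URL
	fmt.Println(audienceFor("custom-aud", "acct-123", endpoint)) // custom-aud
	fmt.Println(audienceFor("", "acct-123", endpoint))           // acct-123
	fmt.Println(audienceFor("", "", endpoint))                   // https://example.test/oidc/v1/token
}
```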