Skip to content

feat(c/driver_manager): add connection profile interface#3876

Open
zeroshade wants to merge 37 commits intoapache:mainfrom
zeroshade:profile-settings-configuration
Open

feat(c/driver_manager): add connection profile interface#3876
zeroshade wants to merge 37 commits intoapache:mainfrom
zeroshade:profile-settings-configuration

Conversation

@zeroshade
Copy link
Member

An attempt to implement the ideas as suggested in #3765 to provide the ability to specify and load connection profiles to define options for use by the driver manager.

This creates an AdbcConnectionProfile struct and defines an AdbcConnectionProfileProvider function pointer typedef to allow for customized management of profiles. This also implements a default file-based profile provider as described in #3765 (comment) which will be used if no custom provider has been set.

This allows easy expansion in the future for non-file-based connection profile providers while still implementing the easier case of using file-based profiles, including hierarchical specification for now. See the documentation comments added to adbc_driver_manager.h for the full description of the semantics and explanation.

@zeroshade
Copy link
Member Author

CC @davidhcoe

if there's anyone I missed please feel free to tag others that might be interested in looking this over and reviewing it. Once everyone is onboard with the design and ideas I'll implement the handling of these profiles for the other language bindings.

Copy link
Member

@paleolimbot paleolimbot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool!

All just questions from me to take or leave 🙂

Comment on lines +321 to +327
/// version = 1
/// driver = "driver_name"
///
/// [options]
/// option1 = "value1"
/// option2 = 42
/// option3 = 3.14
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you want to wrap this in a top level [profile] item (like pyproject.toml's [project] or Cargo.toml's [workspace])?

Suggested change
/// version = 1
/// driver = "driver_name"
///
/// [options]
/// option1 = "value1"
/// option2 = 42
/// option3 = 3.14
/// [adbc.profile]
/// version = 1
/// driver = "driver_name"
///
/// [options]
/// option1 = "value1"
/// option2 = 42
/// option3 = 3.14

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally I was trying to avoid having to do that. Is there any benefit to doing so?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just seemed like a common idiom in .toml configs, perhaps to allow for future expansion (e.g., if you ever want >1 profile to live in one .toml)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think after a lot of discussion between myself, @lidavidm and @ianmcook we came to the decision that we don't want more than 1 profile to live in one .toml file haha 😄

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having multiple connection profiles defined in a single TOML file could potentially be nice, but it complicates the profile search procedure, and I worry that it could confuse users.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think my point was just that you have some future flexibility if you nest the configuration and if you put adbc and profile in there somewhere there's a better chance somebody who runs across this file without context will know what it is.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't do that for driver TOML files though 😬

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one is very easy to change if it ever comes up 🙂

Comment on lines 223 to 224
AdbcStatusCode (*GetDriverName)(struct AdbcConnectionProfile* profile,
const char** driver_name, struct AdbcError* error);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
AdbcStatusCode (*GetDriverName)(struct AdbcConnectionProfile* profile,
const char** driver_name, struct AdbcError* error);
AdbcStatusCode (*GetDriverName)(struct AdbcConnectionProfile* profile,
const char** driver_name, AdbcDriverInitFunc* init_func, struct AdbcError* error);

Is there any value to allowing a profile to optionally specify the init function directly? (R could maybe use this to find R package versions of drivers)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The way the code is set up doesn't allow for this. If there is an init func set then the profile handling is currently entirely skipped and it's up to the init_func to handle everything. Given the way we envision profiles working, I don't think that it makes sense for the profile to set or specify the init function.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you'd just set args->init_func instead of args->driver (understood if you'd prefer not to)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I personally prefer not, but I'm open to it if others think it's worthwhile.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you do want this while you're here and I think it is fairly easy to do. Providing an init function is a more predictable/secure way to precisely specify which driver will actually get loaded. There are now quite a lot of ways that a driver can be resolved based on name that change based on working directory/environment and if I were embedding ADBC in an application I would personally want the ability to be absolutely sure about what driver/version was loaded. (Alternatively, as an application I could invent my own way to manage Profiles and opt out of the driver manager).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can, I just didn't based on the way I implemented it. If both of you think it's a good way to go, I don't have a good argument against it so I can add it to the profile object. Sure.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice to have other people chime in as well ^_^

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, please poke people! 😄

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly the way this is set up with an embedded callback makes it very difficult to change later, so if if it's easy to do and there's a plausible scenario where it might be needed I think it is worth it. There are a few types of drivers that can never be loaded by name/entrypoint and so I think it is plausible this could come up (e.g., giant statically linked piles of Rust crates, some R packages).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated with your suggestion here to allow the profile to provide an init_func

@davidhcoe
Copy link
Contributor

@CurtHagenlocher will be interested in this as well.

driver = "adbc_driver_snowflake"

[options]
adbc.snowflake.sql.account = "env_var(SNOWFLAKE_ACCOUNT)"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm. Why not use option = { env_var = "NAME" }?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would prevent us from using environment variables to define substrings, which we will need for the drivers that expect their connection arguments to be in a URI.

uri = "scheme://host:port?user=user&pw=env_var(PASSWORD)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we consider using ${} instead of env_var()?

That is more concise and makes escaping literals easier (e.g. with \$ or $$).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think ${} might be a little more familiar, unless you expect to eventually have other things besides env_var.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe the expectation was that we would eventually have more special functions in addition to env_var which is why this was the original suggestion. For eventual consistency

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You will end up with the lambda calculus in these configuration strings :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we also still need to specify how to escape this (in the very unlikely instance that it's needed...)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was really trying to avoid inventing a way to escape this.... I can't even think of what a good syntax would be to escape this unless we change it to something like $env_var or provide some actual delimiter for it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that's an argument for changing the syntax, at least :)
Even if we don't specify how to escape it now, we may have to down the line...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additionally, I feel like most string substitution syntaxes put the delimiters on the outside ({foo(bar)}) which makes it more obvious (and possibly easier to parse?)

@zeroshade
Copy link
Member Author

@CurtHagenlocher @davidhcoe @lidavidm @ianmcook any further comments?

@zeroshade
Copy link
Member Author

I've updated the comments and the rst document based on everyone's comments. Please re-review

@davidhcoe
Copy link
Contributor

fyi @jadewang-db and @eric-wang-1990

driver = "adbc_driver_snowflake"

[options]
adbc.snowflake.sql.account = "env_var(SNOWFLAKE_ACCOUNT)"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we also still need to specify how to escape this (in the very unlikely instance that it's needed...)

@zeroshade zeroshade force-pushed the profile-settings-configuration branch from 23fed89 to 3cebf5a Compare January 26, 2026 20:46
Comment on lines +1382 to +1384
#ifdef _WIN32
const wchar_t* profiles_dir = L"Profiles";
#elif defined(__APPLE__)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I forget whether the Windows configuration directory from InternalAdbcUserConfigDir() uses the "Roaming" or local version but it might be worth checking to make sure the Profiles are stored in the right one (i.e., should Profiles follow a user around when logging in to the same account on multiple computers, or should they be local to one computer?).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently we use LocalAppData I believe, I don't think it necessarily makes sense for the profiles to follow the user around.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose if anybody notices they can open an issue 🙂

Comment on lines 223 to 224
AdbcStatusCode (*GetDriverName)(struct AdbcConnectionProfile* profile,
const char** driver_name, struct AdbcError* error);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you do want this while you're here and I think it is fairly easy to do. Providing an init function is a more predictable/secure way to precisely specify which driver will actually get loaded. There are now quite a lot of ways that a driver can be resolved based on name that change based on working directory/environment and if I were embedding ADBC in an application I would personally want the ability to be absolutely sure about what driver/version was loaded. (Alternatively, as an application I could invent my own way to manage Profiles and opt out of the driver manager).

Comment on lines +321 to +327
/// version = 1
/// driver = "driver_name"
///
/// [options]
/// option1 = "value1"
/// option2 = 42
/// option3 = 3.14
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think my point was just that you have some future flexibility if you nest the configuration and if you put adbc and profile in there somewhere there's a better chance somebody who runs across this file without context will know what it is.

Comment on lines 456 to 462
#ifdef _WIN32
auto local_env_var = Utf8Decode(std::string(env_var_name));
DWORD required_size = GetEnvironmentVariableW(local_env_var.c_str(), NULL, 0);
if (required_size == 0) {
out = "";
return ADBC_STATUS_OK;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see why this is important (to avoid putting secrets in .toml files), but there is now quite a bit of environment variable state that is required to do certain things (like prepend the driver search path with a directory of preferred drivers if I remember the order correctly), which means the chance of an application concurrently setting and the driver manager fetching an environment variable is non-trivial in the presence of something like a connection pool. Not in this PR, but it may be worth documenting how to create connections in a thread-safe way and/or ensuring there is a thread safe alternative to accomplish some of these things.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can agree with this, let's follow up in a separate PR for this though as you suggest.

Comment on lines 223 to 224
AdbcStatusCode (*GetDriverName)(struct AdbcConnectionProfile* profile,
const char** driver_name, struct AdbcError* error);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're allowing the profile to control everything else, why not let it also control the init_func? I could imagine distributing a profile implementation that bundles or loads drivers itself and doesn't want to let the driver manager do the search. We already let profiles omit the driver/specify it implicitly, so even if the bundled implementation won't ever use init_func, we may as well allow the option for other implementations of profiles.

Conversely...if telling the user to just init_func is sufficient, then why doesn't that also apply to profiles themselves?

driver = "adbc_driver_snowflake"

[options]
adbc.snowflake.sql.account = "env_var(SNOWFLAKE_ACCOUNT)"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that's an argument for changing the syntax, at least :)
Even if we don't specify how to escape it now, we may have to down the line...

driver = "adbc_driver_snowflake"

[options]
adbc.snowflake.sql.account = "env_var(SNOWFLAKE_ACCOUNT)"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additionally, I feel like most string substitution syntaxes put the delimiters on the outside ({foo(bar)}) which makes it more obvious (and possibly easier to parse?)

Comment on lines +321 to +327
/// version = 1
/// driver = "driver_name"
///
/// [options]
/// option1 = "value1"
/// option2 = 42
/// option3 = 3.14
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't do that for driver TOML files though 😬

Comment on lines +1321 to +1323
std::string message =
"Profile version '" + std::to_string(version) +
"' is not supported by this driver manager. Profile: " + profile_path.string();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Incidentally I think we should discuss moving to C++20 soon...then we can use std::format

return ParseDriverUriResult{d, str};
if (d == "profile" && str.size() > pos + 2) {
// found a profile URI "profile://"
return ParseDriverUriResult{"", std::nullopt, str.substr(pos + 3)};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not necessarily something we have to do in this PR, but I think this should become a std::variant (driver+URI or profile)

@@ -43,6 +43,7 @@ std::filesystem::path InternalAdbcUserConfigDir();
struct ParseDriverUriResult {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At some point we're going to need to figure out how better to expose internal functions for testing, e.g. I feel like env var substitution should be something that we unit-test directly, and having to redeclare everything is not a great way to do tests. (Also possibly break up the file, since it's getting big, and have some way to assemble a single source file for distribution, possibly one that we check in for convenience.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, though I don't think it's something we need to do in this PR.

auto diff = pos_match - pos_last_match;
auto start_match = end_of_last_match;
std::advance(start_match, diff);
if (pos_match > 0 && value[pos_match - 1] == '\\') {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, this syntax is a little funky. So just \{ isn't an escape, for instance, only {{...}}? I guess that's reasonable enough...it's not consistent with jinja2/etc. but also I don't know if we want to imply a full templating language either

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm open to any suggestions. I explicitly don't want to imply a full templating language. Currently we won't match a single { only {{, as far as I know, most templating languages don't actually provide a way to escape the {{ you just need to use an expression that outputs the {{ like {{ print "{{" }} etc. I also can't think of a reason offhand why someone would really need to escape it. So I'm fine removing this escaping since the syntax is funky. Up to you

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Honestly, I'd be OK not supporting escapes for now, and if/when someone asks for it then we can work out the syntax

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just, I thought it was difficult for the env_var(FOO) to be escaped 🙂

Copy link
Member

@paleolimbot paleolimbot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

Comment on lines +321 to +327
/// version = 1
/// driver = "driver_name"
///
/// [options]
/// option1 = "value1"
/// option2 = 42
/// option3 = 3.14
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one is very easy to change if it ever comes up 🙂

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants