feat(c/driver_manager): add connection profile interface#3876
feat(c/driver_manager): add connection profile interface#3876zeroshade wants to merge 37 commits intoapache:mainfrom
Conversation
|
CC @davidhcoe if there's anyone I missed please feel free to tag others that might be interested in looking this over and reviewing it. Once everyone is onboard with the design and ideas I'll implement the handling of these profiles for the other language bindings. |
paleolimbot
left a comment
There was a problem hiding this comment.
Cool!
All just questions from me to take or leave 🙂
| /// version = 1 | ||
| /// driver = "driver_name" | ||
| /// | ||
| /// [options] | ||
| /// option1 = "value1" | ||
| /// option2 = 42 | ||
| /// option3 = 3.14 |
There was a problem hiding this comment.
Do you want to wrap this in a top level [profile] item (like pyproject.toml's [project] or Cargo.toml's [workspace])?
| /// version = 1 | |
| /// driver = "driver_name" | |
| /// | |
| /// [options] | |
| /// option1 = "value1" | |
| /// option2 = 42 | |
| /// option3 = 3.14 | |
| /// [adbc.profile] | |
| /// version = 1 | |
| /// driver = "driver_name" | |
| /// | |
| /// [options] | |
| /// option1 = "value1" | |
| /// option2 = 42 | |
| /// option3 = 3.14 |
There was a problem hiding this comment.
Personally I was trying to avoid having to do that. Is there any benefit to doing so?
There was a problem hiding this comment.
Just seemed like a common idiom in .toml configs, perhaps to allow for future expansion (e.g., if you ever want >1 profile to live in one .toml)
There was a problem hiding this comment.
Having multiple connection profiles defined in a single TOML file could potentially be nice, but it complicates the profile search procedure, and I worry that it could confuse users.
There was a problem hiding this comment.
I think my point was just that you have some future flexibility if you nest the configuration and if you put adbc and profile in there somewhere there's a better chance somebody who runs across this file without context will know what it is.
There was a problem hiding this comment.
We don't do that for driver TOML files though 😬
There was a problem hiding this comment.
This one is very easy to change if it ever comes up 🙂
| AdbcStatusCode (*GetDriverName)(struct AdbcConnectionProfile* profile, | ||
| const char** driver_name, struct AdbcError* error); |
There was a problem hiding this comment.
| AdbcStatusCode (*GetDriverName)(struct AdbcConnectionProfile* profile, | |
| const char** driver_name, struct AdbcError* error); | |
| AdbcStatusCode (*GetDriverName)(struct AdbcConnectionProfile* profile, | |
| const char** driver_name, AdbcDriverInitFunc* init_func, struct AdbcError* error); |
Is there any value to allowing a profile to optionally specify the init function directly? (R could maybe use this to find R package versions of drivers)
There was a problem hiding this comment.
The way the code is set up doesn't allow for this. If there is an init func set then the profile handling is currently entirely skipped and it's up to the init_func to handle everything. Given the way we envision profiles working, I don't think that it makes sense for the profile to set or specify the init function.
There was a problem hiding this comment.
I think you'd just set args->init_func instead of args->driver (understood if you'd prefer not to)
There was a problem hiding this comment.
I personally prefer not, but I'm open to it if others think it's worthwhile.
There was a problem hiding this comment.
I think you do want this while you're here and I think it is fairly easy to do. Providing an init function is a more predictable/secure way to precisely specify which driver will actually get loaded. There are now quite a lot of ways that a driver can be resolved based on name that change based on working directory/environment and if I were embedding ADBC in an application I would personally want the ability to be absolutely sure about what driver/version was loaded. (Alternatively, as an application I could invent my own way to manage Profiles and opt out of the driver manager).
There was a problem hiding this comment.
We can, I just didn't based on the way I implemented it. If both of you think it's a good way to go, I don't have a good argument against it so I can add it to the profile object. Sure.
There was a problem hiding this comment.
It would be nice to have other people chime in as well ^_^
There was a problem hiding this comment.
Agreed, please poke people! 😄
There was a problem hiding this comment.
Mostly the way this is set up with an embedded callback makes it very difficult to change later, so if if it's easy to do and there's a plausible scenario where it might be needed I think it is worth it. There are a few types of drivers that can never be loaded by name/entrypoint and so I think it is plausible this could come up (e.g., giant statically linked piles of Rust crates, some R packages).
There was a problem hiding this comment.
updated with your suggestion here to allow the profile to provide an init_func
|
@CurtHagenlocher will be interested in this as well. |
| driver = "adbc_driver_snowflake" | ||
|
|
||
| [options] | ||
| adbc.snowflake.sql.account = "env_var(SNOWFLAKE_ACCOUNT)" |
There was a problem hiding this comment.
Hmm. Why not use option = { env_var = "NAME" }?
There was a problem hiding this comment.
That would prevent us from using environment variables to define substrings, which we will need for the drivers that expect their connection arguments to be in a URI.
uri = "scheme://host:port?user=user&pw=env_var(PASSWORD)
There was a problem hiding this comment.
Should we consider using ${} instead of env_var()?
That is more concise and makes escaping literals easier (e.g. with \$ or $$).
There was a problem hiding this comment.
I think ${} might be a little more familiar, unless you expect to eventually have other things besides env_var.
There was a problem hiding this comment.
I believe the expectation was that we would eventually have more special functions in addition to env_var which is why this was the original suggestion. For eventual consistency
There was a problem hiding this comment.
You will end up with the lambda calculus in these configuration strings :)
There was a problem hiding this comment.
I think we also still need to specify how to escape this (in the very unlikely instance that it's needed...)
There was a problem hiding this comment.
I was really trying to avoid inventing a way to escape this.... I can't even think of what a good syntax would be to escape this unless we change it to something like $env_var or provide some actual delimiter for it.
There was a problem hiding this comment.
I think that's an argument for changing the syntax, at least :)
Even if we don't specify how to escape it now, we may have to down the line...
There was a problem hiding this comment.
Additionally, I feel like most string substitution syntaxes put the delimiters on the outside ({foo(bar)}) which makes it more obvious (and possibly easier to parse?)
|
@CurtHagenlocher @davidhcoe @lidavidm @ianmcook any further comments? |
|
I've updated the comments and the rst document based on everyone's comments. Please re-review |
|
fyi @jadewang-db and @eric-wang-1990 |
| driver = "adbc_driver_snowflake" | ||
|
|
||
| [options] | ||
| adbc.snowflake.sql.account = "env_var(SNOWFLAKE_ACCOUNT)" |
There was a problem hiding this comment.
I think we also still need to specify how to escape this (in the very unlikely instance that it's needed...)
23fed89 to
3cebf5a
Compare
| #ifdef _WIN32 | ||
| const wchar_t* profiles_dir = L"Profiles"; | ||
| #elif defined(__APPLE__) |
There was a problem hiding this comment.
I forget whether the Windows configuration directory from InternalAdbcUserConfigDir() uses the "Roaming" or local version but it might be worth checking to make sure the Profiles are stored in the right one (i.e., should Profiles follow a user around when logging in to the same account on multiple computers, or should they be local to one computer?).
There was a problem hiding this comment.
Currently we use LocalAppData I believe, I don't think it necessarily makes sense for the profiles to follow the user around.
There was a problem hiding this comment.
I suppose if anybody notices they can open an issue 🙂
| AdbcStatusCode (*GetDriverName)(struct AdbcConnectionProfile* profile, | ||
| const char** driver_name, struct AdbcError* error); |
There was a problem hiding this comment.
I think you do want this while you're here and I think it is fairly easy to do. Providing an init function is a more predictable/secure way to precisely specify which driver will actually get loaded. There are now quite a lot of ways that a driver can be resolved based on name that change based on working directory/environment and if I were embedding ADBC in an application I would personally want the ability to be absolutely sure about what driver/version was loaded. (Alternatively, as an application I could invent my own way to manage Profiles and opt out of the driver manager).
| /// version = 1 | ||
| /// driver = "driver_name" | ||
| /// | ||
| /// [options] | ||
| /// option1 = "value1" | ||
| /// option2 = 42 | ||
| /// option3 = 3.14 |
There was a problem hiding this comment.
I think my point was just that you have some future flexibility if you nest the configuration and if you put adbc and profile in there somewhere there's a better chance somebody who runs across this file without context will know what it is.
| #ifdef _WIN32 | ||
| auto local_env_var = Utf8Decode(std::string(env_var_name)); | ||
| DWORD required_size = GetEnvironmentVariableW(local_env_var.c_str(), NULL, 0); | ||
| if (required_size == 0) { | ||
| out = ""; | ||
| return ADBC_STATUS_OK; | ||
| } |
There was a problem hiding this comment.
I see why this is important (to avoid putting secrets in .toml files), but there is now quite a bit of environment variable state that is required to do certain things (like prepend the driver search path with a directory of preferred drivers if I remember the order correctly), which means the chance of an application concurrently setting and the driver manager fetching an environment variable is non-trivial in the presence of something like a connection pool. Not in this PR, but it may be worth documenting how to create connections in a thread-safe way and/or ensuring there is a thread safe alternative to accomplish some of these things.
There was a problem hiding this comment.
I can agree with this, let's follow up in a separate PR for this though as you suggest.
| AdbcStatusCode (*GetDriverName)(struct AdbcConnectionProfile* profile, | ||
| const char** driver_name, struct AdbcError* error); |
There was a problem hiding this comment.
If we're allowing the profile to control everything else, why not let it also control the init_func? I could imagine distributing a profile implementation that bundles or loads drivers itself and doesn't want to let the driver manager do the search. We already let profiles omit the driver/specify it implicitly, so even if the bundled implementation won't ever use init_func, we may as well allow the option for other implementations of profiles.
Conversely...if telling the user to just init_func is sufficient, then why doesn't that also apply to profiles themselves?
| driver = "adbc_driver_snowflake" | ||
|
|
||
| [options] | ||
| adbc.snowflake.sql.account = "env_var(SNOWFLAKE_ACCOUNT)" |
There was a problem hiding this comment.
I think that's an argument for changing the syntax, at least :)
Even if we don't specify how to escape it now, we may have to down the line...
| driver = "adbc_driver_snowflake" | ||
|
|
||
| [options] | ||
| adbc.snowflake.sql.account = "env_var(SNOWFLAKE_ACCOUNT)" |
There was a problem hiding this comment.
Additionally, I feel like most string substitution syntaxes put the delimiters on the outside ({foo(bar)}) which makes it more obvious (and possibly easier to parse?)
| /// version = 1 | ||
| /// driver = "driver_name" | ||
| /// | ||
| /// [options] | ||
| /// option1 = "value1" | ||
| /// option2 = 42 | ||
| /// option3 = 3.14 |
There was a problem hiding this comment.
We don't do that for driver TOML files though 😬
| std::string message = | ||
| "Profile version '" + std::to_string(version) + | ||
| "' is not supported by this driver manager. Profile: " + profile_path.string(); |
There was a problem hiding this comment.
Incidentally I think we should discuss moving to C++20 soon...then we can use std::format
| return ParseDriverUriResult{d, str}; | ||
| if (d == "profile" && str.size() > pos + 2) { | ||
| // found a profile URI "profile://" | ||
| return ParseDriverUriResult{"", std::nullopt, str.substr(pos + 3)}; |
There was a problem hiding this comment.
not necessarily something we have to do in this PR, but I think this should become a std::variant (driver+URI or profile)
| @@ -43,6 +43,7 @@ std::filesystem::path InternalAdbcUserConfigDir(); | |||
| struct ParseDriverUriResult { | |||
There was a problem hiding this comment.
At some point we're going to need to figure out how better to expose internal functions for testing, e.g. I feel like env var substitution should be something that we unit-test directly, and having to redeclare everything is not a great way to do tests. (Also possibly break up the file, since it's getting big, and have some way to assemble a single source file for distribution, possibly one that we check in for convenience.)
There was a problem hiding this comment.
I agree, though I don't think it's something we need to do in this PR.
| auto diff = pos_match - pos_last_match; | ||
| auto start_match = end_of_last_match; | ||
| std::advance(start_match, diff); | ||
| if (pos_match > 0 && value[pos_match - 1] == '\\') { |
There was a problem hiding this comment.
Hmm, this syntax is a little funky. So just \{ isn't an escape, for instance, only {{...}}? I guess that's reasonable enough...it's not consistent with jinja2/etc. but also I don't know if we want to imply a full templating language either
There was a problem hiding this comment.
I'm open to any suggestions. I explicitly don't want to imply a full templating language. Currently we won't match a single { only {{, as far as I know, most templating languages don't actually provide a way to escape the {{ you just need to use an expression that outputs the {{ like {{ print "{{" }} etc. I also can't think of a reason offhand why someone would really need to escape it. So I'm fine removing this escaping since the syntax is funky. Up to you
There was a problem hiding this comment.
Honestly, I'd be OK not supporting escapes for now, and if/when someone asks for it then we can work out the syntax
There was a problem hiding this comment.
Just, I thought it was difficult for the env_var(FOO) to be escaped 🙂
Co-authored-by: David Li <li.davidm96@gmail.com>
| /// version = 1 | ||
| /// driver = "driver_name" | ||
| /// | ||
| /// [options] | ||
| /// option1 = "value1" | ||
| /// option2 = 42 | ||
| /// option3 = 3.14 |
There was a problem hiding this comment.
This one is very easy to change if it ever comes up 🙂
An attempt to implement the ideas as suggested in #3765 to provide the ability to specify and load connection profiles to define options for use by the driver manager.
This creates an
AdbcConnectionProfilestruct and defines anAdbcConnectionProfileProviderfunction pointer typedef to allow for customized management of profiles. This also implements a default file-based profile provider as described in #3765 (comment) which will be used if no custom provider has been set.This allows easy expansion in the future for non-file-based connection profile providers while still implementing the easier case of using file-based profiles, including hierarchical specification for now. See the documentation comments added to
adbc_driver_manager.hfor the full description of the semantics and explanation.