[SPARK-32001][SQL] Create JDBC authentication provider developer API #29024
This is an internal class, so could you avoid exposing it? How about using `Map[String, String]` instead in the provider?
Good idea; this way the API is less exposed to changes.
I've done the initial steps, but I see bad and worse tradeoffs. The API is less dependent on `JDBCOptions` changes, but the provider-side implementation becomes more complicated, with either duplicated code or duplicated instantiation. Let me explain with examples:

Option 1: Re-instantiate `JDBCOptions` inside the provider. The duplicated instantiation worries me, plus the general concern below.

Option 2: Work out the needed parameters on my own. The reinventing-the-wheel feeling worries me, plus the general concern below.

General concern: In both cases Spark itself can be fine, because in the first case a new `JDBCOptions` instantiation, and in the second case copy-paste moved into a base class, can fill the gap. However, considering myself as a 3rd-party developer, I have few options (since `JDBCOptions` is not exposed as a developer API): I'd be tied to `JDBCOptions` behavior, which may change over time.

Considering these findings I think it's better to keep `JDBCOptions`. WDYT?
`Option 1` looks better. But do 3rd-party developers need to use `JDBCOptions` there? Or could we just pass the two params only?
We could pass the 2 params, but then we limit further implementation possibilities, so I would vote for the map. At the moment no params other than `keytab` and `principal` are needed, but later providers may need further things. It's not a strong opinion; I just don't want to close off later possibilities. If we agree on the way, I'll make the changes.
The example I've given in the other open question here shows how parameters other than `keytab` and `principal` can be used. Not passing the whole map would close this possibility.
I agree with @maropu here. `JDBCOptions` is under the `execution` package and meant to be private.
@HyukjinKwon thanks for having a look!

I agree that `JDBCOptions` mustn't be exposed. Let me change the code to show `option 1`. As said, passing only `keytab: String, principal: String` is not enough, because some (though not all) of the providers need further configuration. I've started to work on this change (unless anybody has a better option).
I've removed `JDBCOptions` from the API, but I'm keeping this open in case further discussion is needed.
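To make the tradeoff above concrete, here is a minimal sketch of the map-based shape being discussed. The interface and class names are hypothetical, not Spark's actual developer API: the point is only that a provider receiving the raw option map can read `keytab`, `principal`, or any future parameter, without the internal `JDBCOptions` class leaking into the public surface.

```java
import java.util.Map;

// Hypothetical provider interface; names are illustrative, not Spark's
// actual API. The raw option map replaces the internal JDBCOptions
// class, so no internals are exposed but a provider can still read
// whatever parameters it needs.
interface JdbcAuthProviderSketch {
    boolean canHandle(String driverClass, Map<String, String> options);
}

class KerberosProviderSketch implements JdbcAuthProviderSketch {
    @Override
    public boolean canHandle(String driverClass, Map<String, String> options) {
        // A secure provider only steps in when both Kerberos parameters
        // are present; other providers can look for their own keys in
        // the same map, which is why passing only two fixed params
        // would be limiting.
        return options.containsKey("keytab") && options.containsKey("principal");
    }
}
```

A provider needing an extra, provider-specific option would simply read another key from the same map, which is the flexibility argued for above.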
I am getting the following exception on my console permanently while running the JDBC tests. Should it really be logged as an error?
Maybe log it as a warning?
We can decrease it to a warning. The main point is to notify the user.
Is it okay to ignore the error case where it fails to load the builtin providers? `DataSource` throws an exception if it fails to load builtin datasources: spark/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala, lines 694 to 702 in 94d648d.
How do you mean ignore? Providers must be loaded independently, so we need to catch and ignore the exception.
If the suggestion is to let the exception fire, then I would say it's a bad idea. If a provider cannot be loaded, the rest must still go on. We've had a similar issue and expectation in the Hadoop delegation token area.
Yea, the policy looks okay for secure connections, but how about the basic one, `BasicConnectionProvider`? At least until Spark v3.0, creating basic JDBC connections does not fail because of a loading failure.
Here it's the same, since `BasicConnectionProvider` is built-in and there is no pre-load inside it. The main issue would come when somebody adds a new provider that cannot be loaded. That failure would make all the other workloads fail if we didn't load the providers independently.
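The fault-isolation policy argued for above can be sketched as follows. This is an illustrative stand-in, not Spark's actual loader: each factory represents one provider discovered on the classpath (e.g. one `ServiceLoader` entry), and each is instantiated in its own try/catch, so one broken third-party provider is skipped (and would be logged as a warning, per the thread above) while `BasicConnectionProvider` and the rest still load.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Illustrative stand-in for a provider registry, not Spark's actual
// loader. Each Supplier represents one provider found on the classpath.
class ProviderRegistry {
    static List<String> loadAll(List<Supplier<String>> factories) {
        List<String> loaded = new ArrayList<>();
        for (Supplier<String> factory : factories) {
            try {
                loaded.add(factory.get());
            } catch (Throwable t) {
                // A provider that fails to load is skipped (and would be
                // logged as a warning), so the remaining providers and
                // the user's JDBC workload keep working.
            }
        }
        return loaded;
    }
}
```

The alternative, letting the first loading failure propagate, would make every JDBC connection fail because of one unrelated broken provider, which is the outcome the thread rejects.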
#29968