[SPARK-30617][SQL] Stop check values of spark.sql.catalogImplementation to improve expansibility #27349
Conversation
…proach to let users define their own metadata catalog
Can one of the admins verify this patch?
@@ -36,9 +36,13 @@ object StaticSQLConf {
     .createWithDefault(Utils.resolveURI("spark-warehouse").toString)

   val CATALOG_IMPLEMENTATION = buildStaticConf("spark.sql.catalogImplementation")
     .doc("Spark built-in implementation is in-memory and hive," +
nit. ,"
-> , "
.
fixed
"when set to yourImpl(can be any words), " + | ||
"you need also provide following configuration to make everything going" + | ||
"spark.sql.catalogImplementation.yourImpl.builder" + | ||
"spark.sql.catalogImplementation.yourImpl.externalCatalog") | ||
.internal() |
Technically, this is an `.internal()` configuration. This PR seems to propose making it an external conf.
I found that `.internal()` didn't affect anything, so I have removed it in the latest commit.
    case other => conf.getOption(s"spark.sql.catalogImplementation.$other.builder")
      .getOrElse {
        throw new IllegalArgumentException(
          "You need to configure spark.sql.catalogImplementation.xx.builder when xx configured by" +
The following will be better.
- "You need to configure spark.sql.catalogImplementation.xx.builder when xx configured by" +
+ s"You need to configure spark.sql.catalogImplementation.$other.builder when $other configured by" +
fixed
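For context, here is a sketch of the full pattern match this fragment implies, with the reviewer's interpolation suggestion applied. The object and method names and the two built-in class names are assumptions based on Spark's existing SparkSession/SharedState internals, not verbatim from this PR:

```scala
import org.apache.spark.SparkConf

object CatalogResolutionSketch {
  // Sketch only: the object/method names and the two built-in class names are
  // assumptions; only the `case other` branch mirrors the diff in this PR.
  def sessionStateBuilderClassName(conf: SparkConf): String =
    conf.get("spark.sql.catalogImplementation", "in-memory") match {
      case "hive" => "org.apache.spark.sql.hive.HiveSessionStateBuilder"
      case "in-memory" => "org.apache.spark.sql.internal.SessionStateBuilder"
      case other =>
        conf.getOption(s"spark.sql.catalogImplementation.$other.builder")
          .getOrElse {
            throw new IllegalArgumentException(
              s"You need to configure spark.sql.catalogImplementation.$other.builder " +
                s"when $other is configured by spark.sql.catalogImplementation.")
          }
    }
}
```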
    case other => conf.getOption(s"spark.sql.catalogImplementation.$other.externalCatalog")
      .getOrElse {
        throw new IllegalArgumentException(
          "You need to configure spark.sql.catalogImplementation.xx.externalCatalog when xx " +
ditto.
Technically, there exist previous JIRAs and PRs. The first one is similar to this.
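Spelled out, the "ditto" suggestion for this branch would presumably be the same interpolation fix as above:

```diff
- "You need to configure spark.sql.catalogImplementation.xx.externalCatalog when xx " +
+ s"You need to configure spark.sql.catalogImplementation.$other.externalCatalog when $other " +
```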
Thank you for making a PR, @WeiWenda. The recent community decision was to use the new `CatalogPlugin` for this kind of need. Since this is requested again, I'll ping the senior PMCs once more.
Hi, @rxin, @gatorsmile, @cloud-fan. Can we extend `spark.sql.catalogImplementation` in 3.0.0?
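For reference, a minimal sketch of the `CatalogPlugin` route mentioned above, using Spark 3.0's `org.apache.spark.sql.connector.catalog` API; the class name `MyCatalog` is illustrative, and a real catalog would typically also implement `TableCatalog`:

```scala
import org.apache.spark.sql.connector.catalog.CatalogPlugin
import org.apache.spark.sql.util.CaseInsensitiveStringMap

// Illustrative no-op plugin showing the extension point. It is registered via
// spark.sql.catalog.<name>=<class>, not via spark.sql.catalogImplementation.
class MyCatalog extends CatalogPlugin {
  private var catalogName: String = _

  override def initialize(name: String, options: CaseInsensitiveStringMap): Unit = {
    catalogName = name
  }

  override def name(): String = catalogName
}
```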
@dongjoon-hyun Thanks for your help. The solutions in SPARK-17767 and SPARK-19632 are so old that they cannot be used in Spark 3.0.0 anymore.
If the new `CatalogPlugin` still has a long way to go, maybe cancelling the checkValues of `spark.sql.catalogImplementation` could be an interim option.
I don't think so. I intentionally didn't reopen it. (I'm the author of the oldest PR on SPARK-17767 and have been keeping track of this requirement since Apache Spark 2.0.0. 😄)
Yeah, I noticed your SPARK-17767; you are farsighted and responsible. I just looked at #23915; it can solve our problem more gracefully. Will Spark 3.0.0 be released in the coming months?
According to the plan, we will proceed soon; then, the QA period will start. For a major version, it takes some time. In any case, your patch cannot be released faster than 3.0.0, because new features are not backported to older branches.
BTW, if you agree with the direction above, you can close this PR and wait for 3.0.0.
All right, we will wait for Spark 3.0.0. Thank you anyway.
Thank you, @WeiWenda.
What changes were proposed in this pull request?
When the user configures spark.sql.catalogImplementation with a value other than in-memory or hive, check whether the properties below are configured. If they are, instantiate the SessionState with the provided class; otherwise, throw an exception as usual.
For example:
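A minimal sketch of the intended usage, assuming a custom implementation named `myCatalog`; the `com.example` class names are hypothetical placeholders for a user-provided SessionStateBuilder and ExternalCatalog:

```scala
import org.apache.spark.sql.SparkSession

// "myCatalog" and the com.example classes are hypothetical placeholders.
val spark = SparkSession.builder()
  .config("spark.sql.catalogImplementation", "myCatalog")
  .config("spark.sql.catalogImplementation.myCatalog.builder",
    "com.example.MySessionStateBuilder")
  .config("spark.sql.catalogImplementation.myCatalog.externalCatalog",
    "com.example.MyExternalCatalog")
  .getOrCreate()
```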
Why are the changes needed?
We have implemented a complex ExternalCatalog that retrieves metadata from multiple heterogeneous databases (such as Elasticsearch and PostgreSQL), so that we can run mixed queries across Hive and our online data. But because Spark requires the value of spark.sql.catalogImplementation to be one of in-memory or hive, we had to modify SparkSession and rebuild Spark to make our project work.
Finally, we hope Spark removes the above restriction, so that it will be much easier for us to keep pace with new Spark versions. Thanks!
Does this PR introduce any user-facing change?
No.
How was this patch tested?
No.