
[SPARK-30617][SQL] Stop check values of spark.sql.catalogImplementation to improve expansibility #27349


Closed
wants to merge 2 commits

Conversation

WeiWenda

What changes were proposed in this pull request?

When a user configures spark.sql.catalogImplementation with a value other than in-memory/hive, check whether the properties below are configured. If they are, instantiate the SessionState with the provided class; otherwise, throw an exception as before.

spark.sql.catalogImplementation.[value of spark.sql.catalogImplementation].builder
spark.sql.catalogImplementation.[value of spark.sql.catalogImplementation].externalCatalog

For example:

spark.sql.catalogImplementation = qihoo
spark.sql.catalogImplementation.qihoo.builder = org.apache.spark.sql.qihoo.QihooSessionStateBuilder
spark.sql.catalogImplementation.qihoo.externalCatalog = org.apache.spark.sql.qihoo.QihooExternalCatalog
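
To make the flow concrete, here is a minimal sketch of the proposed builder lookup (the `resolveBuilderClass` helper is hypothetical; the `case other` branch mirrors the diff reviewed below, and the two built-in class names are the ones SparkSession already hard-codes):

```scala
import org.apache.spark.SparkConf

// Hypothetical helper: the built-in implementations keep their hard-coded
// builder classes; any other value must supply a builder class via
// spark.sql.catalogImplementation.<value>.builder.
def resolveBuilderClass(conf: SparkConf, impl: String): String = impl match {
  case "hive" => "org.apache.spark.sql.hive.HiveSessionStateBuilder"
  case "in-memory" => "org.apache.spark.sql.internal.SessionStateBuilder"
  case other =>
    conf.getOption(s"spark.sql.catalogImplementation.$other.builder").getOrElse {
      throw new IllegalArgumentException(
        s"You need to configure spark.sql.catalogImplementation.$other.builder " +
          s"when spark.sql.catalogImplementation is set to $other")
    }
}
```

The same pattern would apply to the `.externalCatalog` key for resolving the ExternalCatalog class.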

Why are the changes needed?

We have implemented a complex ExternalCatalog that retrieves metadata from heterogeneous databases (such as Elasticsearch and PostgreSQL), so that we can run mixed queries across Hive and our online data. But since Spark requires that the value of spark.sql.catalogImplementation be one of in-memory/hive, we have to modify SparkSession and rebuild Spark to make our project work.
Finally, we hope Spark removes the above restriction, so that it will be much easier for us to keep pace with new Spark versions. Thanks!

Does this PR introduce any user-facing change?

no

How was this patch tested?

no

…proach to let users define their own metadata catalog
@AmplabJenkins

Can one of the admins verify this patch?

@@ -36,9 +36,13 @@ object StaticSQLConf {
.createWithDefault(Utils.resolveURI("spark-warehouse").toString)

val CATALOG_IMPLEMENTATION = buildStaticConf("spark.sql.catalogImplementation")
.doc("Spark built-in implementation is in-memory and hive," +
Member

nit. ," -> , ".

Author

fixed

"when set to yourImpl(can be any words), " +
"you need also provide following configuration to make everything going" +
"spark.sql.catalogImplementation.yourImpl.builder" +
"spark.sql.catalogImplementation.yourImpl.externalCatalog")
.internal()
Member

Technically, this is an `.internal()` configuration. This PR seems to propose making it an external conf.

Author

I found that `.internal()` didn't affect anything, so I have removed it in the latest commit.
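
For context, here is the pre-PR declaration of this conf in StaticSQLConf, where `.checkValues` enforces the in-memory/hive restriction that this PR removes (a sketch based on the existing source; `.internal()` keeps the conf out of the user-facing docs):

```scala
// Existing declaration in StaticSQLConf: .checkValues rejects any value
// other than the two built-in implementations at set time.
val CATALOG_IMPLEMENTATION = buildStaticConf("spark.sql.catalogImplementation")
  .internal()
  .stringConf
  .checkValues(Set("hive", "in-memory"))
  .createWithDefault("in-memory")
```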

case other => conf.getOption(s"spark.sql.catalogImplementation.$other.builder")
.getOrElse {
throw new IllegalArgumentException(
"You need to configure spark.sql.catalogImplementation.xx.builder when xx configured by" +
Member

The following will be better.

- "You need to configure spark.sql.catalogImplementation.xx.builder when xx configured by" +
+ s"You need to configure spark.sql.catalogImplementation.$other.builder when $other configured by" +

Author

fixed

case other => conf.getOption(s"spark.sql.catalogImplementation.$other.externalCatalog")
.getOrElse {
throw new IllegalArgumentException(
"You need to configure spark.sql.catalogImplementation.xx.externalCatalog when xx " +
Member

ditto.

@dongjoon-hyun
Member

dongjoon-hyun commented Jan 24, 2020

Technically, previous JIRAs and PRs already exist. The first one is similar to this one.

Member

@dongjoon-hyun left a comment

Thank you for making a PR, @WeiWenda.

The recent community decision was to use the new CatalogPlugin for this kind of need.

Since this is requested again, I'll ping the senior PMCs once more.

Hi, @rxin , @gatorsmile , @cloud-fan .
Can we extend spark.sql.catalogImplementation in 3.0.0?

@WeiWenda
Author

WeiWenda commented Jan 24, 2020

@dongjoon-hyun Thanks for your help. The solutions of SPARK-17767 and SPARK-19632 are too old to be used in Spark 3.0.0 anymore.

The recent community decision was to use the new CatalogPlugin for this kind of need.

If the new CatalogPlugin still has a long way to go, maybe removing the checkValues restriction on spark.sql.catalogImplementation is the lowest-cost way to support user-defined catalogs.

@dongjoon-hyun
Member

dongjoon-hyun commented Jan 24, 2020

I don't think so. I intentionally didn't reopen it. (I'm the author of the oldest PR on SPARK-17767 and have been keeping track of this requirement ever since Apache Spark 2.0.0. 😄)

The solutions of SPARK-17767 and SPARK-19632 are too old to be used in Spark 3.0.0 anymore.

CatalogPlugin is already available in 3.0.0-preview and 3.0.0-preview2 via https://issues.apache.org/jira/browse/SPARK-24252 . In 3.0.0, it will be a much more stable feature.

If the new CatalogPlugin still has a long way to go
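
For reference, a minimal sketch of what a plugin looks like under that API (the `QihooCatalog` class name is hypothetical; practical implementations usually also extend `TableCatalog` to serve table metadata):

```scala
import org.apache.spark.sql.connector.catalog.CatalogPlugin
import org.apache.spark.sql.util.CaseInsensitiveStringMap

// Hypothetical custom catalog, registered per catalog name via:
//   spark.sql.catalog.qihoo = org.apache.spark.sql.qihoo.QihooCatalog
class QihooCatalog extends CatalogPlugin {
  private var catalogName: String = _

  // Called once at load time with the catalog name ("qihoo") and any
  // spark.sql.catalog.qihoo.* options.
  override def initialize(name: String, options: CaseInsensitiveStringMap): Unit = {
    catalogName = name
  }

  override def name(): String = catalogName
}
```

Unlike the approach in this PR, each catalog gets its own name and config namespace, so spark.sql.catalogImplementation is left untouched.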

@WeiWenda
Author

Yeah, I noticed your SPARK-17767; you were farsighted and responsible. I just looked at #23915, and it can solve our problem more gracefully. Will Spark 3.0.0 be released in the coming months?

@dongjoon-hyun
Member

dongjoon-hyun commented Jan 25, 2020

According to the plan, we will proceed with the feature freeze first by creating branch-3.0 this month.

Then the QA period will start. For a major version, it takes some time.

In any case, your patch cannot be released before 3.0.0, because new features are not backported to branch-2.4. (The 2.4.5 RC1 vote was held last week and we are receiving some more correctness patches. The 2.4.5 RC2 vote will start soon. 2.4.5 will be released before 3.0.0.)

@dongjoon-hyun
Member

BTW, if you agree with CatalogPlugin in 3.0.0, could you close the PR and JIRA please?

@WeiWenda
Author

BTW, if you agree with CatalogPlugin in 3.0.0, could you close the PR and JIRA please?

All right, we will wait for Spark 3.0.0. Thank you anyway.

@WeiWenda WeiWenda closed this Jan 25, 2020
@dongjoon-hyun
Member

Thank you, @WeiWenda .
