[SPARK-16401] [SQL] Data Source API: Enable Extending RelationProvider and CreatableRelationProvider without Extending SchemaRelationProvider #14075
Conversation
Is this a regression?

I will give it a try tonight. My environment for 1.6 is at home.

Test build #61864 has finished for PR 14075 at commit
@rxin It does look like this might have been a regression introduced via the initial creation of

@rxin This is a regression. I tried it in Spark 1.6 and it works well. I think we need to fix it in Spark 2.0. Thanks!

Thanks - then we should merge this in 2.0.

Sure, let me know whether I need to submit another PR for backporting to 2.0. Thanks!

cc @cloud-fan and @liancheng for review.
```scala
    sqlContext: SQLContext,
    parameters: Map[String, String]): BaseRelation = {
  LastOptions.parameters = parameters
  LastOptions.schema = None
```
Why assign values here? We don't check them in the test, right?
That is from another default source; it is not used in this test case. Let me clean it up. Thanks!
Nice catch! The fix LGTM.
```scala
override def createRelation(
    sqlContext: SQLContext,
    parameters: Map[String, String]): BaseRelation = {
  LastOptions.parameters = parameters
```
Hmmm, is it a convention to assign parameters to `LastOptions`? I think the test can work without it.
Yeah, sure. Let me remove all these things. Thanks!
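For context, the `LastOptions` object referenced in this review thread is a test-side helper that records the arguments of the most recent `createRelation` call so assertions can be made afterwards. Based only on the assignments visible in the diff excerpts above, a minimal sketch might look like the following (the exact fields and reset logic in Spark's test suite may differ):

```scala
import org.apache.spark.sql.types.StructType

// Hypothetical sketch of the LastOptions test helper: a mutable singleton
// that captures what the data source was last invoked with.
object LastOptions {
  var parameters: Map[String, String] = Map.empty
  var schema: Option[StructType] = None

  // Reset between tests to avoid cross-test leakage.
  def clear(): Unit = {
    parameters = Map.empty
    schema = None
  }
}
```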
Test build #61984 has finished for PR 14075 at commit

Test build #61990 has finished for PR 14075 at commit
… and CreatableRelationProvider without Extending SchemaRelationProvider

#### What changes were proposed in this pull request?

When users try to implement a data source API by extending only `RelationProvider` and `CreatableRelationProvider`, they will hit an error when resolving the relation.

```scala
spark.read
  .format("org.apache.spark.sql.test.DefaultSourceWithoutUserSpecifiedSchema")
  .load()
  .write
  .format("org.apache.spark.sql.test.DefaultSourceWithoutUserSpecifiedSchema")
  .save()
```

The error they hit is like:

```
org.apache.spark.sql.test.DefaultSourceWithoutUserSpecifiedSchema does not allow user-specified schemas.;
org.apache.spark.sql.AnalysisException: org.apache.spark.sql.test.DefaultSourceWithoutUserSpecifiedSchema does not allow user-specified schemas.;
 at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:319)
 at org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:494)
 at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:211)
```

Actually, the bug fix is simple. [`DataSource.createRelation(sparkSession.sqlContext, mode, options, data)`](https://github.com/gatorsmile/spark/blob/dd644f8117e889cebd6caca58702a7c7e3d88bef/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala#L429) already returns a `BaseRelation`. We should not assign its schema to `userSpecifiedSchema`. That schema assignment only makes sense for data sources that extend `FileFormat`.

#### How was this patch tested?

Added a test case.

Author: gatorsmile <gatorsmile@gmail.com>

Closes #14075 from gatorsmile/dataSource.

(cherry picked from commit 7374e51)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
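To make the failure mode concrete, here is a hedged sketch of a data source that extends only `RelationProvider` and `CreatableRelationProvider`, and deliberately not `SchemaRelationProvider`. The class name, package, and schema are illustrative, not taken from the patch (the PR's actual test source is named `DefaultSourceWithoutUserSpecifiedSchema`):

```scala
import org.apache.spark.sql.{DataFrame, SQLContext, SaveMode}
import org.apache.spark.sql.sources.{BaseRelation, CreatableRelationProvider, RelationProvider}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Illustrative source that does NOT extend SchemaRelationProvider,
// so it cannot accept a user-specified schema.
class DefaultSource extends RelationProvider with CreatableRelationProvider {

  private def makeRelation(ctx: SQLContext): BaseRelation = new BaseRelation {
    override def sqlContext: SQLContext = ctx
    // The relation defines its own schema; no user-specified schema exists.
    override def schema: StructType = StructType(StructField("a", StringType) :: Nil)
  }

  // Read path.
  override def createRelation(
      sqlContext: SQLContext,
      parameters: Map[String, String]): BaseRelation = makeRelation(sqlContext)

  // Write path: this already returns a BaseRelation. The bug was that the
  // write path treated this relation's schema as if the user had supplied
  // it, and resolveRelation then rejected the source for not being a
  // SchemaRelationProvider.
  override def createRelation(
      sqlContext: SQLContext,
      mode: SaveMode,
      parameters: Map[String, String],
      data: DataFrame): BaseRelation = makeRelation(sqlContext)
}
```

With the fix, the round-trip `read.format(...).load().write.format(...).save()` shown in the description succeeds against such a source instead of raising `AnalysisException`.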
Thanks, merging to master and 2.0!