[SPARK-17955][SQL] Make DataFrameReader.jdbc call DataFrameReader.format("jdbc").load by HyukjinKwon · Pull Request #15499 · apache/spark

HyukjinKwon · 2016-10-15T12:45:34Z

What changes were proposed in this pull request?

This PR makes DataFrameReader.jdbc call DataFrameReader.format("jdbc").load, consistent with the other APIs in DataFrameReader/DataFrameWriter, and avoids scattered calls to sparkSession.baseRelationToDataFrame(..).

The changes were mostly copied from DataFrameWriter.jdbc(), which was recently updated.

-    val params = extraOptions.toMap ++ connectionProperties.asScala.toMap
-    val options = new JDBCOptions(url, table, params)
-    val relation = JDBCRelation(parts, options)(sparkSession)
-    sparkSession.baseRelationToDataFrame(relation)
+    this.extraOptions = this.extraOptions ++ connectionProperties.asScala
+    // explicit url and dbtable should override all
+    this.extraOptions += ("url" -> url, "dbtable" -> table)
+    format("jdbc").load()
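The precedence implied by the added lines can be sketched in plain Scala without Spark. This is only an illustration under stated assumptions: `JdbcOptionPrecedence` and `mergeOptions` are hypothetical names, not part of Spark, and the immutable `Map` stands in for the reader's `extraOptions`. The point is the merge order: connection properties override previously set options, and the explicit `url` and `dbtable` arguments override both.

```scala
import java.util.Properties
// JavaConverters matches the `asScala` call in the diff; on Scala 2.13+
// scala.jdk.CollectionConverters is the non-deprecated equivalent.
import scala.collection.JavaConverters._

object JdbcOptionPrecedence {
  // Hypothetical helper (not a Spark API) mirroring the merge order in the diff.
  def mergeOptions(
      extraOptions: Map[String, String],
      connectionProperties: Properties,
      url: String,
      table: String): Map[String, String] = {
    // With `++`, entries on the right win, so connectionProperties
    // override anything already present in extraOptions.
    val withProps = extraOptions ++ connectionProperties.asScala
    // The explicit url and dbtable arguments override everything else.
    withProps + ("url" -> url) + ("dbtable" -> table)
  }

  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.setProperty("user", "alice")
    val merged = mergeOptions(
      Map("user" -> "bob", "fetchsize" -> "100", "dbtable" -> "old_table"),
      props,
      "jdbc:postgresql://host/db",
      "people")
    // Print the merged options in a stable order for inspection.
    merged.toSeq.sorted.foreach { case (k, v) => println(s"$k=$v") }
  }
}
```

Note that nothing in this merge carries partition information, which is the concern raised later in the review.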

How was this patch tested?

Existing tests should cover this.


SparkQA · 2016-10-15T14:47:38Z

Test build #67011 has finished for PR 15499 at commit 0d6e2d1.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

rxin · 2016-10-18T20:48:05Z

Merging in master. Thanks.

…mat("jdbc").load

## What changes were proposed in this pull request?

This PR proposes to make `DataFrameReader.jdbc` call `DataFrameReader.format("jdbc").load` consistently with other APIs in `DataFrameReader`/`DataFrameWriter` and avoid calling `sparkSession.baseRelationToDataFrame(..)` here and there.

The changes were mostly copied from `DataFrameWriter.jdbc()` which was recently updated.

```diff
-    val params = extraOptions.toMap ++ connectionProperties.asScala.toMap
-    val options = new JDBCOptions(url, table, params)
-    val relation = JDBCRelation(parts, options)(sparkSession)
-    sparkSession.baseRelationToDataFrame(relation)
+    this.extraOptions = this.extraOptions ++ connectionProperties.asScala
+    // explicit url and dbtable should override all
+    this.extraOptions += ("url" -> url, "dbtable" -> table)
+    format("jdbc").load()
```

## How was this patch tested?

Existing tests should cover this.

Author: hyukjinkwon <gurwls223@gmail.com>

Closes apache#15499 from HyukjinKwon/SPARK-17955.

gatorsmile · 2016-11-21T22:56:48Z

sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala

    // connectionProperties should override settings in extraOptions.
-    val params = extraOptions.toMap ++ connectionProperties.asScala.toMap
-    val options = new JDBCOptions(url, table, params)
-    val relation = JDBCRelation(parts, options)(sparkSession)


After this change, we lost the feature for parallel JDBC reading, right?

Let me revert this back.

I see. I thought this was all handled in

spark/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcRelationProvider.scala

Lines 29 to 46 in 0c0ad43

    override def createRelation(
        sqlContext: SQLContext,
        parameters: Map[String, String]): BaseRelation = {
      val jdbcOptions = new JDBCOptions(parameters)
      val partitionColumn = jdbcOptions.partitionColumn
      val lowerBound = jdbcOptions.lowerBound
      val upperBound = jdbcOptions.upperBound
      val numPartitions = jdbcOptions.numPartitions

      val partitionInfo = if (partitionColumn == null) {
        null
      } else {
        JDBCPartitioningInfo(
          partitionColumn, lowerBound.toLong, upperBound.toLong, numPartitions.toInt)
      }
      val parts = JDBCRelation.columnPartition(partitionInfo)
      JDBCRelation(parts, jdbcOptions)(sqlContext.sparkSession)
    }

Would it be possible to add those to the options rather than reverting this?

We are using a different mechanism for table partitioning. More flexible. Users can do more advanced partitioning (e.g., using multiple columns) here.

I see. I am sorry; it seems this was a careless mistake.
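To make the concern in this thread concrete: the removed code passed a precomputed `parts` array to `JDBCRelation`, whereas the option-based path in `JdbcRelationProvider` only derives partitions from a single `partitionColumn` with bounds. The sketch below is a Spark-free illustration of the more flexible, predicate-driven style of partitioning mentioned above, where each caller-supplied predicate becomes one partition. `PredicatePartitions`, `WherePartition`, `fromPredicates`, and `queryFor` are hypothetical names for illustration, not Spark's internal types, and the generated SQL is an assumption about what one query per partition might look like.

```scala
object PredicatePartitions {
  // Hypothetical stand-in for a partition described by a WHERE clause.
  final case class WherePartition(whereClause: String, index: Int)

  // One partition per predicate, indexed in the order given.
  def fromPredicates(predicates: Seq[String]): Seq[WherePartition] =
    predicates.zipWithIndex.map { case (p, i) => WherePartition(p, i) }

  // Illustrative query for one partition; each can be fetched independently.
  def queryFor(table: String, part: WherePartition): String =
    s"SELECT * FROM $table WHERE ${part.whereClause}"

  def main(args: Array[String]): Unit = {
    // Predicates may combine several columns, which a single
    // partitionColumn with lower and upper bounds cannot express.
    val predicates = Seq(
      "country = 'KR' AND age < 30",
      "country = 'KR' AND age >= 30",
      "country <> 'KR'")
    fromPredicates(predicates).foreach { part =>
      println(s"partition ${part.index}: ${queryFor("people", part)}")
    }
  }
}
```

If the partitions are dropped before the relation is built, as the removed `JDBCRelation(parts, options)` call suggests happened here, all rows would be read through a single query instead of one query per predicate.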


HyukjinKwon added 2 commits October 15, 2016 21:36

aa8cd35 Use the same read path in DataFrameReader.jdbc and DataFrameReader.format("jdbc")

0d6e2d1 Add missing dots

asfgit closed this in b3130c7 Oct 18, 2016

HyukjinKwon deleted the SPARK-17955 branch October 19, 2016 06:57

gatorsmile reviewed Nov 21, 2016


This was referenced Nov 21, 2016:

[SPARK-18413][SQL][FOLLOW-UP] Use numPartitions instead of maxConnections #15966 (Closed)

[SPARK-18538] [SQL] Fix Concurrent Table Fetching Using DataFrameReader JDBC APIs #15975 (Closed)

HyukjinKwon wants to merge 2 commits into apache:master from HyukjinKwon:SPARK-17955
