[SPARK-21287][SQL] Move the validation of fetch size from JDBCOptions to JdbcDialect #26230

Closed
wants to merge 6 commits

Conversation

Contributor

@fuwhu fuwhu commented Oct 23, 2019

What changes were proposed in this pull request?

Move the validation of the fetch size from JDBCOptions to JdbcDialect, so that different dialects can apply different validation logic to the fetch size property.
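
For illustration only, here is a minimal sketch (not code from this PR) of the kind of dialect-specific override the change enables. It assumes the `validateProperties(Map[String, String])` hook shown later in this review, and that "fetchsize" is the relevant option key:

```scala
import org.apache.spark.sql.jdbc.JdbcDialect

// Hedged sketch, not the PR's actual diff: a dialect that accepts MySQL's
// special negative fetch size while still rejecting other negative values.
case object ExampleMySQLDialect extends JdbcDialect {

  override def canHandle(url: String): Boolean =
    url.startsWith("jdbc:mysql")

  // Assumed hook from this PR; signature taken from the review snippet below.
  def validateProperties(properties: Map[String, String]): Unit = {
    // MySQL's driver uses Integer.MIN_VALUE to mean "stream one row at a
    // time", so this dialect can allow it, unlike the generic >= 0 check.
    val size = properties.getOrElse("fetchsize", "0").toInt
    require(size >= 0 || size == Integer.MIN_VALUE,
      s"Invalid fetch size $size; must be >= 0 or Integer.MIN_VALUE.")
  }
}
```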

Why are the changes needed?

To fix SPARK-21287, namely to support fetching data in a streaming manner from a MySQL database.
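
As a hypothetical usage sketch (host, database, table, and credentials are placeholders): MySQL's Connector/J buffers the whole result set by default and only streams rows one at a time when the fetch size is Integer.MIN_VALUE, which the current generic check in JDBCOptions rejects.

```scala
// Once a negative fetch size is allowed, a user could stream a large MySQL
// table row by row instead of buffering it in the driver:
val df = spark.read
  .format("jdbc")
  .option("url", "jdbc:mysql://example-host:3306/exampledb")
  .option("dbtable", "big_table")
  .option("user", "example_user")
  .option("password", "example_password")
  .option("fetchsize", Integer.MIN_VALUE.toString)
  .load()
```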

Does this PR introduce any user-facing change?

No

How was this patch tested?

Unit tests (JDBCSuite).

fuwhu added 2 commits October 23, 2019 22:44
to support fetching data in a streaming manner (fetch one row at a time) against a MySQL database.

@srowen srowen left a comment


This approach looks good.

@fuwhu fuwhu requested a review from srowen October 23, 2019 22:21
@SparkQA

SparkQA commented Oct 23, 2019

Test build #4903 has finished for PR 26230 at commit 7316549.

  • This patch fails to generate documentation.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen
Member

srowen commented Oct 23, 2019

I think the [[Foo.Bar]] references in the docs are problematic for now. I'd just remove them.

@maropu
Member

maropu commented Oct 24, 2019

Can you add some tests in JDBCSuite? You can see the other examples there.

fuwhu added 2 commits October 24, 2019 14:55
2. Add test "[SPARK-21287] Dialect validate properties" in JDBCSuite.
@fuwhu
Contributor Author

fuwhu commented Oct 24, 2019

@maropu Updated, please take another look.

* in [[org.apache.spark.sql.execution.datasources.jdbc.JDBCOptions]].
* @param properties The connection properties. This is passed through from the relation.
*/
def validateProperties(properties: Map[String, String]): Unit = {
Member

Hmm, wait: properties: Map[String, String] will require a Scala map instead of a Java map if a dialect is implemented from the Java side. We could just pass Properties as-is, I suspect.

Contributor Author

@fuwhu fuwhu Oct 24, 2019

I chose a Scala Map because I saw that JdbcDialect.beforeFetch accepts a Scala Map as a parameter.
I agree with you; we may consider changing the parameter type for both beforeFetch and validateProperties, but I think we can leave that to another PR.
WDYT?
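
For reference, the existing hook being cited does take a Scala Map today. This is quoted from memory of JdbcDialect in org.apache.spark.sql.jdbc, so treat the exact shape as approximate:

```scala
// Existing JdbcDialect hook that already accepts a Scala Map, which is the
// precedent fuwhu cites for the validateProperties signature.
def beforeFetch(connection: java.sql.Connection,
    properties: Map[String, String]): Unit = {
}
```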

Member

I think it's OK to leave as-is here.

@@ -147,14 +147,7 @@ class JDBCOptions(
""".stripMargin
)

val fetchSize = {
val size = parameters.getOrElse(JDBC_BATCH_FETCH_SIZE, "0").toInt
require(size >= 0,
Member

Actually, we could just remove this condition, set the value as-is, and then expect the DBMS to throw an error. It seems Postgres throws an error.

Contributor Author

Just putting the discussion on SPARK-21287 here for reference:
https://issues.apache.org/jira/browse/SPARK-21287

Member

Yeah, I think that's fine too. The fetch size isn't really supposed to be negative, but there's one notable exception in MySQL's driver. Even so, we don't manually check the other settings here, so I don't know whether there's much value in this check in the end. I'd be OK with removing it entirely.
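
To make that MySQL exception concrete, a small plain-JDBC sketch (connection details are placeholders):

```scala
import java.sql.DriverManager

// Most drivers (e.g. Postgres) throw on a negative fetch size, but MySQL's
// Connector/J treats Integer.MIN_VALUE as "stream rows one at a time".
val conn = DriverManager.getConnection(
  "jdbc:mysql://example-host:3306/exampledb", "example_user", "example_password")
val stmt = conn.createStatement()
stmt.setFetchSize(Integer.MIN_VALUE) // streaming mode on MySQL; an error on most drivers
val rs = stmt.executeQuery("SELECT * FROM big_table")
while (rs.next()) {
  // process one row at a time without buffering the full result set
}
```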

Contributor Author

@fuwhu fuwhu Oct 24, 2019

@srowen @HyukjinKwon I think both solutions should be OK, so I submitted another PR, #26244, which just removes the non-negative check from JDBCOptions.
You can decide which one to follow up; I'll close the other.

dongjoon-hyun pushed a commit that referenced this pull request Oct 24, 2019
### What changes were proposed in this pull request?
Remove the requirement that the fetch size be >= 0 from JDBCOptions, to allow a negative fetch size.

### Why are the changes needed?

Namely, to allow fetching data in a streaming manner (row-by-row fetch) from a MySQL database.

### Does this PR introduce any user-facing change?
No

### How was this patch tested?
Unit test (JDBCSuite)

This closes #26230 .

Closes #26244 from fuwhu/SPARK-21287-FIX.

Authored-by: fuwhu <bestwwg@163.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(cherry picked from commit 92b2529)
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
@fuwhu fuwhu deleted the SPARK-21287 branch October 27, 2019 07:24