[WIP][SPARK-28151][SQL] Mapped ByteType to TinyINT for MsSQLServerDialect #25344

shivsood · 2019-08-03T01:45:16Z

What changes were proposed in this pull request?

Change mapping of BYTETYPE from BYTE to TinyINT.

Why are the changes needed?

Writing dataframe with column type BYTETYPE fails when using JDBC connector for SQL Server. The problem is due to

(Write path) Incorrect mapping of BYTETYPE in getCommonJDBCType() in jdbcutils.scala where BYTETYPE gets mapped to BYTE text. It should be mapped to TINYINT
case ByteType => Option(JdbcType("BYTE", java.sql.Types.TINYINT))

In getCatalystType() ( JDBC to Catalyst type mapping) TINYINT is mapped to INTEGER, while it should be mapped to BYTETYPE. Mapping to integer is ok from the point of view of upcasting, but will lead to 4 byte allocation rather than 1 byte for BYTETYPE.

(read path) Read path ends up calling makeGetter(dt: DataType, metadata: Metadata). The function sets the value in RDD row. The value is set per the data type. Here there is no mapping for BYTETYPE and thus results will result in an error when getCatalystType() is fixed.

Note : These issues were found when reading/writing with SQLServer.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Path was tested with integration test by adding test case to MsSqlServerIntegrationSuite.scala. Test case passed on local tests.

…erverDialect.scala. The fix fails with the following error when df.write is done with a dataframe what contains a ByteType. 19/08/02 18:25:44 INFO Executor: Finished task 0.0 in stage 7.0 (TID 7). 1197 bytes result sent to driver 19/08/02 18:25:44 INFO TaskSetManager: Finished task 0.0 in stage 7.0 (TID 7) in 43 ms on localhost (executor driver) (1/2) 19/08/02 18:25:44 INFO CodeGenerator: Code generated in 14.586963 ms 19/08/02 18:25:44 ERROR Executor: Exception in task 1.0 in stage 7.0 (TID 8) java.lang.RuntimeException: Error while encoding: java.lang.RuntimeException: java.lang.Integer is not a valid external type for schema of tinyint if (assertnotnull(input[0, org.apache.spark.sql.Row, true]).isNullAt) null else validateexternaltype(getexternalrowfield(assertnotnull(input[0, org.apache.spark.sql.Row, true]), 0, serialNum), ByteType) AS serialNum#231 at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.toRow(ExpressionEncoder.scala:344) at org.apache.spark.sql.SparkSession.$anonfun$createDataFrame$1(SparkSession.scala:367) at scala.collection.Iterator$$anon$10.next(Iterator.scala:459) at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source) at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:731) at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458) at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458) at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.savePartition(JdbcUtils.scala:662) at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.$anonfun$saveTable$1(JdbcUtils.scala:845)

shivsood · 2019-08-03T01:53:10Z

@dongjoon-hyun @srowen This is PR for ByteType fix. I am seeing a catalyst exception post my fix. I will need your help/guidance on resolving this. I see that i fails in ValidateExternalType() with mismatched type , but i am not able to get to exact flow of events here.

srowen

Does this supersede one of your earlier PRs?

...integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MsSqlServerIntegrationSuite.scala

dongjoon-hyun · 2019-08-05T04:59:59Z

Hi, @shivsood . So, do you mean this PR fails at UT in your local environment?

dongjoon-hyun · 2019-08-05T05:00:06Z

ok to test

SparkQA · 2019-08-05T05:10:11Z

Test build #108642 has finished for PR 25344 at commit 1abf3fc.

This patch fails Scala style tests.
This patch merges cleanly.
This patch adds no public classes.

dongjoon-hyun · 2019-08-05T16:00:45Z

Hi, @shivsood . Please fix the scala style issues first which @srowen already commented.

shivsood · 2019-08-05T17:55:32Z

Hi, @shivsood . So, do you mean this PR fails at UT in your local environment?

@dongjoon-hyun No. I have added a new E2E test case where i create a table with BYTETYPE. That fails. I was aware of that failure and that why the fix. Post fix its failing at the next level ( refer to How was patch tested section of the PR).

shivsood · 2019-08-05T17:57:07Z

Hi, @shivsood . Please fix the scala style issues first which @srowen already commented.

@dongjoon-hyun Sorry about these. I would fix. Have to check why these are not coming in my build. For me the source code build gives scala style warning, but test code does not seem to give me warnings.

…yte and not java.lang.interger. Fixed styling issues noted in the review comments

shivsood · 2019-08-05T21:01:38Z

es this supersede one of your earlier PRs?

Yes.

shivsood

Please fix the scala style issues first which @srowen already com

Done. I realize dev/stylecheck should be run manually for test cases. I presumed its get run automatically when i build. Apparently not for test code.

SparkQA · 2019-08-05T23:16:18Z

Test build #108681 has finished for PR 25344 at commit 3a5a621.

This patch passes all tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
assert(types(1).equals(\"class java.lang.Byte\"))

...integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MsSqlServerIntegrationSuite.scala

sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala

…ype post read

dongjoon-hyun · 2019-10-15T02:28:43Z

Thank you for updating, @shivsood ! 😄

SparkQA · 2019-10-15T02:58:49Z

Test build #112079 has finished for PR 25344 at commit 7658bbc.

This patch passes all tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
assert(colType(0) == \"class java.lang.Byte\")

dongjoon-hyun · 2019-10-15T04:37:05Z

...integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MsSqlServerIntegrationSuite.scala

@@ -21,6 +21,8 @@ import java.math.BigDecimal
 import java.sql.{Connection, Date, Timestamp}
 import java.util.Properties

+import org.apache.spark.sql.{DataFrame, Row}
+import org.apache.spark.sql.types._


Let's remove this. I'll give the updated short test code.

...integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MsSqlServerIntegrationSuite.scala

dongjoon-hyun

Please update the test code according to the comment.
(And, I revised the PR description according to the latest template).

maropu · 2019-10-23T04:36:27Z

@shivsood ping, are you there?

shivsood · 2019-10-23T17:55:37Z

@shivsood ping, are you there?

@maropu Yes, Would handle this this week. Please note that ShortType issue is fixed SPARK-28152. Or are you pointing to a different issue?

maropu · 2019-10-24T05:10:30Z

That should be like this? https://github.com/apache/spark/pull/25344/files#diff-391379a5ec51082e2ae1209db15c02b3R549

    case ShortType =>
      (stmt: PreparedStatement, row: Row, pos: Int) =>
        stmt.setInt(pos + 1, row.getShort(pos))

=>

    case ShortType =>
      (stmt: PreparedStatement, row: Row, pos: Int) =>
        stmt.setShort(pos + 1, row.getShort(pos))

?

srowen · 2019-10-29T13:24:09Z

Ping @shivsood

shivsood · 2019-10-30T00:44:10Z

That should be like this? https://github.com/apache/spark/pull/25344/files#diff-391379a5ec51082e2ae1209db15c02b3R549
    case ShortType =>
      (stmt: PreparedStatement, row: Row, pos: Int) =>
        stmt.setInt(pos + 1, row.getShort(pos))
=>
    case ShortType =>
      (stmt: PreparedStatement, row: Row, pos: Int) =>
        stmt.setShort(pos + 1, row.getShort(pos))
?
@maropu Thanks for pointing out. From the code this looks like a problem. I dont know how to repro a failure here. Will send out a separate PR for this in some time. Do you suspect that this code should lead some corruption that's hard to repro. IMO that a short(2 bytes) will get allocated here, but this code will end up writing as a integer.

@maropu Created JIRA https://issues.apache.org/jira/browse/SPARK-29644 for this issue. Will send a PR with the resolution and test.

@maropu The PR for the fix is #26301

SparkQA · 2019-10-30T03:01:00Z

Test build #112872 has finished for PR 25344 at commit ec5315d.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

srowen · 2019-10-31T16:26:15Z

@shivsood @maropu so that I can keep track of this -- it seems like this is a combination of SPARK-29644 and SPARK-28152, but for bytes, right?

The question there was testing, whether the test should be more general, because the fix is general. Same here, how much of this is specific to SQL Server? the setByte part isn't, it seems.

And the question there too was whether a deeper fix is needed? does that need to be reflected here? I'm pretty OK to fix this isolated issue right now if it by itself solves a problem.

maropu · 2019-10-31T23:47:41Z

@shivsood Can you solve the "SQL server-specific" issue of MsSqlServerDialect in this pr, first? Then, you could address more general issues of JdbcUtils in following prs.

shivsood · 2019-11-01T00:42:29Z

@shivsood Can you solve the "SQL server-specific" issue of MsSqlServerDialect in this pr, first? Then, you could address more general issues of JdbcUtils in following prs.
@maropu the fix will not work for "SQL Server" if change in JDBCUtils is not done. As far as i remember, it gives an error in the read path. Let me check this again and get back. Ideally i want to keep the "SQL server" fix and generic fix separate.

I am putting this PR back into WIP for now.

maropu · 2019-11-01T00:50:54Z

@maropu the fix will not work for "SQL Server" if change in JDBCUtils is not done. As far as i remember, it gives an error in the read path. Let me check this again and get back. Ideally i want to keep the "SQL server" fix and generic fix separate.

Ah, I see. If so, how about fixing the more general issue in JDBCUtils, first? Then, fixing the "SQL server" issue.

shivsood · 2019-11-01T00:56:25Z

Ah, I see. If so, how about fixing the more general issue in JDBCUtils, first? Then, fixing the "SQL server" issue

That would be right approach. It would be quite a test effort. I have to get familiar with unit test and E2E test for all other servers. Will start some work on that soon.

maropu · 2019-11-01T00:58:17Z

Thanks, @shivsood !

shivsood · 2019-11-01T00:59:33Z

Thanks, @shivsood !

Thanks a lot for your comments every one.

srowen · 2019-11-02T14:23:13Z

Is this superseded by #26301 ?

shivsood · 2019-11-05T02:32:31Z

Is this superseded by #26301 ?

Yes, #26301 supersedes this fix. Will remove this PR after #26301 is accepted.

shivsood · 2019-11-14T03:42:17Z

Closing as this is fixed by PR #26301 which is now accepted to master.

shivsood changed the title ~~[WIP][SPARK-2815][SQL] Mapped ByteType to TinyINT for MsSQLServerDialect~~ [WIP][SPARK-28151][SQL] Mapped ByteType to TinyINT for MsSQLServerDialect Aug 3, 2019

srowen requested changes Aug 3, 2019

View reviewed changes

dongjoon-hyun added the SQL label Aug 4, 2019

fixed Numeric types test. tinyint should be compared with java.lang.b…

3a5a621

…yte and not java.lang.interger. Fixed styling issues noted in the review comments

shivsood commented Aug 5, 2019

View reviewed changes

srowen reviewed Aug 6, 2019

View reviewed changes

...integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MsSqlServerIntegrationSuite.scala Outdated Show resolved Hide resolved

sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala Show resolved Hide resolved

Fixed test. Added test to validate broader value range and col data t…

7658bbc

…ype post read

shivsood changed the title ~~[WIP][SPARK-28151][SQL] Mapped ByteType to TinyINT for MsSQLServerDialect~~ [SPARK-28151][SQL] Mapped ByteType to TinyINT for MsSQLServerDialect Oct 15, 2019

dongjoon-hyun reviewed Oct 15, 2019

View reviewed changes

...integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MsSqlServerIntegrationSuite.scala Outdated Show resolved Hide resolved

dongjoon-hyun reviewed Oct 15, 2019

View reviewed changes

...integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MsSqlServerIntegrationSuite.scala Outdated Show resolved Hide resolved

dongjoon-hyun reviewed Oct 15, 2019

View reviewed changes

...integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MsSqlServerIntegrationSuite.scala Outdated Show resolved Hide resolved

dongjoon-hyun requested changes Oct 15, 2019

View reviewed changes

test fix to use testimplicits for simplicity

ec5315d

shivsood changed the title ~~[SPARK-28151][SQL] Mapped ByteType to TinyINT for MsSQLServerDialect~~ [WIP][SPARK-28151][SQL] Mapped ByteType to TinyINT for MsSQLServerDialect Nov 1, 2019

shivsood closed this Nov 14, 2019

[WIP][SPARK-28151][SQL] Mapped ByteType to TinyINT for MsSQLServerDialect #25344

[WIP][SPARK-28151][SQL] Mapped ByteType to TinyINT for MsSQLServerDialect #25344

Uh oh!

Conversation

shivsood commented Aug 3, 2019 • edited by dongjoon-hyun Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What changes were proposed in this pull request?

Why are the changes needed?

Does this PR introduce any user-facing change?

How was this patch tested?

Uh oh!

shivsood commented Aug 3, 2019

Uh oh!

srowen left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

dongjoon-hyun commented Aug 5, 2019

Uh oh!

dongjoon-hyun commented Aug 5, 2019

Uh oh!

SparkQA commented Aug 5, 2019

Uh oh!

dongjoon-hyun commented Aug 5, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

shivsood commented Aug 5, 2019

Uh oh!

shivsood commented Aug 5, 2019

Uh oh!

shivsood commented Aug 5, 2019

Uh oh!

shivsood left a comment

Choose a reason for hiding this comment

Uh oh!

SparkQA commented Aug 5, 2019

Uh oh!

Uh oh!

Uh oh!

dongjoon-hyun commented Oct 15, 2019

Uh oh!

SparkQA commented Oct 15, 2019

Uh oh!

dongjoon-hyun Oct 15, 2019

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

dongjoon-hyun left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

maropu commented Oct 23, 2019

Uh oh!

shivsood commented Oct 23, 2019

Uh oh!

maropu commented Oct 24, 2019

Uh oh!

srowen commented Oct 29, 2019

Uh oh!

shivsood commented Oct 30, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

SparkQA commented Oct 30, 2019

Uh oh!

srowen commented Oct 31, 2019

Uh oh!

maropu commented Oct 31, 2019

Uh oh!

shivsood commented Nov 1, 2019

Uh oh!

maropu commented Nov 1, 2019

Uh oh!

shivsood commented Nov 1, 2019

Uh oh!

maropu commented Nov 1, 2019

Uh oh!

shivsood commented Aug 3, 2019 •

edited by dongjoon-hyun

Loading

dongjoon-hyun commented Aug 5, 2019 •

edited

Loading

dongjoon-hyun left a comment •

edited

Loading

shivsood commented Oct 30, 2019 •

edited

Loading