[SPARK-29644][SQL] Corrected ShortType and ByteType mapping to SmallInt and TinyInt in JDBCUtils #26301
Conversation
…e write/read tables from dataframe with cols as shorttype
...integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MsSqlServerIntegrationSuite.scala
… JDBCWriterSuite and E2E test case in dockers integration suite
So this change includes all the fixes we have been talking about in related PRs? It fixes two general problems in JDBCUtils, and one or two other mapping issues in MS SQL support? that could be OK. I think we can remove WIP from the title as it looks like a good proposal.
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala
Test build #4913 has finished for PR 26301 at commit
@@ -202,4 +202,46 @@ class MsSqlServerIntegrationSuite extends DockerJDBCIntegrationSuite {
     df2.write.jdbc(jdbcUrl, "datescopy", new Properties)
     df3.write.jdbc(jdbcUrl, "stringscopy", new Properties)
   }

+  test("Write tables with ShortType") {
I think this is a bug, so can you append the JIRA ID in the prefix?
@maropu is the JIRA ID required in all the places where I have added new test cases?
Yea, plz follow `test("SPARK-29644: test title")`
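The requested convention can be sketched as below. The `test` helper here is a stand-in stub, not ScalaTest's, only to keep the sketch self-contained, and the title-checking regex is illustrative:

```scala
// Stand-in for ScalaTest's `test`, just to make this sketch runnable on its own.
// The point is the title convention: "<JIRA-ID>: <description>".
def test(name: String)(body: => Unit): Unit = {
  require(name.matches("""SPARK-\d+: .+"""), s"test title should carry the JIRA ID: $name")
  body
}

test("SPARK-29644: Write tables with ShortType") {
  // the real body would round-trip a ShortType column through JDBC
}
```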
Yes, with this PR we are fixing the core issue in JDBCUtils, so changes in the MSSQL dialect are not required. (On WIP) I still have some work pending to fix some tests. Once that's done, I will move this from WIP to ready.
I just didn't check this because of WIP; personally, I think we could merge this once all the test failures are fixed.
…getByte and getShort instead of getInt. Added PR number to test title as suggested
Test build #4925 has finished for PR 26301 at commit
Test build #4924 has finished for PR 26301 at commit
Test build #4928 has finished for PR 26301 at commit
Still WIP?
No, we are good. Tests passed. I am moving this to ready now.
...integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MsSqlServerIntegrationSuite.scala
...integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MsSqlServerIntegrationSuite.scala
Force-pushed from 2ac5829 to 9cdad2b
Retest this please.
Thank you for pinging me. I also agree with you guys about the follow-up for test case refactoring.
Thank you @maropu @srowen @dongjoon-hyun. I'll also create a request to port this change to 2.4.
Test build #113737 has finished for PR 26301 at commit
Sure. Thanks!
## What changes were proposed in this pull request?

This is a port of SPARK-26499 to 2.4. Modified JdbcUtils.makeGetter to handle ByteType.

## How was this patch tested?

Added a new test to JDBCSuite that maps `TINYINT` to `ByteType`.

Closes #23400 from twdsilva/tiny_int_support.

Authored-by: Thomas D'Silva <tdsilva@apache.org>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>

### Why are the changes needed?

Changes are required to port PR #26301 (SPARK-29644) to 2.4.

### Does this PR introduce any user-facing change?

No

### How was this patch tested?

Yes, tested on 2.4 using the docker integration tests for MySQL, MsSQLServer, and Postgres:

```sh
./build/mvn test -Pdocker-integration-tests -pl :spark-docker-integration-tests_2.11 -Dtest=none -DwildcardSuites=org.apache.spark.sql.jdbc.MySQLIntegrationSuite
./build/mvn test -Pdocker-integration-tests -pl :spark-docker-integration-tests_2.11 -Dtest=none -DwildcardSuites=org.apache.spark.sql.jdbc.MsSqlServerIntegrationSuite
./build/mvn test -Pdocker-integration-tests -pl :spark-docker-integration-tests_2.11 -Dtest=none -DwildcardSuites=org.apache.spark.sql.jdbc.PostgresIntegrationSuite
```

Closes #26531 from shivsood/pr_26499_2.4_port.

Lead-authored-by: shivsood <shivsood@microsoft.com>
Co-authored-by: Thomas D'Silva <tdsilva@apache.org>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
I have four major comments about this PR:
- The changes made in this PR introduce external behavior changes. I believe we should document them in the migration guide.
- Also, is TINYINT => IntegerType really wrong? Mapping to ByteType looks strange.
- More importantly, introducing such a behavior change in our maintenance releases should be avoided.
- Any such change should check the boundary values of each data type for each dialect; otherwise, it is easy to introduce a regression. Obviously, the test case coverage is not good now.
@@ -59,7 +59,7 @@ class MsSqlServerIntegrationSuite extends DockerJDBCIntegrationSuite {
       """
         |INSERT INTO numbers VALUES (
         |0,
-        |255, 32767, 2147483647, 9223372036854775807,
+        |127, 32767, 2147483647, 9223372036854775807,
Ditto - #26549 (comment). Can anyone explain why it was possible, and how do we handle unsigned cases?
Continuing from #26549 (comment): TINYINT looks like it's a single byte alright, so using ByteType is reasonable. However, it looks like it's treated as signed by some but not all DBMSes. Is it unsigned in SQL Server?
Just checking: types like TINYINT and SMALLINT are not standard types, although widely supported, right? Should these types be used by default for all JDBC sources?
Yeah, I have some more doubts now that the TINYINT issue was pointed out. @shivsood
Got it. @gatorsmile , @HyukjinKwon , @srowen . I overlooked that mismatch between TINYINT vs Byte in this PR.
SMALLINT is widely supported because it's SQL-92. For TINYINT, I agree that we need to revert partially for that type.
Thanks all for pointing out these issues. I had overlooked the handling of unsigned cases and the fact that each database may define these ranges on its own. I think the problem exists for both SMALLINT and TINYINT.
- TINYINT can range anywhere from -128 to +127 signed, and 0 to 255 unsigned, in both SQL Server and MySQL.
- SMALLINT can take an unsigned value of 0 to 65535 per MySQL.
My understanding is that there are no unsigned types in Spark, c.f. https://spark.apache.org/docs/latest/sql-reference.html. Is that assertion right?
Do we have a test for an integer with the value 4294967295? Per the MySQL documentation, an unsigned integer can have that value.
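As a standalone sanity check on the ranges discussed above (plain Scala, not Spark code; the constants are the unsigned maxima mentioned in this thread):

```scala
// Unsigned DBMS maxima vs. signed JVM primitive ranges; plain Scala, not Spark code.
val unsignedTinyIntMax  = 255           // TINYINT UNSIGNED max
val unsignedSmallIntMax = 65535         // SMALLINT UNSIGNED max
val unsignedIntMax      = 4294967295L   // INT UNSIGNED max (MySQL)

// Each unsigned max overflows the same-width signed type...
assert(unsignedTinyIntMax  > Byte.MaxValue)
assert(unsignedSmallIntMax > Short.MaxValue)
assert(unsignedIntMax      > Int.MaxValue)

// ...but fits in the next wider signed type.
assert(unsignedTinyIntMax  <= Short.MaxValue)
assert(unsignedSmallIntMax <= Int.MaxValue)
assert(unsignedIntMax      <= Long.MaxValue)
```

This is why a widening conversion on read (TINYINT to ShortType, SMALLINT to IntegerType) covers both the signed and unsigned cases.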
The behavior would be as follows.
Overwrite scenario
- (ShortType -> Int) If a Spark df has a ShortType column, on overwrite a DBMS table with type Int will get created. Because Spark's ShortType is -32768 to +32767, only these values can be written.
- (ByteType -> SmallInt) If a Spark df has a ByteType column, on overwrite a DBMS table with type SmallInt will get created. Because Spark's ByteType is -128 to +127, only these values can be written.
Read scenario
- If an existing table in the DBMS has type TinyInt, a read in Spark would result in a ShortType. Because the ShortType range in Spark is -32768 to +32767, both the DBMS signed range of -128 to +127 and the unsigned range of 0 to 255 will be handled.
- If an existing table in the DBMS has type SmallInt, a read would result in a Spark dataframe column of type Integer. Because the Integer range in Spark is -2147483648 to +2147483647, both the DBMS signed range of -32768 to +32767 and the unsigned range of 0 to 65535 will be handled.
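The widening rules above can be modeled in a few lines. This is illustrative only: the real logic lives in JdbcUtils.scala and the per-dialect type mappers, and the type names here are plain strings standing in for Catalyst types:

```scala
// Illustrative model of the widening conversions described above; not the actual
// JdbcUtils API. Writes pick a wider DBMS column type, reads pick a wider Spark type.
object WideningModel {
  // Write/overwrite path: Spark type -> DBMS column type that can hold every value.
  def dbmsColumnFor(sparkType: String): String = sparkType match {
    case "ByteType"  => "SMALLINT" // -128..127 always fits in SMALLINT
    case "ShortType" => "INT"      // -32768..32767 always fits in INT
    case other       => sys.error(s"not modeled here: $other")
  }

  // Read path: DBMS column type -> Spark type covering signed AND unsigned ranges.
  def sparkTypeFor(dbmsType: String): String = dbmsType match {
    case "TINYINT"  => "ShortType"   // covers -128..127 and unsigned 0..255
    case "SMALLINT" => "IntegerType" // covers -32768..32767 and unsigned 0..65535
    case other      => sys.error(s"not modeled here: $other")
  }
}

println(WideningModel.dbmsColumnFor("ByteType"))  // SMALLINT
println(WideningModel.sparkTypeFor("TINYINT"))    // ShortType
```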
So we have to use a widening conversion both ways. That's the safest thing to do, I guess, and less of a change from the current behavior, where bytes go all the way to ints.
Ur, I see. right.
Anyway, thanks for the explanation, @shivsood
Raised PR #27172 with the proposed fix. ShortType is unchanged; only the ByteType fix is modified to map to ShortType on the read path, to enable support for the 0 to 255 range.
@maropu @srowen @HyukjinKwon FYI. Thanks.
What changes were proposed in this pull request?
Corrected the ShortType and ByteType mappings to SmallInt and TinyInt, and corrected the setter methods to set ShortType and ByteType via setShort() and setByte(). Changes are in JDBCUtils.scala.
Fixed unit test cases where applicable and added new E2E test cases to test table read/write using ShortType and ByteType.
Problems
case java.sql.Types.TINYINT => IntegerType
At line 172, TINYINT was wrongly interpreted as IntegerType on read.
case ShortType => Option(JdbcType("INTEGER", java.sql.Types.SMALLINT))
Here ShortType was wrongly written out as INTEGER.
Throughout the tests, ShortType and ByteType were being interpreted as IntegerType.
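The shape of the corrected mapping can be sketched as follows. This is a hedged sketch using `java.sql.Types` constants: the actual change is in JdbcUtils' getter/setter and type-mapping code, which works on Catalyst DataType objects rather than the strings used here:

```scala
import java.sql.Types

// Sketch only: after the fix, ShortType and ByteType keep their own JDBC types
// instead of collapsing to INTEGER.
def jdbcTypeFor(catalystType: String): (String, Int) = catalystType match {
  case "ShortType" => ("SMALLINT", Types.SMALLINT) // was ("INTEGER", Types.SMALLINT)
  case "ByteType"  => ("TINYINT",  Types.TINYINT)  // was read back as IntegerType
  case other       => sys.error(s"not modeled here: $other")
}

println(jdbcTypeFor("ShortType")) // (SMALLINT,5)
println(jdbcTypeFor("ByteType"))  // (TINYINT,-6)
```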
Why are the changes needed?
A given type should be set using the right type.
Does this PR introduce any user-facing change?
No
How was this patch tested?
Corrected unit test cases where applicable; validated in CI/CD.
Added test cases in MsSqlServerIntegrationSuite.scala, PostgresIntegrationSuite.scala, and MySQLIntegrationSuite.scala to write/read tables from a dataframe with columns of ShortType and ByteType. Validated manually as follows.