[WIP] [SPARK-29644][SQL] Fixed ByteType JDBCUtils to map to TinyInt at write read and ShortType on read #27172

shivsood · 2020-01-10T23:41:58Z

What changes were proposed in this pull request?

Fixed the ByteType mapping in JDBCUtils: ByteType now maps to TINYINT on the write path, and TINYINT maps to ShortType on the read path.
Fixed unit test cases where applicable and added new E2E test cases that read and write a table with ByteType.

Refer to #26301 for details. That fix was reverted because it also mapped the read path to ByteType, which would have truncated values on databases whose TINYINT supports 0 to 255, while Spark's ByteType range is only -128 to +127.
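The range mismatch can be checked with plain Scala numeric conversions. This is a minimal sketch, not code from the PR: `TinyIntRangeSketch`, `asByte` and `asShort` are illustrative names, and the input values simply stand for what an unsigned 0 to 255 TINYINT column could return.

```scala
object TinyIntRangeSketch {
  // Narrowing to a signed 8-bit Byte, as a ByteType read mapping would require.
  def asByte(tinyint: Int): Byte = tinyint.toByte

  // Converting to a signed 16-bit Short, as a ShortType read mapping would.
  def asShort(tinyint: Int): Short = tinyint.toShort

  def main(args: Array[String]): Unit = {
    // Values that an unsigned TINYINT column (0 to 255) could hold.
    for (v <- Seq(0, 127, 128, 200, 255)) {
      println(s"tinyint=$v byte=${asByte(v)} short=${asShort(v)}")
    }
  }
}
```

Every value above 127 wraps around when forced into a Byte (128 becomes -128, 200 becomes -56, 255 becomes -1), while a Short holds the whole 0 to 255 range unchanged.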

Problems

In master, lines 547 and 551 of JDBCUtils.scala have a problem: ByteType values are set as integers.

case ByteType =>
  (stmt: PreparedStatement, row: Row, pos: Int) =>
    stmt.setInt(pos + 1, row.getByte(pos))

Also, at line 247 of JDBCUtils.scala, TINYINT is wrongly interpreted as IntegerType in getCatalystType():

case java.sql.Types.TINYINT => IntegerType
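The read-path change described here can be sketched as follows. This is an illustration under stated assumptions, not the actual diff: `CatalystType`, `ShortType` and `IntegerType` are local stand-ins for Spark's types so the snippet runs without Spark, and `currentMapping` and `proposedMapping` are hypothetical names covering only the cases discussed in this PR.

```scala
import java.sql.Types

object ReadMappingSketch {
  // Local stand-ins for Spark's Catalyst types (assumption: not the real classes).
  sealed trait CatalystType
  case object ShortType extends CatalystType
  case object IntegerType extends CatalystType

  // Behaviour quoted above: TINYINT is read as IntegerType.
  def currentMapping(sqlType: Int): Option[CatalystType] = sqlType match {
    case Types.TINYINT => Some(IntegerType)
    case Types.INTEGER => Some(IntegerType)
    case _             => None // other JDBC types are out of scope for this sketch
  }

  // Behaviour proposed in this PR: TINYINT is read as ShortType.
  def proposedMapping(sqlType: Int): Option[CatalystType] = sqlType match {
    case Types.TINYINT => Some(ShortType)
    case Types.INTEGER => Some(IntegerType)
    case _             => None
  }

  def main(args: Array[String]): Unit = {
    println(s"current:  ${currentMapping(Types.TINYINT)}")
    println(s"proposed: ${proposedMapping(Types.TINYINT)}")
  }
}
```

Only the TINYINT case differs between the two functions, which is the whole of the read-path change the description talks about.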

Why are the changes needed?

With the current mapping of ByteType to "Byte", writing a table through the JDBC connector fails.
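One plausible way to picture the failure, assuming it comes from the type name that ends up in the generated CREATE TABLE statement (the description does not show the actual error): SQL Server has a TINYINT type but no type called BYTE. The helper `createTable` and the table and column names below are hypothetical and only illustrate how the column type name flows into the DDL.

```scala
object CreateTableSketch {
  // Hypothetical helper: renders a CREATE TABLE statement from
  // (column name, database type name) pairs.
  def createTable(table: String, columns: Seq[(String, String)]): String = {
    val cols = columns.map { case (name, dbType) => s"$name $dbType" }.mkString(", ")
    s"CREATE TABLE $table ($cols)"
  }

  def main(args: Array[String]): Unit = {
    // Type name taken from the current ByteType mapping.
    println(createTable("demo", Seq("flag" -> "BYTE")))
    // Type name a TINYINT mapping would produce instead.
    println(createTable("demo", Seq("flag" -> "TINYINT")))
  }
}
```

Under that assumption, the first statement is the kind a database without a BYTE type would reject, and the second is what the write-path fix aims for.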

Does this PR introduce any user-facing change?

Yes
(Write path) Users will now be able to create tables when the DataFrame has a ByteType column.
(Read path) A Spark DataFrame that reads a table with a TINYINT column will map it to ShortType rather than IntegerType.

How was this patch tested?

Corrected Unit test cases where applicable.
Added a test case in MsSqlServerIntegrationSuite.scala. Validated manually as follows.

./build/mvn install -DskipTests
./build/mvn test -Pdocker-integration-tests -pl :spark-docker-integration-tests_2.12


srowen

General question: is this going to have the same problem with breaking changes as in the previous PR? What is the theory about compatibility? Not that we must retain compatibility in 3.0; it just bears being clear about the change.

shivsood · 2020-01-11T00:20:57Z

what is the theory about compatibility?
Yes, this will introduce a different behaviour.
The write path was broken for ByteType, so that is a positive change in behaviour. This fixes it.
On the other hand, reading a SQL table with a TINYINT column will create the column as ShortType rather than IntegerType. This is not a compatible change.

Should the feature be under a build time flag?

dongjoon-hyun · 2020-01-11T00:37:50Z

Hi, @shivsood.
Instead of this, we need a follow-up PR to #25248 for the 2.4.5 release.

[SPARK-28152][SQL][2.4] Mapped ShortType to SMALLINT and FloatType to REAL for MsSqlServerDialect

The flag is requested at #25248 (comment) (Dec 2, 2019).

shivsood · 2020-01-11T00:48:40Z

Dec 2, 2019
Yes, it is on my backlog now that I am back :-)

AmplabJenkins · 2020-02-25T02:24:01Z

Can one of the admins verify this patch?

github-actions · 2020-06-05T00:22:16Z

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

Commit 5ee7d5e: Fix to make ByteType functional for write of tables using JDBC. As part of fix ByteType is now mapped to ShortType

shivsood mentioned this pull request Jan 10, 2020

[SPARK-29644][SQL] Corrected ShortType and ByteType mapping to SmallInt and TinyInt in JDBCUtils #26301

Closed

srowen reviewed Jan 11, 2020


dongjoon-hyun added the SQL label Feb 5, 2020

github-actions bot added the Stale label Jun 5, 2020

github-actions bot closed this Jun 6, 2020
