Skip to content

[WIP] [SPARK-29644][SQL] Fixed ByteType JDBCUtils to map to TinyInt at write read and ShortType on read #27172

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

shivsood
Copy link
Contributor

What changes were proposed in this pull request?

Fixed ByteType JDBCUtils to map to TinyInt at write read and ShortType on read path.
Fixed Unit test cases where applicable and added new E2E test cases in to test table read/write ByteType.

Refer #26301 for details. The fix was reverted as it mapped even the read path to a ByteType and that would have resulted in truncations for some database that support 0 to 255 for tinyint while spark byteType range is -127 to +127 only.

Problems

  • In master in JDBCUtils.scala line number 547 and 551 have a problem where ByteType are set as Integers.
case ByteType =>
(stmt: PreparedStatement, row: Row, pos: Int) =>
stmt.setInt(pos + 1, row.getByte(pos))
  • Also at line JDBCUtils.scala 247 TinyInt is interpreted wrongly as IntergetType in getCatalystType()

case java.sql.Types.TINYINT => IntegerType

Why are the changes needed?

With the current mapping of ByteType as "Byte" when writting a table from JDBC connector fails.

Does this PR introduce any user-facing change?

Yes
(Write path) Uses would now be able to create tables when data frame has ByteType.
(Read path) Spark dataframe that reads a table with a TINYINT will get mapped to ShortType rather than Integer

How was this patch tested?

Corrected Unit test cases where applicable.
Added a test case in MsSqlServerIntegrationSuite.scala. Validated by manual as follows.

./build/mvn install -DskipTests
./build/mvn test -Pdocker-integration-tests -pl :spark-docker-integration-tests_2.12

Copy link
Member

@srowen srowen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

General question: is this going to have the same problem with breaking changes as in the previous PR? what is the theory about compatibility? not that we must retain compatibility in 3.0, just bears being clear about the change.

@shivsood
Copy link
Contributor Author

what is the theory about compatibility?
Yes this will introduce a diffirent behaviour.
The write path was broken for ByteType, so that's a positive change in behaviour. This fixes it.
On the other side, reading a SQL table with a TINYINT will get create column as ShortType rather than Integer. This is a not compatibility change.

Should the feature be under a build time flag?

@dongjoon-hyun
Copy link
Member

dongjoon-hyun commented Jan 11, 2020

Hi, @shivsood .
Instead of this, we need a follow-up PR for #25248 for 2.4.5 release.

[SPARK-28152][SQL][2.4] Mapped ShortType to SMALLINT and FloatType to REAL for MsSqlServerDialect

The flag is requested at #25248 (comment) (Dec 2, 2019).

@shivsood
Copy link
Contributor Author

Dec 2, 2019
Yes on my backlog now that i am back :-)

@AmplabJenkins
Copy link

Can one of the admins verify this patch?

@github-actions
Copy link

github-actions bot commented Jun 5, 2020

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

@github-actions github-actions bot added the Stale label Jun 5, 2020
@github-actions github-actions bot closed this Jun 6, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants