[WIP][SPARK-28151][SQL] Mapped ByteType to TinyINT for MsSQLServerDialect #25344
Conversation
…erverDialect.scala. The fix fails with the following error when df.write is done with a DataFrame that contains a ByteType.
19/08/02 18:25:44 INFO Executor: Finished task 0.0 in stage 7.0 (TID 7). 1197 bytes result sent to driver
19/08/02 18:25:44 INFO TaskSetManager: Finished task 0.0 in stage 7.0 (TID 7) in 43 ms on localhost (executor driver) (1/2)
19/08/02 18:25:44 INFO CodeGenerator: Code generated in 14.586963 ms
19/08/02 18:25:44 ERROR Executor: Exception in task 1.0 in stage 7.0 (TID 8)
java.lang.RuntimeException: Error while encoding: java.lang.RuntimeException: java.lang.Integer is not a valid external type for schema of tinyint
if (assertnotnull(input[0, org.apache.spark.sql.Row, true]).isNullAt) null else validateexternaltype(getexternalrowfield(assertnotnull(input[0, org.apache.spark.sql.Row, true]), 0, serialNum), ByteType) AS serialNum#231
at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.toRow(ExpressionEncoder.scala:344)
at org.apache.spark.sql.SparkSession.$anonfun$createDataFrame$1(SparkSession.scala:367)
at scala.collection.Iterator$$anon$10.next(Iterator.scala:459)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:731)
at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.savePartition(JdbcUtils.scala:662)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.$anonfun$saveTable$1(JdbcUtils.scala:845)
@dongjoon-hyun @srowen This is the PR for the ByteType fix. I am seeing a Catalyst exception after my fix. I will need your help/guidance on resolving this. I see that it fails in validateExternalType() with a mismatched type, but I am not able to trace the exact flow of events here.
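For reference, here is a minimal, standalone sketch (not part of this PR; the column name serialNum is taken from the error above) of why the encoder complains: once TINYINT maps to ByteType, rows built with Int values no longer match the schema and must carry Byte values instead.

import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{ByteType, StructField, StructType}

object ByteTypeRepro {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("bytetype-repro").getOrCreate()
    val schema = StructType(Seq(StructField("serialNum", ByteType)))

    // Fails with "java.lang.Integer is not a valid external type for schema of tinyint":
    // spark.createDataFrame(spark.sparkContext.parallelize(Seq(Row(42))), schema).collect()

    // Works: the external value matches the declared ByteType.
    val ok = spark.createDataFrame(spark.sparkContext.parallelize(Seq(Row(42.toByte))), schema)
    ok.show()

    spark.stop()
  }
}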
Does this supersede one of your earlier PRs?
Hi, @shivsood. So, do you mean this PR fails the UT in your local environment?
ok to test
Test build #108642 has finished for PR 25344 at commit
@dongjoon-hyun No. I have added a new E2E test case where I create a table with ByteType. That fails. I was aware of that failure, and that is why the fix. Post fix, it is failing at the next level (refer to the "How was this patch tested" section of the PR).
@dongjoon-hyun Sorry about these. I will fix them. I have to check why these are not showing up in my build. For me, the source code build gives Scala style warnings, but the test code does not seem to give me warnings.
…yte and not java.lang.Integer. Fixed styling issues noted in the review comments
Yes.
Please fix the Scala style issues first, which @srowen already commented on.
Done. I realize dev/stylecheck should be run manually for test cases. I presumed it gets run automatically when I build. Apparently not for test code.
Test build #108681 has finished for PR 25344 at commit
Thank you for updating, @shivsood! 😄
Test build #112079 has finished for PR 25344 at commit
@@ -21,6 +21,8 @@ import java.math.BigDecimal
import java.sql.{Connection, Date, Timestamp}
import java.util.Properties

import org.apache.spark.sql.{DataFrame, Row}
import org.apache.spark.sql.types._
Let's remove this. I'll give the updated short test code.
Please update the test code according to the comment.
(And, I revised the PR description according to the latest template).
@shivsood ping, are you there?
@maropu Yes, I would handle this this week. Please note that the ShortType issue is fixed in SPARK-28152. Or are you pointing to a different issue?
That should be like this? https://github.com/apache/spark/pull/25344/files#diff-391379a5ec51082e2ae1209db15c02b3R549
Ping @shivsood
@maropu I created JIRA https://issues.apache.org/jira/browse/SPARK-29644 for this issue. I will send a PR with the resolution and test.
Test build #112872 has finished for PR 25344 at commit
@shivsood @maropu So that I can keep track of this -- it seems like this is a combination of SPARK-29644 and SPARK-28152, but for bytes, right? The question there was testing: whether the test should be more general, because the fix is general. Same here: how much of this is specific to SQL Server? The setByte part isn't, it seems. And the question there too was whether a deeper fix is needed; does that need to be reflected here? I'm pretty OK with fixing this isolated issue right now if it by itself solves a problem.
@shivsood Can you solve the "SQL server-specific" issue of
I am putting this PR back into WIP for now.
Ah, I see. If so, how about fixing the more general issue in JDBCUtils first? Then, fixing the "SQL server" issue.
That would be the right approach. It would be quite a test effort; I have to get familiar with the unit tests and E2E tests for all the other servers. I will start some work on that soon.
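For context only, a rough sketch (not the eventual #26301 patch; it assumes Spark's internal JdbcType helper and the PreparedStatement-based setter in JdbcUtils.scala) of what the more general JdbcUtils fix discussed above could look like: map ByteType to the standard TINYINT name and write the value with setByte rather than setInt.

// Assumed imports inside JdbcUtils.scala: java.sql.PreparedStatement,
// org.apache.spark.sql.Row, org.apache.spark.sql.jdbc.JdbcType, org.apache.spark.sql.types._

// In getCommonJDBCType: advertise TINYINT instead of the non-standard "BYTE".
case ByteType => Option(JdbcType("TINYINT", java.sql.Types.TINYINT))

// In the value setter: hand the driver a byte, not an int.
case ByteType =>
  (stmt: PreparedStatement, row: Row, pos: Int) =>
    stmt.setByte(pos + 1, row.getByte(pos))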
Thanks, @shivsood!
Thanks a lot for your comments, everyone.
Is this superseded by #26301?
Closing as this is fixed by PR #26301, which has now been merged to master.
What changes were proposed in this pull request?
Change the mapping of ByteType from BYTE to TINYINT for MsSQLServerDialect.
Why are the changes needed?
Writing a DataFrame with a column of type ByteType fails when using the JDBC connector for SQL Server. The problem is due to
case ByteType => Option(JdbcType("BYTE", java.sql.Types.TINYINT))
In getCatalystType() (the JDBC-to-Catalyst type mapping), TINYINT is mapped to IntegerType, while it should be mapped to ByteType. Mapping to an integer is acceptable from the point of view of upcasting, but it leads to a 4-byte allocation rather than 1 byte for ByteType.
Note: these issues were found when reading from/writing to SQL Server.
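As a simplified sketch of the shape of the dialect-level change described above (illustrative only, not the merged code; the real MsSqlServerDialect handles additional types), both directions of the mapping would be adjusted:

// Write path: emit TINYINT instead of the invalid "BYTE" for ByteType columns.
override def getJDBCType(dt: DataType): Option[JdbcType] = dt match {
  case ByteType => Option(JdbcType("TINYINT", java.sql.Types.TINYINT))
  case _ => None
}

// Read path: surface SQL Server TINYINT as a 1-byte ByteType rather than a 4-byte IntegerType.
override def getCatalystType(
    sqlType: Int, typeName: String, size: Int, md: MetadataBuilder): Option[DataType] = {
  if (sqlType == java.sql.Types.TINYINT) Some(ByteType) else None
}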
Does this PR introduce any user-facing change?
No.
How was this patch tested?
The patch was tested with an integration test by adding a test case to MsSqlServerIntegrationSuite.scala. The test case passed in local tests.
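For illustration, a minimal sketch of the kind of round-trip test case added (the table name and column name here are hypothetical; the real test lives in MsSqlServerIntegrationSuite.scala, where jdbcUrl, spark, and java.util.Properties are already in scope):

test("SPARK-28151: ByteType is written as TINYINT and read back as ByteType") {
  val df = spark.createDataFrame(
    spark.sparkContext.parallelize(Seq(Row(1.toByte), Row(127.toByte))),
    StructType(Seq(StructField("b", ByteType))))

  // Round-trip through SQL Server via JDBC.
  df.write.jdbc(jdbcUrl, "bytetype_roundtrip", new Properties)
  val readBack = spark.read.jdbc(jdbcUrl, "bytetype_roundtrip", new Properties)

  // With the dialect fix in place, the column should come back as ByteType, not IntegerType.
  assert(readBack.schema("b").dataType === ByteType)
  assert(readBack.collect().map(_.getByte(0)).toSet === Set(1.toByte, 127.toByte))
}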