-
Notifications
You must be signed in to change notification settings - Fork 28.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-8307] [SQL] improve timestamp from parquet #6759
Conversation
@@ -26,7 +26,7 @@ import org.apache.spark.sql.catalyst.expressions.Cast | |||
/** | |||
* Helper function to convert between Int value of days since 1970-01-01 and java.sql.Date |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
java doc here
Test build #34675 has finished for PR 6759 at commit
|
|
||
def convertToTimestamp(value: Binary): Timestamp = { | ||
def convertToTimestamp(value: Binary): Long = { | ||
val nt = NanoTime.fromBinary(value) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should just get rid of NanoTime ...
Test build #34724 has finished for PR 6759 at commit
|
Test build #34726 has finished for PR 6759 at commit
|
Conflicts: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala sql/core/src/main/scala/org/apache/spark/sql/parquet/ParquetConverter.scala
Test build #34728 has finished for PR 6759 at commit
|
Test build #898 has finished for PR 6759 at commit
|
Conflicts: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/CatalystTypeConverters.scala sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/DateTimeUtilsSuite.scala sql/core/src/main/scala/org/apache/spark/sql/jdbc/JDBCRDD.scala sql/core/src/main/scala/org/apache/spark/sql/parquet/ParquetTableSupport.scala sql/core/src/test/scala/org/apache/spark/sql/parquet/ParquetIOSuite.scala
Test build #34841 has finished for PR 6759 at commit
|
Test build #34842 has finished for PR 6759 at commit
|
Test build #34849 has finished for PR 6759 at commit
|
Test build #34858 has finished for PR 6759 at commit
|
Test build #911 has finished for PR 6759 at commit
|
Test build #912 has finished for PR 6759 at commit
|
Test build #913 has finished for PR 6759 at commit
|
Test build #34899 has finished for PR 6759 at commit
|
@rxin This is ready to review |
* and nanoseconds in a day | ||
*/ | ||
def fromJulianDay(day: Int, nanoseconds: Long): Long = { | ||
// use integer to avoid rounding errors |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like we're using longs here to avoid the rounding errors, not integers, so maybe we should update this comment.
Test build #35296 has finished for PR 6759 at commit
|
DateTimeUtils.fromJulianDay(julianDay, timeOfDayNanos) | ||
} | ||
|
||
def convertFromTimestamp(ts: Long): Binary = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similarly, I think we should add a comment here as well (it can be the inverse of my other suggested Scaladoc comment).
At a high level, this looks like it's in really good shape. I might have more comments in a bit; have to step out to handle some calls now, but I'll loop back later in the afternoon to finish looking this over. |
Test build #35321 has finished for PR 6759 at commit
|
@@ -313,10 +314,16 @@ private[parquet] class RowWriteSupport extends WriteSupport[InternalRow] with Lo | |||
writer.addBinary(Binary.fromByteArray(scratchBytes, 0, numBytes)) | |||
} | |||
|
|||
// array used to write Timestamp as Int96 (fixed-length binary) | |||
private val int96buf = new Array[Byte](12) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This assumes that RowWriteSupport
instances are thread-safe; just want to check that that assumption is true.
This could also be private[this]
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I guess this is fine since we use the same pattern with scratchBytes
for writeDecimal
in the preceding lines.
Test build #35338 has finished for PR 6759 at commit
|
Test build #35332 has finished for PR 6759 at commit
|
Test build #941 has finished for PR 6759 at commit
|
Test build #942 has finished for PR 6759 at commit
|
Test build #945 has finished for PR 6759 at commit
|
Test build #948 has finished for PR 6759 at commit
|
Conflicts: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala
Test build #35481 has finished for PR 6759 at commit
|
Conflicts: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/UnsafeRowConverterSuite.scala
@JoshRosen Is this ready to go? |
@davies, yeah, I think this should be good to go; looks like all of my comments have been addressed. |
Test build #35489 has finished for PR 6759 at commit
|
The last failure is not related, I'd merge this into master. |
This PR change to convert julian day to unix timestamp directly (without Calendar and Timestamp). cc adrian-wang rxin Author: Davies Liu <davies@databricks.com> Closes apache#6759 from davies/improve_ts and squashes the following commits: 849e301 [Davies Liu] Merge branch 'master' of github.com:apache/spark into improve_ts b0e4cad [Davies Liu] Merge branch 'master' of github.com:apache/spark into improve_ts 8e2d56f [Davies Liu] address comments 634b9f5 [Davies Liu] fix mima 4891efb [Davies Liu] address comment bfc437c [Davies Liu] fix build ae5979c [Davies Liu] Merge branch 'master' of github.com:apache/spark into improve_ts 602b969 [Davies Liu] remove jodd 2f2e48c [Davies Liu] fix test 8ace611 [Davies Liu] fix mima 212143b [Davies Liu] fix mina c834108 [Davies Liu] Merge branch 'master' of github.com:apache/spark into improve_ts a3171b8 [Davies Liu] Merge branch 'master' of github.com:apache/spark into improve_ts 5233974 [Davies Liu] fix scala style 361fd62 [Davies Liu] address comments ea196d4 [Davies Liu] improve timestamp from parquet
This PR change to convert julian day to unix timestamp directly (without Calendar and Timestamp).
cc @adrian-wang @rxin