Commit 89bad26

MaxGekk authored and HyukjinKwon committed
[SPARK-29200][SQL] Optimize extract/date_part for epoch
### What changes were proposed in this pull request?

Refactoring of the `DateTimeUtils.getEpoch()` function to avoid expensive decimal operations: the arithmetic is done on integer microseconds, and the result is converted to the decimal type only at the end.

### Why are the changes needed?

The changes improve the performance of the `getEpoch()` method by more than **20 times** in the benchmark below (2345.5 ns vs. 104.9 ns per row).

Before:
```
Invoke extract for timestamp:             Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
cast to timestamp                                   256            277          33         39.0          25.6       1.0X
EPOCH of timestamp                                23455          23550         131          0.4        2345.5       0.0X
```

After:
```
Invoke extract for timestamp:             Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
cast to timestamp                                   255            294          34         39.2          25.5       1.0X
EPOCH of timestamp                                 1049           1054           9          9.5         104.9       0.2X
```

### Does this PR introduce any user-facing change?

No

### How was this patch tested?

By existing test from `DateExpressionSuite`.

Closes #25881 from MaxGekk/optimize-extract-epoch.

Authored-by: Maxim Gekk <max.gekk@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
1 parent 3be5741 commit 89bad26

2 files changed: +92 -91 lines changed

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala

Lines changed: 4 additions & 3 deletions
@@ -847,9 +847,10 @@ object DateTimeUtils {
    * since 1970-01-01 00:00:00 local time.
    */
   def getEpoch(timestamp: SQLTimestamp, zoneId: ZoneId): Decimal = {
-    val offset = zoneId.getRules.getOffset(microsToInstant(timestamp)).getTotalSeconds
-    val sinceEpoch = BigDecimal(timestamp) / MICROS_PER_SECOND + offset
-    new Decimal().set(sinceEpoch, 20, 6)
+    val offset = SECONDS.toMicros(
+      zoneId.getRules.getOffset(microsToInstant(timestamp)).getTotalSeconds)
+    val sinceEpoch = timestamp + offset
+    Decimal(sinceEpoch, 20, 6)
   }
 
   def currentTimestamp(): SQLTimestamp = instantToMicros(Instant.now())
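
The idea behind the change can be tried outside Spark. The following is a minimal sketch, not the Spark implementation: it uses `scala.math.BigDecimal` in place of Spark's `Decimal`, and the names `EpochSketch`, `epochSlow`, `epochFast`, and the local `microsToInstant` helper are hypothetical stand-ins introduced here for illustration. It shows that adding the zone offset in integer microseconds and attaching a scale of 6 at the end gives the same value as the earlier decimal division.

```scala
import java.time.{Instant, ZoneId}
import java.util.concurrent.TimeUnit.SECONDS

// Illustrative only: EpochSketch and its members are assumptions, not Spark APIs.
object EpochSketch {
  val MicrosPerSecond = 1000000L

  // Hypothetical stand-in for Spark's microsToInstant helper.
  def microsToInstant(micros: Long): Instant = {
    val secs = Math.floorDiv(micros, MicrosPerSecond)
    val microsOfSecond = Math.floorMod(micros, MicrosPerSecond)
    Instant.ofEpochSecond(secs, microsOfSecond * 1000L)
  }

  // Old shape: promote to a decimal first, then divide and add the offset in seconds.
  def epochSlow(micros: Long, zoneId: ZoneId): BigDecimal = {
    val offset = zoneId.getRules.getOffset(microsToInstant(micros)).getTotalSeconds
    BigDecimal(micros) / MicrosPerSecond + offset
  }

  // New shape: stay in Long microseconds, then interpret the sum as an
  // unscaled value with 6 fractional digits.
  def epochFast(micros: Long, zoneId: ZoneId): BigDecimal = {
    val offsetMicros = SECONDS.toMicros(
      zoneId.getRules.getOffset(microsToInstant(micros)).getTotalSeconds)
    BigDecimal(java.math.BigDecimal.valueOf(micros + offsetMicros, 6))
  }
}
```

For example, `EpochSketch.epochFast(1500000L, ZoneId.of("+02:00"))` and `EpochSketch.epochSlow(1500000L, ZoneId.of("+02:00"))` should compare equal; only the fast variant avoids a decimal division per row.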
