Commit db996cc

MaxGekk authored and HyukjinKwon committed
[SPARK-29074][SQL] Optimize date_format for foldable fmt
### What changes were proposed in this pull request?

In the PR, I propose to create an instance of `TimestampFormatter` only once at the initialization, and reuse it inside of `nullSafeEval()` and `doGenCode()` in the case when the `fmt` parameter is foldable.

### Why are the changes needed?

The changes improve performance of the `date_format()` function.

Before:
```
format date:                             Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
format date wholestage off                    7180 / 7181          1.4         718.0       1.0X
format date wholestage on                     7051 / 7194          1.4         705.1       1.0X
```

After:
```
format date:                             Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
format date wholestage off                    4787 / 4839          2.1         478.7       1.0X
format date wholestage on                     4736 / 4802          2.1         473.6       1.0X
```

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

By existing test suites `DateExpressionsSuite` and `DateFunctionsSuite`.

Closes #25782 from MaxGekk/date_format-foldable.

Authored-by: Maxim Gekk <max.gekk@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
1 parent c862835 commit db996cc

2 files changed, with 26 additions and 10 deletions

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala

Lines changed: 24 additions & 8 deletions
@@ -592,19 +592,35 @@ case class DateFormatClass(left: Expression, right: Expression, timeZoneId: Opti
   override def withTimeZone(timeZoneId: String): TimeZoneAwareExpression =
     copy(timeZoneId = Option(timeZoneId))
 
+  @transient private lazy val formatter: Option[TimestampFormatter] = {
+    if (right.foldable) {
+      Option(right.eval()).map(format => TimestampFormatter(format.toString, zoneId))
+    } else None
+  }
+
   override protected def nullSafeEval(timestamp: Any, format: Any): Any = {
-    val df = TimestampFormatter(format.toString, zoneId)
-    UTF8String.fromString(df.format(timestamp.asInstanceOf[Long]))
+    val tf = if (formatter.isEmpty) {
+      TimestampFormatter(format.toString, zoneId)
+    } else {
+      formatter.get
+    }
+    UTF8String.fromString(tf.format(timestamp.asInstanceOf[Long]))
   }
 
   override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
-    val tf = TimestampFormatter.getClass.getName.stripSuffix("$")
-    val zid = ctx.addReferenceObj("zoneId", zoneId, classOf[ZoneId].getName)
-    val locale = ctx.addReferenceObj("locale", Locale.US)
-    defineCodeGen(ctx, ev, (timestamp, format) => {
-      s"""UTF8String.fromString($tf$$.MODULE$$.apply($format.toString(), $zid, $locale)
+    formatter.map { tf =>
+      val timestampFormatter = ctx.addReferenceObj("timestampFormatter", tf)
+      defineCodeGen(ctx, ev, (timestamp, _) => {
+        s"""UTF8String.fromString($timestampFormatter.format($timestamp))"""
+      })
+    }.getOrElse {
+      val tf = TimestampFormatter.getClass.getName.stripSuffix("$")
+      val zid = ctx.addReferenceObj("zoneId", zoneId, classOf[ZoneId].getName)
+      defineCodeGen(ctx, ev, (timestamp, format) => {
+        s"""UTF8String.fromString($tf$$.MODULE$$.apply($format.toString(), $zid)
        .format($timestamp))"""
-    })
+      })
+    }
   }
 
   override def prettyName: String = "date_format"
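
The idea behind the `DateFormatClass` change can be illustrated outside Spark. The following is a minimal sketch, not the code from this commit: it uses `java.time.format.DateTimeFormatter` in place of Spark's `TimestampFormatter`, takes epoch milliseconds rather than Spark's internal timestamp representation, and the names `DateFormatSketch`, `constantPattern`, `formattersBuilt` and `DateFormatDemo` are hypothetical. When the pattern is known up front (the foldable case), the formatter is built once and reused for every row; otherwise one is built per row from that row's pattern.

```scala
import java.time.{Instant, ZoneId}
import java.time.format.DateTimeFormatter
import java.util.Locale

// Hypothetical stand-in for an expression whose pattern is either a
// constant (the "foldable" case) or supplied with each row.
// This is an illustration only, not Spark's DateFormatClass.
final class DateFormatSketch(constantPattern: Option[String], zone: ZoneId) {
  // Counts formatter constructions so the saving is observable.
  var formattersBuilt = 0

  private def build(pattern: String): DateTimeFormatter = {
    formattersBuilt += 1
    DateTimeFormatter.ofPattern(pattern, Locale.US).withZone(zone)
  }

  // Built at most once, on first use, and only for a constant pattern.
  private lazy val cached: Option[DateTimeFormatter] = constantPattern.map(build)

  // Per-row evaluation: reuse the cached formatter when there is one,
  // otherwise build a formatter from the pattern that came with the row.
  def eval(epochMillis: Long, rowPattern: String): String = {
    val formatter = cached.getOrElse(build(rowPattern))
    formatter.format(Instant.ofEpochMilli(epochMillis))
  }
}

object DateFormatDemo {
  def main(args: Array[String]): Unit = {
    val utc = ZoneId.of("UTC")
    val rows = Seq(0L, 86400000L, 172800000L)

    // Constant pattern: one formatter serves all rows.
    val constant = new DateFormatSketch(Some("yyyy-MM-dd"), utc)
    rows.foreach(ms => println(constant.eval(ms, "yyyy-MM-dd")))
    println(s"constant pattern: ${constant.formattersBuilt} formatter(s) built")

    // Per-row pattern: a formatter is built for every row.
    val perRow = new DateFormatSketch(None, utc)
    rows.foreach(ms => perRow.eval(ms, "yyyy-MM-dd"))
    println(s"per-row pattern: ${perRow.formattersBuilt} formatter(s) built")
  }
}
```

As in the diff, the row-level pattern argument is ignored when a cached formatter exists, which mirrors the `(timestamp, _)` lambda in the new `doGenCode()` branch.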

sql/core/benchmarks/DateTimeBenchmark-results.txt

Lines changed: 2 additions & 2 deletions
@@ -168,8 +168,8 @@ Java HotSpot(TM) 64-Bit Server VM 1.8.0_202-b08 on Mac OS X 10.14.3
 Intel(R) Core(TM) i7-4850HQ CPU @ 2.30GHz
 format date:                             Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------
-format date wholestage off                    7180 / 7181          1.4         718.0       1.0X
-format date wholestage on                     7051 / 7194          1.4         705.1       1.0X
+format date wholestage off                    4787 / 4839          2.1         478.7       1.0X
+format date wholestage on                     4736 / 4802          2.1         473.6       1.0X
 
 
 ================================================================================================
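
The `Relative` column compares rows within a single benchmark run, so it reads 1.0X both before and after and does not show the improvement. As a small worked calculation, the sketch below derives the before/after speedup from the best times in the rows above; the names `Run` and `SpeedupDemo` are hypothetical helpers introduced only for this illustration. Both cases come out at roughly 1.5x, or about a one-third reduction in time, which is consistent with the rate moving from 1.4 to 2.1 M/s.

```scala
// Best times in milliseconds, copied from the benchmark rows above.
final case class Run(name: String, beforeMs: Double, afterMs: Double) {
  // How many times faster the new code is.
  def speedup: Double = beforeMs / afterMs
  // Percentage of the original time that was saved.
  def reductionPct: Double = (1.0 - afterMs / beforeMs) * 100.0
}

object SpeedupDemo {
  val runs: Seq[Run] = Seq(
    Run("wholestage off", 7180, 4787),
    Run("wholestage on", 7051, 4736)
  )

  def main(args: Array[String]): Unit =
    runs.foreach { r =>
      println(f"${r.name}%-15s speedup ${r.speedup}%.2fx, time reduced by ${r.reductionPct}%.1f%%")
    }
}
```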
