[SPARK-48114][CORE] Precompile template regex to avoid unnecessary work
### What changes were proposed in this pull request?
The error message template regex is now precompiled to avoid unnecessary work.

### Why are the changes needed?
`SparkRuntimeException` uses `SparkThrowableHelper`, which uses `ErrorClassesJsonReader` to build error message strings from the templates in `error-conditions.json`. The template regex, however, is compiled on every `SparkRuntimeException` constructor invocation. This slows down error construction, particularly in `UnivocityParser` + `FailureSafeParser`, where it is on a hot path.
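
As a rough illustration of the cost described above, the sketch below contrasts the two approaches. It is not the Spark source: `TemplateRegexSketch` and its method names are hypothetical, and only the regex and the replacement string are taken from the diff. `String.replaceAll` compiles its pattern argument on each call, while a `Regex` held in an object is compiled once and reused.

```scala
import scala.util.matching.Regex

// Hypothetical stand-alone sketch; names are illustrative, not Spark's.
object TemplateRegexSketch {
  // Compiled once, when the object is first initialized.
  private val TemplateRegex: Regex = "<([a-zA-Z0-9_-]+)>".r

  // Per-call variant: String.replaceAll compiles the pattern on every call.
  def toPlaceholdersPerCall(template: String): String =
    template.replaceAll("<([a-zA-Z0-9_-]+)>", "\\$\\{$1\\}")

  // Precompiled variant: reuses the shared Regex instance.
  def toPlaceholdersPrecompiled(template: String): String =
    TemplateRegex.replaceAllIn(template, "\\$\\{$1\\}")

  def main(args: Array[String]): Unit = {
    val template = "Cannot parse <value> as <type>."
    // Both calls rewrite <name> parameters into ${name} placeholders.
    println(toPlaceholdersPerCall(template))
    println(toPlaceholdersPrecompiled(template))
  }
}
```

Both variants should produce the same string, so the change is purely about where the pattern compilation happens.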

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
- `testOnly org.apache.spark.sql.errors.QueryExecutionErrorsSuite`
- Manually checked a CSV parsing error

### Was this patch authored or co-authored using generative AI tooling?
No

Closes apache#46365 from vladimirg-db/vladimirg-db/precompile-regexes-in-error-classes-json-reader.

Authored-by: Vladimir Golubev <vladimir.golubev@databricks.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
vladimirg-db authored and dongjoon-hyun committed May 3, 2024
1 parent 8590288 commit b42d235
Showing 1 changed file with 5 additions and 3 deletions.
@@ -49,7 +49,8 @@ class ErrorClassesJsonReader(jsonFileURLs: Seq[URL]) {
     sub.setEnableUndefinedVariableException(true)
     sub.setDisableSubstitutionInValues(true)
     try {
-      sub.replace(messageTemplate.replaceAll("<([a-zA-Z0-9_-]+)>", "\\$\\{$1\\}"))
+      sub.replace(ErrorClassesJsonReader.TEMPLATE_REGEX.replaceAllIn(
+        messageTemplate, "\\$\\{$1\\}"))
     } catch {
       case _: IllegalArgumentException => throw SparkException.internalError(
         s"Undefined error message parameter for error class: '$errorClass'. " +
@@ -59,8 +60,7 @@ class ErrorClassesJsonReader(jsonFileURLs: Seq[URL]) {

   def getMessageParameters(errorClass: String): Seq[String] = {
     val messageTemplate = getMessageTemplate(errorClass)
-    val pattern = "<([a-zA-Z0-9_-]+)>".r
-    val matches = pattern.findAllIn(messageTemplate).toSeq
+    val matches = ErrorClassesJsonReader.TEMPLATE_REGEX.findAllIn(messageTemplate).toSeq
     matches.map(m => m.stripSuffix(">").stripPrefix("<"))
   }

@@ -106,6 +106,8 @@ class ErrorClassesJsonReader(jsonFileURLs: Seq[URL]) {
 }

 private object ErrorClassesJsonReader {
+  private val TEMPLATE_REGEX = "<([a-zA-Z0-9_-]+)>".r
+
   private val mapper: JsonMapper = JsonMapper.builder()
     .addModule(DefaultScalaModule)
     .build()
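
The parameter-extraction path touched by this diff can be exercised in isolation with a small sketch. `MessageParametersSketch` and its method names are hypothetical. `viaStrip` mirrors the approach in the diff (take whole matches such as `<value>` and strip the angle brackets), while `viaGroup` is an alternative not used in this commit that reads capture group 1 directly.

```scala
import scala.util.matching.Regex

// Hypothetical stand-alone sketch; names are illustrative, not Spark's.
object MessageParametersSketch {
  // Shared, precompiled pattern (same regex as in the diff).
  private val TemplateRegex: Regex = "<([a-zA-Z0-9_-]+)>".r

  // Mirrors the diff: whole matches like "<value>", then strip the brackets.
  def viaStrip(template: String): Seq[String] =
    TemplateRegex.findAllIn(template).toSeq
      .map(m => m.stripSuffix(">").stripPrefix("<"))

  // Alternative (an assumption, not part of the commit): use capture group 1.
  def viaGroup(template: String): Seq[String] =
    TemplateRegex.findAllMatchIn(template).map(_.group(1)).toSeq

  def main(args: Array[String]): Unit = {
    val template = "Cannot parse <value> as <type>."
    println(viaStrip(template).mkString(", "))
    println(viaGroup(template).mkString(", "))
  }
}
```

Either form reuses the same compiled pattern, so neither reintroduces per-call compilation.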