Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-18535][UI][YARN] Redact sensitive information from Spark logs and UI #15971

Closed
wants to merge 8 commits into from
Closed
Prev Previous commit
Next Next commit
More feedback
  • Loading branch information
markgrover committed Nov 23, 2016
commit 61a961cc9ac34d0f87afba2e2a72b6922272aaaf
Original file line number Diff line number Diff line change
Expand Up @@ -233,12 +233,10 @@ private[spark] class EventLoggingListener(
}
}

private[spark] def redactEvent(event: SparkListenerEnvironmentUpdate):
SparkListenerEnvironmentUpdate = {
private[spark] def redactEvent(
event: SparkListenerEnvironmentUpdate): SparkListenerEnvironmentUpdate = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: this needs to be indented one more level...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I'll go through Spark style guide so I don't cause as much trouble next time.

Thanks for reviewing.

// "Spark Properties" entry will always exist because the map is always populated with it.
val props = event
.environmentDetails("Spark Properties")
val redactedProps = Utils.redact(sparkConf, props)
val redactedProps = Utils.redact(sparkConf, event.environmentDetails("Spark Properties"))
val redactedEnvironmentDetails = event.environmentDetails +
("Spark Properties" -> redactedProps)
SparkListenerEnvironmentUpdate(redactedEnvironmentDetails)
Expand Down
4 changes: 2 additions & 2 deletions core/src/main/scala/org/apache/spark/util/Utils.scala
Original file line number Diff line number Diff line change
Expand Up @@ -2562,9 +2562,9 @@ private[spark] object Utils extends Logging {
val redactionPattern = conf.get(SECRET_REDACTION_PATTERN).r
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is very expensive. How about a version that takes a list of tuples and redacts them?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What part do you think is expensive? Going through all the configuration properties and matching them with the regex?
If so, I agree. However, that has to be done somewhere. All the callers of this function have a SparkConf that they want stuff redacted from. So, if this function accepts a list of tuples, they have to run the regex check to find that list first before sending it to redact(). So, overall, unless I am missing something, I don't think we can avoid the expense.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Compiling the regex once for every item in the list being redacted, instead of doing it once for the whole list.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, good point. Let me fix this.

kvs.map { kv =>
redactionPattern.findFirstIn(kv._1)
.map{ ignore => (kv._1, REDACTION_REPLACEMENT_TEXT) }
.map { ignore => (kv._1, REDACTION_REPLACEMENT_TEXT) }
.getOrElse(kv)
}
}
}

}
Expand Down
4 changes: 2 additions & 2 deletions core/src/test/scala/org/apache/spark/util/UtilsSuite.scala
Original file line number Diff line number Diff line change
Expand Up @@ -991,7 +991,7 @@ class UtilsSuite extends SparkFunSuite with ResetSystemProperties with Logging {
val redactedConf = Utils.redact(sparkConf, sparkConf.getAll).toMap

// Assert that secret information got redacted while the regular property remained the same
secretKeys.foreach { key => assert(redactedConf(key) == Utils.REDACTION_REPLACEMENT_TEXT) }
assert(redactedConf("spark.regular.property") == "not_a_secret")
secretKeys.foreach { key => assert(redactedConf(key) === Utils.REDACTION_REPLACEMENT_TEXT) }
assert(redactedConf("spark.regular.property") === "not_a_secret")
}
}