-
Notifications
You must be signed in to change notification settings - Fork 28.6k
[SPARK-20338][CORE]Spaces in spark.eventLog.dir are not correctly handled #17638
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -18,7 +18,6 @@ | |
package org.apache.spark.scheduler | ||
|
||
import java.io._ | ||
import java.net.URI | ||
import java.nio.charset.StandardCharsets | ||
import java.util.Locale | ||
|
||
|
@@ -50,22 +49,22 @@ import org.apache.spark.util.{JsonProtocol, Utils} | |
private[spark] class EventLoggingListener( | ||
appId: String, | ||
appAttemptId : Option[String], | ||
logBaseDir: URI, | ||
logBaseDir: String, | ||
sparkConf: SparkConf, | ||
hadoopConf: Configuration) | ||
extends SparkListener with Logging { | ||
|
||
import EventLoggingListener._ | ||
|
||
def this(appId: String, appAttemptId : Option[String], logBaseDir: URI, sparkConf: SparkConf) = | ||
def this(appId: String, appAttemptId : Option[String], logBaseDir: String, sparkConf: SparkConf) = | ||
this(appId, appAttemptId, logBaseDir, sparkConf, | ||
SparkHadoopUtil.get.newConfiguration(sparkConf)) | ||
|
||
private val shouldCompress = sparkConf.getBoolean("spark.eventLog.compress", false) | ||
private val shouldOverwrite = sparkConf.getBoolean("spark.eventLog.overwrite", false) | ||
private val testing = sparkConf.getBoolean("spark.eventLog.testing", false) | ||
private val outputBufferSize = sparkConf.getInt("spark.eventLog.buffer.kb", 100) * 1024 | ||
private val fileSystem = Utils.getHadoopFileSystem(logBaseDir, hadoopConf) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This probably is a by design choice, only when scheme is defined then Spark will pick the right FS, otherwise it will use local FS instead. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. what about the URI of "hdfs://nn:9000/a b/c" ? Even there is right scheme of FS but it will use local FS instead There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Are you sure? Let me investigate a bit. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. That because There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes i have tested. In resolveURI function if path contains space, There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. So i think we should not use There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't agree with you. String or URI representation should be equal, it is not that changing to String representation then the issue is workaround-ed. I think in your case we need to fix There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. ok, i will try to fix There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, from your case it is workable, but I'm sure if it could handle all the cases in UT. |
||
private val fileSystem = new Path(logBaseDir).getFileSystem(hadoopConf) | ||
private val compressionCodec = | ||
if (shouldCompress) { | ||
Some(CompressionCodec.createCodec(sparkConf)) | ||
|
@@ -96,8 +95,8 @@ private[spark] class EventLoggingListener( | |
} | ||
|
||
val workingPath = logPath + IN_PROGRESS | ||
val uri = new URI(workingPath) | ||
val path = new Path(workingPath) | ||
val uri = path.toUri | ||
val defaultFs = FileSystem.getDefaultUri(hadoopConf).getScheme | ||
val isDefaultLocal = defaultFs == null || defaultFs == "file" | ||
|
||
|
@@ -303,11 +302,11 @@ private[spark] object EventLoggingListener extends Logging { | |
* @return A path which consists of file-system-safe characters. | ||
*/ | ||
def getLogPath( | ||
logBaseDir: URI, | ||
logBaseDir: String, | ||
appId: String, | ||
appAttemptId: Option[String], | ||
compressionCodecName: Option[String] = None): String = { | ||
val base = logBaseDir.toString.stripSuffix("/") + "/" + sanitize(appId) | ||
val base = logBaseDir.stripSuffix("/") + "/" + sanitize(appId) | ||
val codec = compressionCodecName.map("." + _).getOrElse("") | ||
if (appAttemptId.isDefined) { | ||
base + "_" + sanitize(appAttemptId.get) + codec | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't you URI encode this
unresolvedDir
here? I think encode space to "%20" should be enough. Not sure why you need to change lots of code.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also looks like space is acceptable for
resolveURI
(https://github.com/apache/spark/blob/master/core/src/test/scala/org/apache/spark/util/UtilsSuite.scala#L479).There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the dir contains space and also contains %20 (e.g "hdfs://nn:9000/a b%20c"), i seems to me that the encode does not work well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think in your case we need to percentile encode the
unresolvedDir
before callingresolveURI
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i suggest to use
new Path(path).toURI()
insteadnew URI(path)
sincenew URI(path)
not support space in path.It is not necessary to use encode if we use
new Path(path).toURI()