Skip to content

[SPARK-11818][REPL] Fix ExecutorClassLoader to lookup resources from … #9812

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 3 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -34,7 +34,9 @@ import org.apache.spark.util.ParentClassLoader
/**
* A ClassLoader that reads classes from a Hadoop FileSystem or HTTP URI,
* used to load classes defined by the interpreter when the REPL is used.
* Allows the user to specify if user class path should be first
* Allows the user to specify if user class path should be first.
* This class loader delegates getting/finding resources to parent loader,
* which makes sense until REPL never provide resource dynamically.
*/
class ExecutorClassLoader(conf: SparkConf, classUri: String, parent: ClassLoader,
userClassPathFirst: Boolean) extends ClassLoader with Logging {
Expand All @@ -55,6 +57,14 @@ class ExecutorClassLoader(conf: SparkConf, classUri: String, parent: ClassLoader
}
}

override def getResource(name: String): URL = {
parentLoader.getResource(name)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@holdenk maybe you have some thoughts on this?
Also summoning @vanzin

Doesn't this need to check userClassPathFirst?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@srowen
It doesn't need to check userClassPathFirst since this implementation implies that REPL never provides resources dynamically so there's no need to lookup resource from ExecutorClassLoader itself.

Btw, could precondition be broken? I couldn't imagine REPL generating resources.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd be surprised if the repl generated resources, but at the same time, what if someone tries to load the generated class file as a resource? It's an unusual but valid use case.

The code to support that is not complicated; it already exists in MutableURLClassLoader.scala (class ChildFirstURLClassLoader for, example). If you're up for it you could even create a helper trait or a utility method somewhere for that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vanzin
I didn't see use case you mentioned but it could make sense.

In order to achieve, we have to implement findResource() and findResources() for ExecutorClassLoader since ExecutorClassLoader cannot rely on superclass (ClassLoader) to find resource.
It is easy to provide resource URL which points to origin scheme (http, https, ftp, hdfs), but since I'm new to classloader, so I'm wondering it is safe to return URL from findResource() and findResources() which doesn't point to local file.

If it is not safe to return non local file as URL, what's recommended way to do?
I can only think about downloading files to local temp directory per every call.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering it is safe to return URL from findResource() and findResources() which doesn't point to local file

That should be perfectly fine. That's how URLClassLoader works, after all. The only potential odd thing would be getResourceAsStream, which returns an InputStream, and my guess is if the JDK's URL class doesn't support the protocol, would cause an error unless you overrode it here.

So perhaps it's too much to worry about and we can just assume that no one will do that, and fix it if someone ever needs that feature.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vanzin
To clarify about "feature", do you want me to change implementation of findResource() and findResources() for pointing origin scheme, and forget about potential odd? Or forget about finding resources from REPL uri and leave as this PR is?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leaving the PR as is should be fine for now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vanzin
OK, Thanks for clarification! :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That all seams reasonable to me (sorry for the slow reply I'm in Tokyo right now).

}

override def getResources(name: String): java.util.Enumeration[URL] = {
parentLoader.getResources(name)
}

override def findClass(name: String): Class[_] = {
userClassPathFirst match {
case true => findClassLocally(name).getOrElse(parentLoader.loadClass(name))
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -19,8 +19,13 @@ package org.apache.spark.repl

import java.io.File
import java.net.{URL, URLClassLoader}
import java.nio.charset.StandardCharsets
import java.util

import com.google.common.io.Files

import scala.concurrent.duration._
import scala.io.Source
import scala.language.implicitConversions
import scala.language.postfixOps

Expand All @@ -41,6 +46,7 @@ class ExecutorClassLoaderSuite

val childClassNames = List("ReplFakeClass1", "ReplFakeClass2")
val parentClassNames = List("ReplFakeClass1", "ReplFakeClass2", "ReplFakeClass3")
val parentResourceNames = List("fake-resource.txt")
var tempDir1: File = _
var tempDir2: File = _
var url1: String = _
Expand All @@ -54,6 +60,9 @@ class ExecutorClassLoaderSuite
url1 = "file://" + tempDir1
urls2 = List(tempDir2.toURI.toURL).toArray
childClassNames.foreach(TestUtils.createCompiledClass(_, tempDir1, "1"))
parentResourceNames.foreach { x =>
Files.write("resource".getBytes(StandardCharsets.UTF_8), new File(tempDir2, x))
}
parentClassNames.foreach(TestUtils.createCompiledClass(_, tempDir2, "2"))
}

Expand Down Expand Up @@ -99,6 +108,26 @@ class ExecutorClassLoaderSuite
}
}

test("resource from parent") {
val parentLoader = new URLClassLoader(urls2, null)
val classLoader = new ExecutorClassLoader(new SparkConf(), url1, parentLoader, true)
val resourceName: String = parentResourceNames.head
val is = classLoader.getResourceAsStream(resourceName)
assert(is != null, s"Resource $resourceName not found")
val content = Source.fromInputStream(is, "UTF-8").getLines().next()
assert(content.contains("resource"), "File doesn't contain 'resource'")
}

test("resources from parent") {
val parentLoader = new URLClassLoader(urls2, null)
val classLoader = new ExecutorClassLoader(new SparkConf(), url1, parentLoader, true)
val resourceName: String = parentResourceNames.head
val resources: util.Enumeration[URL] = classLoader.getResources(resourceName)
assert(resources.hasMoreElements, s"Resource $resourceName not found")
val fileReader = Source.fromInputStream(resources.nextElement().openStream()).bufferedReader()
assert(fileReader.readLine().contains("resource"), "File doesn't contain 'resource'")
}

test("failing to fetch classes from HTTP server should not leak resources (SPARK-6209)") {
// This is a regression test for SPARK-6209, a bug where each failed attempt to load a class
// from the driver's class server would leak a HTTP connection, causing the class server's
Expand Down