-
Notifications
You must be signed in to change notification settings - Fork 28.6k
[SPARK-3762] clear reference of SparkEnv after stop #2624
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
QA tests have started for PR 2624 at commit
|
QA tests have finished for PR 2624 at commit
|
Test PASSed. |
@@ -130,6 +133,12 @@ object SparkEnv extends Logging { | |||
env.set(e) | |||
} | |||
|
|||
// clear all the threadlocal references |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about threads that have called SparkEnv.set(SparkEnv.get)
? I don't think those ThreadLocals will get cleared.
It looks like SparkEnv has two "getter" methods: It turns out that there's no code which calls |
@JoshRosen I'd like to move in this way, but getThreadLocal() is a public API, I'm afraid to remove it. In the past, I remembered that we can have two different SparkEnv in the same JVM, a different SparkEnv for executor to hold different TrackerClients. Maybe it's not needed anymore. Also we can keep the getThreadLocal() and deprecate it, remove the ThreadLocal object, how is it? |
@davies The SparkEnv class is marked
Nearly every object that's accessible through SparkEnv's public fields is |
In Executor, we only create a new SparkContext if we're not running in local mode: // Initialize Spark environment (using system properties read above)
private val env = {
if (!isLocal) {
val _env = SparkEnv.create(conf, executorId, slaveHostname, 0,
isDriver = false, isLocal = false)
SparkEnv.set(_env)
_env.metricsSystem.registerSource(executorSource)
_env
} else {
SparkEnv.get
}
} Since we only have one |
QA tests have started for PR 2624 at commit
|
We should probably update the docstrings to remove all references to ThreadLocal, too. |
QA tests have finished for PR 2624 at commit
|
Test FAILed. |
Hmm, strange test failure:
This is a known flaky test, though, so let's try running the tests again. Jenkins, retest this please. |
QA tests have started for PR 2624 at commit
|
QA tests have finished for PR 2624 at commit
|
Test PASSed. |
Okay, this is a pretty significant change to remove the threadlocal object completely. There are two things we can do
I think we need more input from others @pwendell @mateiz @rxin |
I'm not sure that there's an easy, minimal approach that's also correct, though. The problem is that some threads obtain a SparkEnv by calling |
It's worth noting that the ThreadLocals haven't seemed to cause problems in any of the existing uses of Spark / PySpark. In PySpark Streaming, I think we're running into a scenario that's something like this:
I thought of another fix that will allow the ThreadLocals to work: add a mutable field to SparkEnv instances that records whether that environment is associated with a SparkContext that's been stopped. In SparkEnv.get(), we can check this field to determine whether to return the ThreadLocal or return lastSparkEnv. This approach is more confusing / complex than removing the ThreadLocals, though. I'm still strongly in favor of doing the work to confirm that SparkEnv is currently used as though it's a global object and then removing the ThreadLocals. |
@JoshRosen the simple fix is to delete the threadlocal variable completely. Then any access to the threadlocal variable from any thread (even threadpool in Py4J) is going to be reset. What you said is definitely the structured solution and I am not opposed to it. However, that is a bigger endeavor and i dont want the pyspark streaming patched to be blocked for that. The simple intermediate fix is safer and correct (though, not the ideal, I agree) as it does not change any code paths that can be accessed while a sparkcontext is active. |
/** | ||
* Returns the ThreadLocal SparkEnv. | ||
*/ | ||
def getThreadLocal: SparkEnv = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should leave this in instead of removing it, because some user code might be calling this public method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Even though it's a DeveloperApi we shouldn't break it if we can help it)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. Let's throw @deprecated
on it when we put it back, though.
@mateiz Aside from restoring the |
@mateiz @JoshRosen I had put getThreadLocal() back and deprecated it. |
QA tests have started for PR 2624 at commit
|
QA tests have finished for PR 2624 at commit
|
Test PASSed. |
Yup, looks good! I'm going to merge it. |
This commit removes the SparkEnv thread local variable, and instead makes SparkEnv a single global variable (which is how SparkEnv has always been used). Ths significantly simplifies how SparkEnv needs to be passed around for monotasks. This is a cherry-picked version of Apache Spark commit 6550329 to simplify handling of SparkEnv. Author: Davies Liu <davies.liu@gmail.com> Closes apache#2624 from davies/env and squashes the following commits: a69f30c [Davies Liu] deprecate getThreadLocal ba77ca4 [Davies Liu] remove getThreadLocal(), update docs ee62bb7 [Davies Liu] cleanup ThreadLocal of SparnENV 4d0ea8b [Davies Liu] clear reference of SparkEnv after stop
SparkEnv is cached in ThreadLocal object, so after stop and create a new SparkContext, old SparkEnv is still used by some threads, it will trigger many problems, for example, pyspark will have problem after restart SparkContext, because py4j use thread pool for RPC.
This patch will clear all the references after stop a SparkEnv.
cc @mateiz @tdas @pwendell