[SPARK-27812][K8S][2.4] Bump K8S client version to 4.6.1 #26152
igorcalabria wants to merge 3 commits into apache:branch-2.4 from igorcalabria:k8s-client-update-2.4
Conversation
Thank you, @igorcalabria.
ok to test
BTW, @igorcalabria.
I highly recommend you do that.
Kubernetes integration test starting |
Kubernetes integration test status success |
Test build #112247 has finished for PR 26152 at commit
dongjoon-hyun left a comment
+1, LGTM. Thank you, @igorcalabria and @srowen.
Merged to branch-2.4
### What changes were proposed in this pull request?

Backport of #26093 to `branch-2.4`.

### Why are the changes needed?

https://issues.apache.org/jira/browse/SPARK-27812
https://issues.apache.org/jira/browse/SPARK-27927

We need the fix fabric8io/kubernetes-client#1768, which was released in version 4.6 of the client. The root cause of the problem is explained in more detail in #25785.

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

This patch was tested manually using a simple pyspark job:

```python
from pyspark.sql import SparkSession

if __name__ == '__main__':
    spark = SparkSession.builder.getOrCreate()
```

The expected behaviour of this "job" is that both the Python and JVM processes exit automatically after the main runs. This is the case for Spark versions <= 2.4. On version 2.4.3, the JVM process hangs because there is a non-daemon thread still running:

```
"OkHttp WebSocket https://10.96.0.1/..." #121 prio=5 os_prio=0 tid=0x00007fb27c005800 nid=0x24b waiting on condition [0x00007fb300847000]
"OkHttp WebSocket https://10.96.0.1/..." #117 prio=5 os_prio=0 tid=0x00007fb28c004000 nid=0x247 waiting on condition [0x00007fb300e4b000]
```

This is caused by a bug in the `kubernetes-client` library, which is fixed in the version we are upgrading to. When the mentioned job is run with this patch applied, the behaviour from Spark <= 2.4.0 is restored and both processes terminate successfully.

Closes #26152 from igorcalabria/k8s-client-update-2.4.

Authored-by: igor.calabria <igor.calabria@ubee.in>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
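To make the hang described above concrete, here is a minimal analogy in Python (not Spark or kubernetes-client code; the function name `pretend_websocket_keepalive` and the 30-second sleep are made up for illustration). It shows how a single non-daemon thread keeps a process alive after the main code returns, which is what the leftover OkHttp WebSocket thread does to the driver JVM in the affected client versions:

```python
import threading
import time

# Illustration only: stands in for the OkHttp WebSocket thread that sits
# "waiting on condition" in the thread dump above.
def pretend_websocket_keepalive():
    time.sleep(30)

# daemon=False mirrors the hanging case: the interpreter cannot exit while
# this thread is alive. With daemon=True the process exits as soon as the
# main thread finishes, which is the behaviour the client upgrade restores.
threading.Thread(target=pretend_websocket_keepalive, daemon=False).start()
print("main finished, but the process stays up until the worker thread ends")
```

Flipping `daemon=False` to `daemon=True` makes the process exit right after the print, mirroring the pre-2.4.1 behaviour that this backport brings back.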
Attempting to start Spark 2.4.0 in Azure using AKS with Kubernetes version 1.14.6, we ran into the error referenced here: kubernetes/kubernetes#82131. I attempted to build an image from tag […]. Is it possible to apply this as a patch to 2.4.0, or will it absolutely require switching to the […]?
Hi, @sethhorrigan. What do you mean by […]?
If AKS breaks something in your production environment, please file an issue with them. That's the best way to get the commercial support you paid for.
Put another way, use 2.4.4 at least. 2.4.0-rc5 is a release candidate of an old release, not even a final release.
@dongjoon-hyun in the commit referenced above (https://github.com/sethhorrigan/spark/commit/8f96a5ea3d078a205ceb5924bf7aa2af04e6ced1), you can see what I mean. I am aware that […]. @srowen the tag […]. Has the change in this pull request been verified to fix kubernetes/kubernetes#82131, or is that just a hopeful guess?
Edit: reading through the comments on https://issues.jenkins-ci.org/browse/JENKINS-59000 (referenced from kubernetes/kubernetes#82131), I see that using […]. Hope this solution helps anyone else who stumbles on this thread as well.
@sethhorrigan This PR also fixes the issue with Kubernetes you mentioned (kubernetes-client was upgraded to a higher version). If you are building from source, I recommend that you start with the latest released minor (2.4.4) and apply this patch. Or, you could simply use […].