[SPARK-5825] [Spark Submit] Remove the double checking instance name when stopping the service #4611

Closed

Conversation

chenghao-intel
Contributor

`spark-daemon.sh` will confirm the process id by fuzzy-matching the class name while stopping the service; however, this fails if the java process arguments are very long (greater than 4096 characters).
This PR loosens the check for the service process.
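(A minimal sketch of the kind of check being loosened, reconstructed from this discussion rather than copied from spark-daemon.sh; `$pid` and `$command` stand for the PID-file path and the daemon's class name:)

```bash
TARGET_ID=$(cat "$pid")   # PID recorded when the daemon started
# Fuzzy-match the class name against the reported command line:
if ps -p "$TARGET_ID" -o args= | grep -q "$command"; then
  kill "$TARGET_ID"
fi
# Problem: ps truncates the reported arguments (at 4096 characters on the
# Linux systems tested below), so a class name near the end of a very long
# java invocation never appears in the output and the daemon is not stopped.
```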

@SparkQA

SparkQA commented Feb 15, 2015

Test build #27510 has started for PR 4611 at commit f1bef66.

  • This patch merges cleanly.

@chenghao-intel
Contributor Author

#4382 failed because the SparkSubmit processes (HiveThriftServer) cannot be killed in unit testing. We probably need to kill those processes manually, or restart the Jenkins servers.

@chenghao-intel chenghao-intel changed the title [SPARK-5825] [Spark Submit] Remove the double checking when killing process [SPARK-5825] [Spark Submit] Remove the double checking instance name when stopping the service Feb 15, 2015
@SparkQA

SparkQA commented Feb 15, 2015

Test build #27510 has finished for PR 4611 at commit f1bef66.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27510/

@srowen
Member

srowen commented Feb 15, 2015

I don't think we want to take out this check entirely. This was changed recently to not just test whether the process can be killed (with `kill -0`) as a way of testing whether it's even a Spark process, but to further examine the command line. That stinks if this fails on long command lines.

What about a weaker check, like just grepping for `spark` in the command line? The point here is to not kill a foreign process that happens to have taken the PID named in a stale Spark PID file.
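(Illustrative sketch of the two levels of checking described above; `$TARGET_ID` is the PID read from the PID file:)

```bash
# 1. Liveness only: succeeds for any process we are allowed to signal at
#    that PID, whether or not it is a Spark process.
kill -0 "$TARGET_ID" 2>/dev/null

# 2. The weaker identity check suggested here: does the command line at
#    that PID mention spark at all?
ps -p "$TARGET_ID" -o args= | grep -qi spark
```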

@chenghao-intel
Contributor Author

Thank you @srowen for the explanation. I've updated the script; can you review it again?

@SparkQA

SparkQA commented Feb 16, 2015

Test build #27537 has started for PR 4611 at commit 0b9ea52.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Feb 16, 2015

Test build #27537 has finished for PR 4611 at commit 0b9ea52.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27537/

@srowen
Member

srowen commented Feb 16, 2015

This uses `jps`, but that is only included with the JDK; I think it's reasonable to expect that many production deployments will only have the JRE. How about simply `ps -p 4391 | grep -q $command`? Does that have the same problem? I suppose I wasn't clear on what part of the current solution doesn't work with long command lines.

@chenghao-intel
Contributor Author

@srowen, I was trying something like `ps -p 4391 | grep -q $command`, but it doesn't work. The process name is always `java`, the arguments start with `-cp xxxxx`, and the command line is truncated when it is very long.

`jps` seems to be the only way to print the real process name, which is `SparkSubmit`.
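(For illustration, with a hypothetical PID of 4391 for a running SparkSubmit daemon:)

```bash
ps -p 4391 -o comm=   # -> java (or the full path to the java binary on OS X)
jps | grep '^4391 '   # -> 4391 SparkSubmit; jps reads the JVM's own metadata,
                      #    so the main class name is never truncated
```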

@srowen
Member

srowen commented Feb 17, 2015

@chenghao-intel Ah, it looks like the default behavior on OS X is different from Linux. Try `ps -f -p ...` on either one and it looks like you get something containing the full command. Does that work?

@chenghao-intel
Contributor Author

@srowen I've tested under both Ubuntu 12.04 and CentOS 6.5; `ps -f -p ...` only prints the first 4096 characters of the arguments.
By the way, I've also checked Hadoop's hadoop-daemon.sh (Hadoop 2.3); it doesn't seem to confirm the process name the way we do in spark-daemon.sh. Or can we just confirm that it's a java process?

@srowen
Member

srowen commented Feb 17, 2015

Gotcha. OK. Until someone thinks of a more robust check, I think we can resort to just checking whether `ps -p $pid -o comm=` is `java`? EDIT: better to grep for `java` too, as I find that on OS X, `comm=` prints the whole absolute path to the binary.

@SparkQA

SparkQA commented Feb 18, 2015

Test build #27670 has started for PR 4611 at commit a0051f6.

  • This patch merges cleanly.

@chenghao-intel
Contributor Author

Thank you @srowen, I've updated the script; it seems much simpler now. I'd appreciate it if you could confirm the change under OS X; sorry, I don't have a machine with OS X.

@SparkQA

SparkQA commented Feb 18, 2015

Test build #27670 has finished for PR 4611 at commit a0051f6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27670/

@srowen
Member

srowen commented Feb 18, 2015

As I said, on OS X you get the whole binary path, not just `java`:

```
ps -p ... -o comm=
...
/Library/Java/JavaVirtualMachines/jdk1.8.0_31.jdk/Contents/Home/jre/bin/java
```

That's why I was thinking of `if ps -p "$TARGET_ID" -o comm= | grep -q java ; then`.
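(A minimal sketch of how that loosened check could sit in the stop path; the names `$pid`, `$TARGET_ID`, and `$command` follow this discussion and are not necessarily verbatim from the merged script:)

```bash
if [ -f "$pid" ]; then
  TARGET_ID=$(cat "$pid")
  # Identity check loosened to "is there a java process at this PID?":
  if ps -p "$TARGET_ID" -o comm= | grep -q java; then
    echo "stopping $command"
    kill "$TARGET_ID"
  else
    echo "no $command to stop"
  fi
else
  echo "no $command to stop"
fi
```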

@nchammas
Contributor

Bash style for this change is fine, though there are other areas in this script with word-splitting problems, unfortunately (e.g. some references to `$option`). That shouldn't block this PR, though.
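(For illustration, the kind of word-splitting hazard meant here, with a hypothetical value:)

```bash
option="--name 'my app'"
printf '<%s>\n' $option     # unquoted: split on whitespace into three fields
printf '<%s>\n' "$option"   # quoted: preserved as a single field
```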

@andrewor14
Contributor

@srowen it actually does a `=~`, which should be equivalent to a grep (with a quoted right-hand side, bash's `=~` performs a literal substring match):

```bash
if [[ "abc" =~ "abc" ]]; then echo woohoo; fi      # woohoo
if [[ "ffabc" =~ "abc" ]]; then echo woohoo; fi    # woohoo
if [[ "ffabcff" =~ "abc" ]]; then echo woohoo; fi  # woohoo
```

This seems like what we want, so I'm going to merge this into master, 1.3, and 1.2. Thanks @chenghao-intel!

asfgit pushed a commit that referenced this pull request Feb 19, 2015
…when stopping the service

`spark-daemon.sh` will confirm the process id by fuzzy-matching the class name while stopping the service; however, this fails if the java process arguments are very long (greater than 4096 characters).
This PR loosens the check for the service process.

Author: Cheng Hao <hao.cheng@intel.com>

Closes #4611 from chenghao-intel/stopping_service and squashes the following commits:

a0051f6 [Cheng Hao] loosen the process checking while stopping a service

(cherry picked from commit 94cdb05)
Signed-off-by: Andrew Or <andrew@databricks.com>
asfgit pushed a commit that referenced this pull request Feb 19, 2015
…when stopping the service
@asfgit asfgit closed this in 94cdb05 Feb 19, 2015
@srowen
Member

srowen commented Feb 19, 2015

@andrewor14 Ah! Right, of course. I looked right past that. Yes, that's a good change then.

markhamstra pushed a commit to markhamstra/spark that referenced this pull request Feb 24, 2015
…when stopping the service