[SPARK-35002][INFRA] Fix the java.net.BindException when testing with Github Action #32096

Closed
wants to merge 3 commits into from
Member

@wangyum wangyum commented Apr 8, 2021

What changes were proposed in this pull request?

This PR tries to fix the java.net.BindException when testing with GitHub Actions:

[info] org.apache.spark.sql.kafka010.producer.InternalKafkaProducerPoolSuite *** ABORTED *** (282 milliseconds)
[info]   java.net.BindException: Cannot assign requested address: Service 'sparkDriver' failed after 100 retries (on a random free port)! Consider explicitly setting the appropriate binding address for the service 'sparkDriver' (for example spark.driver.bindAddress for SparkDriver) to the correct binding address.
[info]   at sun.nio.ch.Net.bind0(Native Method)
[info]   at sun.nio.ch.Net.bind(Net.java:461)
[info]   at sun.nio.ch.Net.bind(Net.java:453)

https://github.com/apache/spark/pull/32090/checks?check_run_id=2295418529

Why are the changes needed?

Fix test framework.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Tested by GitHub Actions.

…ervice 'sparkDriver' failed after 100 retries
@wangyum wangyum changed the title from "[WIP] Try to fix java.net.BindException: Cannot assign requested address: Service 'sparkDriver' failed after 100 retries" to "[WIP] Try to fix the java.net.BindException when testing with Github Action" Apr 8, 2021
@github-actions github-actions bot added the INFRA label Apr 8, 2021
SparkQA commented Apr 8, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41659/

SparkQA commented Apr 8, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41659/

SparkQA commented Apr 8, 2021

Test build #137081 has finished for PR 32096 at commit 62083f6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -47,6 +47,7 @@ jobs:
 SPARK_BENCHMARK_NUM_SPLITS: ${{ github.event.inputs.num-splits }}
 SPARK_BENCHMARK_CUR_SPLIT: ${{ matrix.split }}
 SPARK_GENERATE_BENCHMARK_FILES: 1
+SPARK_LOCAL_IP: localhost
Member

If it works, would 127.0.0.1 be better?

SparkQA commented Apr 9, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41681/

SparkQA commented Apr 9, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41681/

SparkQA commented Apr 9, 2021

Test build #137103 has finished for PR 32096 at commit 617a1ad.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Apr 9, 2021

Kubernetes integration test unable to build dist.

exiting with code: 1
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41686/

Member

dongjoon-hyun commented Apr 9, 2021

Hi, @wangyum. This seems to be a duplicate effort.

cc @HyukjinKwon

@@ -47,6 +47,7 @@ jobs:
 SPARK_BENCHMARK_NUM_SPLITS: ${{ github.event.inputs.num-splits }}
 SPARK_BENCHMARK_CUR_SPLIT: ${{ matrix.split }}
 SPARK_GENERATE_BENCHMARK_FILES: 1
+SPARK_LOCAL_IP: 127.0.0.1
Member

@wangyum, I think you should set HIVE_SERVER2_THRIFT_BIND_HOST too. BTW, I think this is essentially because InetAddress.getLocalHost, via DNS, returns a wrong address on some GitHub Actions machines.

Member

Here's the example

fv-az213-557.internal.cloudapp.net, 10.1.0.4, 10.1.0.4
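
The behavior described in this thread can be checked directly on a runner. The following is a minimal diagnostic sketch and is not part of this PR: `BindCheck` and `canBind` are hypothetical names, and the code assumes only the standard `java.net` API. It tries to bind an ephemeral port on the address returned by `InetAddress.getLocalHost()` and on the loopback address. A failed bind on the first address would be consistent with the `BindException` in the PR description.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.UnknownHostException;

// Hypothetical diagnostic helper for illustration; not part of Spark or this PR.
final class BindCheck {
    private BindCheck() {}

    // Returns true if a server socket can bind to the given address
    // on an ephemeral port (port 0 lets the OS pick a free one).
    static boolean canBind(InetAddress address) {
        try (ServerSocket socket = new ServerSocket()) {
            socket.bind(new InetSocketAddress(address, 0));
            return true;
        } catch (IOException e) {
            // java.net.BindException is a subclass of IOException,
            // so "Cannot assign requested address" ends up here.
            return false;
        }
    }

    public static void main(String[] args) throws UnknownHostException {
        // Address the JVM derives from the machine's hostname.
        InetAddress local = InetAddress.getLocalHost();
        // Loopback address, which needs no name lookup.
        InetAddress loopback = InetAddress.getLoopbackAddress();
        System.out.println("getLocalHost: " + local + " bindable=" + canBind(local));
        System.out.println("loopback:     " + loopback + " bindable=" + canBind(loopback));
    }
}
```

On a machine where the hostname resolves to an address that is not assigned to any local interface, the first line would be expected to report `bindable=false`, while the loopback line should still report `bindable=true`.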

Member Author

It seems localhost is much better. We do not need to set HIVE_SERVER2_THRIFT_BIND_HOST.

@HyukjinKwon
Member

Thanks for letting me know @dongjoon-hyun.

Please file a JIRA and go ahead with this approach, @wangyum. I will try to take a deeper look and see if there's another way to fix it.

SparkQA commented Apr 9, 2021

Test build #137108 has finished for PR 32096 at commit 7999243.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@wangyum wangyum changed the title from "[WIP] Try to fix the java.net.BindException when testing with Github Action" to "[SPARK-35002][INFRA] Fix the java.net.BindException when testing with Github Action" Apr 9, 2021
@dongjoon-hyun
Member

The Hive test suite failures came from another commit (landed 5 hours ago), and I reverted it. Sorry about that, guys.

Member

@dongjoon-hyun dongjoon-hyun left a comment

+1, LGTM.

@dongjoon-hyun
Member

Merged to master.

@HyukjinKwon
Member

@wangyum, we need to set HIVE_SERVER2_THRIFT_BIND_HOST because the problem comes from InetAddress.getLocalHost.

@HyukjinKwon
Member

hiveHost = System.getenv("HIVE_SERVER2_THRIFT_BIND_HOST");
if (hiveHost == null) {
  hiveHost = hiveConf.getVar(ConfVars.HIVE_SERVER2_THRIFT_BIND_HOST);
}
try {
  if (hiveHost != null && !hiveHost.isEmpty()) {
    serverIPAddress = InetAddress.getByName(hiveHost);
  } else {
    serverIPAddress = InetAddress.getLocalHost();
  }

@HyukjinKwon
Member

Ah, okay, but we explicitly set it at:

| --hiveconf ${ConfVars.HIVE_SERVER2_THRIFT_BIND_HOST}=localhost

Okay, I agree with using localhost.
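
The precedence in the Hive snippet quoted earlier in this thread can be sketched as a small pure function. This is an illustration only: `BindHostResolver` and `resolve` are hypothetical names, the environment is passed in as a map so the logic can be exercised without touching the real process environment, and the real code goes on to resolve the result with `InetAddress.getByName`.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper for illustration; not Hive's or Spark's actual code.
final class BindHostResolver {
    static final String ENV_KEY = "HIVE_SERVER2_THRIFT_BIND_HOST";

    private BindHostResolver() {}

    // Mirrors the precedence in the quoted snippet: the environment variable
    // is consulted first, and the configuration value is used only when the
    // variable is unset. An empty result means the caller would fall back to
    // InetAddress.getLocalHost().
    static Optional<String> resolve(Map<String, String> env, String confValue) {
        String host = env.get(ENV_KEY);
        if (host == null) {
            host = confValue;
        }
        if (host != null && !host.isEmpty()) {
            return Optional.of(host);
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Map<String, String> noEnv = Collections.emptyMap();
        // With no environment variable, a value passed via --hiveconf is used.
        System.out.println(resolve(noEnv, "localhost"));
        // With neither source set, the getLocalHost() fallback would apply.
        System.out.println(resolve(noEnv, null));
    }
}
```

Under this reading, passing `--hiveconf ...=localhost` is enough to avoid the `getLocalHost()` fallback as long as the environment variable is not set to an empty string.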

@HyukjinKwon
Member

Thanks for merging it in any event. The issue slowed development down considerably.

wangyum pushed a commit that referenced this pull request Apr 9, 2021
… SPARK_LOCAL_IP in GA builds

### What changes were proposed in this pull request?

This PR replaces 127.0.0.1 with `localhost`.

### Why are the changes needed?

- #32096 (comment)
- #32096 (comment)

### Does this PR introduce _any_ user-facing change?

No, dev-only.

### How was this patch tested?

I didn't test it because it's a CI-specific issue. I will test it in the GitHub Actions build in this PR.

Closes #32102 from HyukjinKwon/SPARK-35002.

Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: Yuming Wang <yumwang@ebay.com>
pan3793 pushed a commit to apache/kyuubi that referenced this pull request Apr 9, 2021
… testing with Github Action

### _Why are the changes needed?_
Referring to apache/spark#32096 and apache/spark#32102, this PR tries to fix the java.net.BindException when testing with GitHub Actions.

```
SparkOperationSuite:
*** RUN ABORTED ***
  java.net.BindException: Cannot assign requested address: Service 'sparkDriver' failed after 16 retries (on a random free port)! Consider explicitly setting the appropriate binding address for the service 'sparkDriver' (for example spark.driver.bindAddress for SparkDriver) to the correct binding address.
  at sun.nio.ch.Net.bind0(Native Method)
  at sun.nio.ch.Net.bind(Net.java:461)
  at sun.nio.ch.Net.bind(Net.java:453)
  at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:222)
  at io.netty.channel.socket.nio.NioServerSocketChannel.doBind(NioServerSocketChannel.java:134)
  at io.netty.channel.AbstractChannel$AbstractUnsafe.bind(AbstractChannel.java:550)
  at io.netty.channel.DefaultChannelPipeline$HeadContext.bind(DefaultChannelPipeline.java:1334)
  at io.netty.channel.AbstractChannelHandlerContext.invokeBind(AbstractChannelHandlerContext.java:506)
  at io.netty.channel.AbstractChannelHandlerContext.bind(AbstractChannelHandlerContext.java:491)
  at io.netty.channel.DefaultChannelPipeline.bind(DefaultChannelPipeline.java:973)
```

Also pass FRONTEND_BIND_HOST through the connection string to fix a similar issue.

```
Cause: java.lang.RuntimeException: org.apache.kyuubi.KyuubiSQLException:org.apache.kyuubi.KyuubiException: Failed to initialize frontend service on fv-az207-19/10.1.1.0:0.
	at org.apache.kyuubi.service.FrontendService.initialize(FrontendService.scala:102)
	at org.apache.kyuubi.service.CompositeService.$anonfun$initialize$1(CompositeService.scala:40)
	at org.apache.kyuubi.service.CompositeService.$anonfun$initialize$1$adapted(CompositeService.scala:40)
	at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
	at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
	at org.apache.kyuubi.service.CompositeService.initialize(CompositeService.scala:40)
	at org.apache.kyuubi.service.Serverable.initialize(Serverable.scala:44)
	at org.apache.kyuubi.engine.spark.SparkSQLEngine.initialize(SparkSQLEngine.scala:49)
	at org.apache.kyuubi.engine.spark.SparkSQLEngine$.startEngine(SparkSQLEngine.scala:105)
	at org.apache.kyuubi.engine.spark.SparkSQLEngine$.main(SparkSQLEngine.scala:118)
	at org.apache.kyuubi.engine.spark.SparkSQLEngine.main(SparkSQLEngine.scala)
```

### _How was this patch tested?_
- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible

- [ ] Add screenshots for manual tests if appropriate

- [ ] [Run test](https://kyuubi.readthedocs.io/en/latest/tools/testing.html#running-tests) locally before making a pull request

Closes #503 from turboFei/KYUUBI-502.

Closes #502

1b10253 [fwang12] use localhost instead of 127.0.0.1
c104ce3 [fwang12] address comments
1e549c1 [fwang12] revert install shade
457ce2f [fwang12] try set frontend bind host in connection string
8bcd5a0 [fwang12] revert env KYUUBI_FRONTEND_BIND_HOST and set kyuubi.frontend.bind.host to 127.0.0.1 in scalatest-maven-plugin
717a992 [fwang12] update doc
d5ba05a [fwang12] add install shaded jars in release.yml
e8b2372 [fwang12] involve KYUUBI_FRONTEND_BIND_HOST
5eb7cdb [fwang12] also set KYUUBI_FRONTEND_BIND_HOST env to 127.0.0.1
7d70819 [fwang12] [KYUUBI #502][SPARK-35002][INFRA] Fix the java.net.BindException when testing with Github Action

Authored-by: fwang12 <fwang12@ebay.com>
Signed-off-by: Cheng Pan <379377944@qq.com>
@wangyum wangyum deleted the SPARK_LOCAL_IP=localhost branch April 10, 2021 03:47
HyukjinKwon pushed a commit that referenced this pull request Apr 14, 2021
… Github Action

This PR tries to fix the `java.net.BindException` when testing with GitHub Actions:
```
[info] org.apache.spark.sql.kafka010.producer.InternalKafkaProducerPoolSuite *** ABORTED *** (282 milliseconds)
[info]   java.net.BindException: Cannot assign requested address: Service 'sparkDriver' failed after 100 retries (on a random free port)! Consider explicitly setting the appropriate binding address for the service 'sparkDriver' (for example spark.driver.bindAddress for SparkDriver) to the correct binding address.
[info]   at sun.nio.ch.Net.bind0(Native Method)
[info]   at sun.nio.ch.Net.bind(Net.java:461)
[info]   at sun.nio.ch.Net.bind(Net.java:453)
```

https://github.com/apache/spark/pull/32090/checks?check_run_id=2295418529

Fix test framework.

No.

Tested by GitHub Actions.

Closes #32096 from wangyum/SPARK_LOCAL_IP=localhost.

Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(cherry picked from commit 9663c40)
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
@HyukjinKwon
Member

I merged to branch-3.1 and branch-3.0 too. Seems like it has the same issue.

HyukjinKwon pushed a commit that referenced this pull request Apr 14, 2021
… Github Action

HyukjinKwon added a commit that referenced this pull request Apr 14, 2021
… SPARK_LOCAL_IP in GA builds

This PR replaces 127.0.0.1 with `localhost`.

- #32096 (comment)
- #32096 (comment)

No, dev-only.

I didn't test it because it's a CI-specific issue. I will test it in the GitHub Actions build in this PR.

Closes #32102 from HyukjinKwon/SPARK-35002.

Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: Yuming Wang <yumwang@ebay.com>
(cherry picked from commit a3d1e00)
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
HyukjinKwon added a commit that referenced this pull request Apr 14, 2021
… SPARK_LOCAL_IP in GA builds

flyrain pushed a commit to flyrain/spark that referenced this pull request Sep 21, 2021
… Github Action

flyrain pushed a commit to flyrain/spark that referenced this pull request Sep 21, 2021
… SPARK_LOCAL_IP in GA builds


5 participants