@MaxGekk MaxGekk commented Jul 7, 2018

What changes were proposed in this pull request?

In this PR, I propose adding a tip to the error message that tells users how to resolve timeout expiration for broadcast joins. In particular, they can increase the timeout via spark.sql.broadcastTimeout or disable broadcast joins altogether by setting spark.sql.autoBroadcastJoinThreshold to -1.
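
For reference, the two remedies named above could be applied from spark-shell roughly as follows. This is a minimal sketch that assumes an active SparkSession bound to `spark`; the value 600 is only an illustrative choice, not a recommendation from this PR.

```scala
// Assumes an active SparkSession named `spark`, as provided by spark-shell.

// Option 1: give broadcasts more time. The value is in seconds
// (the default is 300); 600 is an arbitrary example value.
spark.conf.set("spark.sql.broadcastTimeout", 600)

// Option 2: disable automatic broadcast joins entirely.
spark.conf.set("spark.sql.autoBroadcastJoinThreshold", -1)
```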

How was this patch tested?

It was tested manually from spark-shell:

scala> spark.conf.set("spark.sql.broadcastTimeout", 1)
scala> val df = spark.range(100).join(spark.range(15).as[Long].map { x =>
               Thread.sleep(5000)
               x
            }).where("id = value")
scala> df.count()
org.apache.spark.SparkException: Could not execute broadcast in 1 secs. You can increase the timeout for broadcasts via spark.sql.broadcastTimeout or disable broadcast join by setting spark.sql.autoBroadcastJoinThreshold to -1
  at org.apache.spark.sql.execution.exchange.BroadcastExchangeExec.doExecuteBroadcast(BroadcastExchangeExec.scala:150)
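
The pattern behind the improved message can be sketched without Spark: wait on a future with a timeout, and rethrow the bare timeout as an error that names the relevant settings. The snippet below is an illustration only, using the Scala standard library; `BroadcastTimeoutSketch` and `awaitWithHint` are hypothetical names, and this is not the code in BroadcastExchangeExec.

```scala
import java.util.concurrent.TimeoutException
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object BroadcastTimeoutSketch {
  // Hypothetical helper, not Spark's code: wait for `work` and turn a bare
  // TimeoutException into an error that points at the relevant settings.
  def awaitWithHint[T](work: Future[T], timeoutSecs: Long): T =
    try {
      Await.result(work, timeoutSecs.seconds)
    } catch {
      case e: TimeoutException =>
        throw new RuntimeException(
          s"Could not execute broadcast in $timeoutSecs secs. " +
            "You can increase the timeout for broadcasts via spark.sql.broadcastTimeout " +
            "or disable broadcast join by setting spark.sql.autoBroadcastJoinThreshold to -1",
          e)
    }

  def main(args: Array[String]): Unit = {
    // Simulate slow work that outlasts a 1-second timeout.
    val slow = Future { Thread.sleep(3000); 42 }
    try {
      awaitWithHint(slow, timeoutSecs = 1)
      ()
    } catch {
      case e: RuntimeException => println(e.getMessage)
    }
  }
}
```

Keeping the original exception as the cause preserves the stack trace of the timeout while the outer message tells the user what to change.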

MaxGekk added 2 commits July 7, 2018 17:43
I added a recommendation for increasing the broadcast timeout. This sentence is added to the existing error message:

```
You can increase the timeout for broadcasts via ${SQLConf.BROADCAST_TIMEOUT.key}
```

Author: Maxim Gekk <maxim.gekk@databricks.com>

Closes apache#2801 from MaxGekk/broadcast-error-message.

MaxGekk commented Jul 7, 2018

@hvanhovell Please have a look at the PR.


SparkQA commented Jul 7, 2018

Test build #92712 has finished for PR 21727 at commit 86587ed.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@hvanhovell hvanhovell left a comment

LGTM

@hvanhovell

Merging to master. Thanks!

@asfgit asfgit closed this in 79c6689 Jul 7, 2018
@MaxGekk MaxGekk deleted the broadcast-timeout-error branch August 17, 2019 13:34