[SPARK-26564] Fix wrong assertions and error messages for parameter checking #23488
Conversation
…t and spark.executor.heartbeatInterval

If users set equivalent values to spark.network.timeout and spark.executor.heartbeatInterval, they get the following message:

```
java.lang.IllegalArgumentException: requirement failed: The value of spark.network.timeout=120s must be no less than the value of spark.executor.heartbeatInterval=120s.
```

But it's misleading, since it can be read as meaning they could be equal. So this fix replaces "no less than" with "greater than".
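The corrected check can be sketched as a tiny standalone snippet (a minimal reproduction with illustrative names, not the actual Spark source):

```scala
// Minimal sketch of the corrected validation, assuming the two settings have
// already been parsed into seconds. Object and method names are hypothetical.
object TimeoutCheck {
  def validate(networkTimeoutSec: Long, heartbeatIntervalSec: Long): Unit = {
    // Strict comparison: equal values are rejected, so the message
    // "must be greater than" now matches the condition exactly.
    require(networkTimeoutSec > heartbeatIntervalSec,
      s"The value of spark.network.timeout=${networkTimeoutSec}s must be " +
      s"greater than the value of spark.executor.heartbeatInterval=" +
      s"${heartbeatIntervalSec}s.")
  }
}
```

With equal values (e.g. both 120s), `require` now fails with a message whose wording agrees with the condition being enforced.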
It's okay, but mind taking a look for similar instances? I think there are more.
Thank you for the comment @HyukjinKwon. As you said, I found two additional mistakes by grep'ing the codebase.
But for the first one, I can't decide which of the constraint and the message is correct, since I'm not so familiar with ML. It doesn't seem to make sense to execute it with maxIter=0, but is there a use case for that, such as a dry run...?
I think it's okay to leave them as they are. Let's just get the current change in, then.
ok to test
Test build #101010 has finished for PR 23488 at commit
In both of the additional cases above, the assertion is wrong and the message is right. You are welcome to fix both of them here, but I'm OK merging this too.
Thank you for the advice @HyukjinKwon and @srowen. I'll fix them within this PR so let me update it. |
…hecking

Fix newly found inconsistencies between assertions and messages
Test build #101029 has finished for PR 23488 at commit
```diff
@@ -413,7 +413,7 @@ private[execution] final class LongToUnsafeRowMap(val mm: TaskMemoryManager, cap

   private def init(): Unit = {
     if (mm != null) {
-      require(capacity < 512000000, "Cannot broadcast more than 512 millions rows")
+      require(capacity <= 512000000, "Cannot broadcast more than 512 millions rows")
```
I think the previous condition was correct:

spark/sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/BroadcastExchangeExec.scala, lines 80 to 83 in 79c6689:

```scala
if (numRows >= 512000000) {
  throw new SparkException(
    s"Cannot broadcast the table with more than 512 millions rows: $numRows rows")
}
```

Negating `numRows >= 512000000` gives `numRows < 512000000`, so the assertion should be `capacity < 512000000`.
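The boundary logic above can be demonstrated with a small sketch (the object and method names are illustrative, not Spark's):

```scala
// BroadcastExchangeExec throws when numRows >= 512000000, so the
// precondition for a broadcast to proceed is the strict negation:
// numRows < 512000000. Exactly 512000000 rows must be rejected.
object BroadcastLimit {
  val MaxRows = 512000000L
  def canBroadcast(numRows: Long): Boolean = numRows < MaxRows
}
```

This is why `capacity <= 512000000` would be off by one: it would accept a value that `BroadcastExchangeExec` itself rejects.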
Ah, good catch. Sorry @sekikn, could we change this one more time to use `<`, and then fix the error message instead?
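Following that suggestion, the end state would presumably keep the strict comparison and reword the message to match it, along these lines (a sketch of the agreed direction, not the merged diff):

```scala
// Hypothetical standalone version of the agreed-upon check: keep the
// strict `<` and make the message say "512 million rows or more" is
// the rejected case, so condition and wording agree.
object RowMapCheck {
  val MaxRows = 512000000L
  def checkCapacity(capacity: Long): Unit =
    require(capacity < MaxRows,
      s"Cannot broadcast a table with $MaxRows or more rows")
}
```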
Sure! I'm going to update the PR.
…hecking

Additional fix to respect the existing error checking and revise the messages
Test build #101114 has finished for PR 23488 at commit
Test build #4505 has finished for PR 23488 at commit
Merged to master
…hecking

## What changes were proposed in this pull request?

If users set equivalent values to spark.network.timeout and spark.executor.heartbeatInterval, they get the following message:

```
java.lang.IllegalArgumentException: requirement failed: The value of spark.network.timeout=120s must be no less than the value of spark.executor.heartbeatInterval=120s.
```

But it's misleading, since it can be read as meaning they could be equal. So this PR replaces "no less than" with "greater than". Also, it fixes similar inconsistencies found in MLlib and SQL components.

## How was this patch tested?

Ran Spark with equivalent values for them manually and confirmed that the revised message was displayed.

Closes apache#23488 from sekikn/SPARK-26564.
Authored-by: Kengo Seki <sekikn@apache.org>
Signed-off-by: Sean Owen <sean.owen@databricks.com>