[SPARK-32670][SQL]Group exception messages in Catalyst Analyzer in one file #29497
Conversation
ok to test
 * org.apache.spark.sql.catalyst.analysis.Analyzer.
 */
object CatalystErrors {
  def groupingIDMismatchError(groupingID: GroupingID, groupByExprs: Seq[Expression]): Throwable = {
nit: how about moving these methods into object AnalysisException?
This is a summary sheet of raw exceptions thrown in Catalyst on branch-3.0: OSS Catalyst Exception Messages: Branch-3.0. Apart from AnalysisException and its sub-exceptions, there are a lot of other exceptions, for example IllegalArgumentException and UnsupportedOperationException.
We generally have two ways to group these messages:
- By component. All messages from a single component, no matter the exception type, are grouped into one exception file.
- By exception type. All messages of a single exception type are created in the corresponding exception object. This is nice because, by calling AnalysisException.groupingIDMismatchError, we know the exception type the function throws. But this approach has a problem: Java/Scala built-in exception types cannot follow this way.
Because of that drawback of the latter, I chose the former: grouping all exception messages from Catalyst in one file (the two options are sketched below). Or do you have any thoughts on that? Thanks!
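A rough illustration of the trade-off between the two options (a sketch only, assuming the Spark catalyst classpath; apart from groupingIDMismatchError, every name and message below is a placeholder, not code from this PR):

```scala
import org.apache.spark.sql.AnalysisException

// Option 1: group by component. One object per component; any exception type may appear.
object CatalystErrors {
  def groupingIDMismatchError(detail: String): Throwable =
    new AnalysisException(s"grouping_id does not match the group by expressions: $detail")

  def invalidConfValueError(key: String, value: String): Throwable =
    new IllegalArgumentException(s"Invalid value '$value' for configuration $key")
}

// Option 2: group by exception type. The function's home object reveals the thrown type,
// but JVM built-in exceptions (IllegalArgumentException, ...) have no such home.
object AnalysisErrors {
  def groupingIDMismatchError(detail: String): Throwable =
    new AnalysisException(s"grouping_id does not match the group by expressions: $detail")
}
```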
/**
 * Object for grouping all error messages in catalyst.
 * Currently it includes all AnalysisExcpetions created and thrown directly in
 * org.apache.spark.sql.catalyst.analysis.Analyzer.
Just a question; did you propose to group all the analysis exceptions here? I think we already have too many places (~500) where the analysis exceptions are used though... https://gist.github.com/maropu/29bbfa9a93c41bd4ba1e0eaa038af087
We need to do all of them, I think. This is just an example PR.
This can help us do error message auditing. In the future, we can review this file, improve the error message quality, and make the messages unified and standard.
It looks nice, @gatorsmile. I think we might be able to add a new rule in scalastyle for forbidding the use of AnalysisException in the other files.
Test build #127708 has finished for PR 29497 at commit
def legacyStoreAssignmentPolicyError(): Throwable = {
  val configKey = SQLConf.STORE_ASSIGNMENT_POLICY.key
  new AnalysisException(
    s"LEGACY store assignment policy is disallowed in Spark data source V2. " +
nit: let's remove the leading s where it isn't needed.
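For illustration, the nit amounts to something like the sketch below. The config key literal is only a stand-in for SQLConf.STORE_ASSIGNMENT_POLICY.key, and the second half of the message is assumed for the example rather than copied from the patch:

```scala
object InterpolatorNitExample {
  // Stand-in for SQLConf.STORE_ASSIGNMENT_POLICY.key; the message continuation is assumed.
  val configKey = "spark.sql.storeAssignmentPolicy"

  // Before: the first literal has nothing to interpolate, so the leading `s` is redundant.
  val before: String =
    s"LEGACY store assignment policy is disallowed in Spark data source V2. " +
      s"Please set the configuration $configKey to other values."

  // After: keep `s` only where interpolation actually happens.
  val after: String =
    "LEGACY store assignment policy is disallowed in Spark data source V2. " +
      s"Please set the configuration $configKey to other values."
}
```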
How do we plan this?
The whole community needs to work on this together: define a guide for error messages and gradually improve the error messages in future releases. Grouping the messages in a single file should not be blocked by the guide and message improvement. Basically, these PRs are like code cleaning and refactoring. This effort will help us audit the error messages before each release.
I'm +1 to this idea; error messages are super important to end-users, as they tell them what went wrong and how to fix it. It's easier to audit them if they are grouped together. We need a clear way to organize this. This PR proposes one idea: use different package names for different modules:
I'm open to other ideas as well. cc @maropu @viirya @dongjoon-hyun
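The concrete package list is not captured above; purely as a hypothetical illustration of the "per-module error packages" idea (none of these package or object names are decisions from this thread):

```scala
// Hypothetical layout only, for illustration of the idea.
package org.apache.spark.sql.catalyst.errors {
  // exceptions created inside the catalyst module (analysis, optimization, ...)
  object CatalystModuleErrors
}

package org.apache.spark.sql.execution.errors {
  // exceptions created inside the sql/core execution module
  object ExecutionModuleErrors
}
```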
This idea sounds okay. I just have a few questions.
Ah, I see. Splitting exceptions into multiple fine-grained groups looks like a nice idea.
I agree with @viirya's suggestion, and I think we need a simple but clear rule to categorize exceptions into groups so that developers do not get confused. Probably, we can follow a simple mapping based on exception classes? e.g.,
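(The concrete example was not captured above; the following is only a hypothetical sketch of such a mapping, with invented names.)

```scala
import org.apache.spark.sql.AnalysisException

// Every function in a given object throws exactly one exception class, so the mapping
// rule is simply "exception class -> error object".
object AnalysisErrors {
  def exampleAnalysisError(msg: String): Throwable = new AnalysisException(msg)
}

object UnsupportedOperationErrors {
  def exampleUnsupportedError(msg: String): Throwable = new UnsupportedOperationException(msg)
}
```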
In the ideal case, all exception messages should be grouped for easy maintenance and auditing. This PR starts from the Analyzer as a first step. And yeah, if we want to divide these exceptions into groups, then the mapping rule, and how to divide them so that contributors can easily follow the rule when committing, is a problem. Another concern is that grouping exceptions in a single file will necessarily inflate the file size; but I think the process is that we first group them, then look for ways to optimize them, for example, find duplicate error messages and combine them into one function, or divide them into smaller, refined groups. Open to all ideas and great thanks for your review! cc @cloud-fan @maropu @viirya
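For instance, the "combine duplicate messages into one function" step might look like this (a sketch with invented messages and function names, not code from this PR):

```scala
import org.apache.spark.sql.AnalysisException

// Before: two near-identical messages scattered across the Analyzer, e.g.
//   throw new AnalysisException(s"cannot resolve column $name in GROUP BY")
//   throw new AnalysisException(s"cannot resolve column $name in ORDER BY")

// After: one parameterized error function in the grouped file.
object CatalystErrors {
  def cannotResolveColumnError(name: String, clause: String): Throwable =
    new AnalysisException(s"cannot resolve column $name in $clause")
}
```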
Test build #128490 has finished for PR 29497 at commit
This is a good point. Is it possible that we put all the error messages in the catalyst module? Other modules depend on catalyst and can access error messages.
 * limitations under the License.
 */

package org.apache.spark.sql
move org.apache.spark.sql => org.apache.spark.sql.errors
 * Currently it includes all AnalysisExcpetions created and thrown directly in
 * org.apache.spark.sql.catalyst.analysis.Analyzer.
 */
object CatalystErrors {
rename CatalystErrors => QueryCompilationErrors
Any update? I think it's okay to target at grouping the AnalysisExceptions in the Analyzer here.
Test build #131131 has finished for PR 29497 at commit
Kubernetes integration test starting
Kubernetes integration test status success
import org.apache.spark.sql.types.{AbstractDataType, DataType, StructType}

/**
 * Object for grouping all error messages in catalyst.
Object for grouping all error messages of the query compilation.
  )
}

def nonliteralPivotValError(pivotVal: Expression): Throwable = {
nonLiteral...
Test build #131140 has finished for PR 29497 at commit
Test build #131133 has finished for PR 29497 at commit
retest this please
Kubernetes integration test starting
Kubernetes integration test status failure
Kubernetes integration test starting
Kubernetes integration test status success
Test build #131144 has finished for PR 29497 at commit
LGTM
Merged to master. |
|
/**
 * Object for grouping all error messages of the query compilation.
 * Currently it includes all AnalysisExcpetions created and thrown directly in
AnalysisExcpetions -> AnalysisExceptions?
def groupingSizeTooLargeError(sizeLimit: Int): Throwable = {
  new AnalysisException(
    s"Grouping sets size cannot be greater than $sizeLimit")
Although this is related to the original message, maybe set's size or sets' size is better?
late LGTM. Thank you all.
…lyzer in one file

### What changes were proposed in this pull request?
This PR follows up #29497. Because #29497 just gives us an example of grouping all `AnalysisException`s in the Analyzer into QueryCompilationErrors, this PR groups the other `AnalysisException`s into QueryCompilationErrors.

### Why are the changes needed?
It will largely help with the standardization of error messages and their maintenance.

### Does this PR introduce _any_ user-facing change?
No. Error messages remain unchanged.

### How was this patch tested?
No new tests - pass all original tests to make sure it doesn't break any existing behavior.

Closes #30564 from beliefer/SPARK-32670-followup.

Lead-authored-by: gengjiaan <gengjiaan@360.cn>
Co-authored-by: Jiaan Geng <beliefer@163.com>
Co-authored-by: beliefer <beliefer@163.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
What changes were proposed in this pull request?
Group all messages of AnalysisExceptions created and thrown directly in org.apache.spark.sql.catalyst.analysis.Analyzer in one file: a new object org.apache.spark.sql.CatalystErrors with many exception-creating functions. When the Analyzer wants to create and throw a new AnalysisException, it calls functions of CatalystErrors.
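Roughly, the pattern looks like this (a condensed sketch: the real signature uses GroupingID and Seq[Expression] rather than strings, and the message text is illustrative):

```scala
import org.apache.spark.sql.AnalysisException

object CatalystErrors {
  def groupingIDMismatchError(groupingID: String, groupByExprs: String): Throwable =
    new AnalysisException(
      s"Columns of grouping_id ($groupingID) does not match grouping columns ($groupByExprs)")
}

// Call site in the Analyzer, before:
//   throw new AnalysisException(s"Columns of grouping_id (...) does not match ...")
// and after:
//   throw CatalystErrors.groupingIDMismatchError(groupingID, groupByExprs)
```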
Why are the changes needed?
This is a sample PR that groups exception messages together in several files. It will largely help with the standardization of error messages and their maintenance.
Does this PR introduce any user-facing change?
No. Error messages remain unchanged.
How was this patch tested?
No new tests - pass all original tests to make sure it doesn't break any existing behavior.
Naming of exception functions
- All function names end with Error.
- For specific errors like groupingIDMismatch and groupingColInvalid, directly use them as names, just like groupingIDMismatchError and groupingColInvalidError.
- For generic errors like dataTypeMismatch, add a prefix describing the context when it is clear, like pivotValDataTypeMismatchError, or add a For suffix naming the specific component that this exception is related to, like dataTypeMismatchForDeserializerError.
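A sketch of the convention (the function names come from this description; the bodies and messages are placeholders, not code from the PR):

```scala
import org.apache.spark.sql.AnalysisException

object CatalystErrors {
  // Specific error: the condition itself is the name, plus the Error suffix.
  def groupingColInvalidError(col: String): Throwable =
    new AnalysisException(s"Grouping column $col is invalid")

  // Generic error with a clear context: prefix the condition with that context.
  def pivotValDataTypeMismatchError(pivotVal: String): Throwable =
    new AnalysisException(s"Pivot value '$pivotVal' has a mismatched data type")

  // Generic error without a clear context: add a For<Component> suffix instead.
  def dataTypeMismatchForDeserializerError(dataType: String): Throwable =
    new AnalysisException(s"Data type $dataType mismatch in the deserializer")
}
```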