Description
Is your feature request related to a problem or challenge?
In #13664 we introduced the Diagnostic type and DataFusionError::Diagnostic. They allow enriching errors with messages meant for consumption by end users of an application built on top of DataFusion, by providing rich information and context that directly references locations in the SQL query. They enable features like:
See datafusion/sql/tests/cases/diagnostic for examples on how to extract and use diagnostics:
datafusion/datafusion/sql/tests/cases/diagnostic.rs
Lines 132 to 140 in d5428b2
In that PR, we only implemented diagnostics for:
- Unresolved table references
- Unresolved column references (qualified and unqualified)
- Non-aggregate expressions missing from GROUP BY clause
- Ambiguous column references
- Wrong number of columns in set expression (e.g. UNION)
- Incompatible types in binary expressions
This issue is about using Diagnostic in more places, and adding related tests to datafusion/sql/tests/cases/diagnostic. We think we should at least implement the following, but suggestions are welcome and encouraged:
- Attach Diagnostic to "function x does not exist" error #14430
- Attach Diagnostic to "invalid function argument types" error #14431
- Attach Diagnostic to "wrong number of arguments" error #14432
- Attach Diagnostic to "incompatible type in unary expression" error #14433
- Emit warning with attached Diagnostic when doing = NULL #14434
- Attach Diagnostic to "duplicate table name" error #14436
- Attach Diagnostic to syntax errors #14437
- Attach Diagnostic to "more than one column in subquery" error #14438
- Allow UDFs to return custom Diagnostic #15276
Describe the solution you'd like
The implementation should follow the steps of #13664, by calling DataFusionError.with_diagnostic to attach a Diagnostic to an error that is currently being returned. Tests should be added to datafusion/sql/tests/cases/diagnostic for each newly supported scenario.
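To make the pattern concrete, here is a minimal sketch of wrapping an error that is already being returned with a diagnostic, and reading it back the way a test would. It deliberately uses simplified stand-in types (`SketchError`, a cut-down `Diagnostic` and `Span`) instead of the real DataFusion ones, and `resolve_function` is a hypothetical planner step, so treat the names and fields as assumptions rather than the actual API.

```rust
// Simplified stand-ins for illustration only. DataFusion's real `Diagnostic`
// and `DataFusionError` have more fields and variants than shown here.
#[derive(Debug, Clone, PartialEq)]
struct Span {
    start: (u64, u64), // (line, column)
    end: (u64, u64),
}

#[derive(Debug, Clone, PartialEq)]
struct Diagnostic {
    message: String,
    span: Option<Span>,
}

#[derive(Debug)]
enum SketchError {
    Plan(String),
    // An existing error together with its user-facing diagnostic.
    Diagnostic(Box<Diagnostic>, Box<SketchError>),
}

impl SketchError {
    /// Wrap `self` so that it carries `diagnostic`.
    fn with_diagnostic(self, diagnostic: Diagnostic) -> Self {
        SketchError::Diagnostic(Box::new(diagnostic), Box::new(self))
    }

    /// Return the attached diagnostic, if there is one.
    fn diagnostic(&self) -> Option<&Diagnostic> {
        match self {
            SketchError::Diagnostic(d, _) => Some(d),
            _ => None,
        }
    }
}

/// Hypothetical planner step: resolve a function name that appears at `span`.
fn resolve_function(name: &str, span: Span) -> Result<(), SketchError> {
    if matches!(name, "abs" | "lower") {
        return Ok(());
    }
    let message = format!("function '{}' does not exist", name);
    // The error that was already being returned, now wrapped with a diagnostic.
    Err(SketchError::Plan(message.clone()).with_diagnostic(Diagnostic {
        message,
        span: Some(span),
    }))
}
```

A test for a new scenario would then call the planner on a failing query, pull the diagnostic out of the returned error, and assert on its message and span.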
For some of these items, it might be necessary to enrich the logical types with the Span information coming from the parser. This should be done using the datafusion::common::Spans type (note the "s"), introduced in #13664 to add span information to datafusion::common::Column.
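As a rough illustration of what enriching a logical type with spans could look like, the sketch below adds a `spans` field to a hypothetical `FunctionCall` type. `Span`, `Spans`, and `FunctionCall` here are stand-ins written for this example, not the real definitions. The sketch also assumes spans should not take part in equality, so that carrying source locations does not change how two logical items compare; that choice should be checked against how Column handles it.

```rust
// Illustrative stand-ins; not DataFusion's actual definitions.
#[derive(Debug, Clone, Copy)]
struct Span {
    start: (u64, u64), // (line, column)
    end: (u64, u64),
}

/// A collection of spans: one logical item may originate from several
/// places in the query text.
#[derive(Debug, Clone, Default)]
struct Spans(Vec<Span>);

impl Spans {
    /// First recorded span, if any.
    fn first(&self) -> Option<Span> {
        self.0.first().copied()
    }
}

// Assumption of this sketch: spans are metadata and never affect equality.
impl PartialEq for Spans {
    fn eq(&self, _other: &Self) -> bool {
        true
    }
}

/// Hypothetical logical type enriched with span information.
#[derive(Debug, Clone, PartialEq)]
struct FunctionCall {
    name: String,
    args: Vec<String>,
    spans: Spans,
}

impl FunctionCall {
    /// Existing-style constructor: callers that know nothing about spans
    /// keep working unchanged.
    fn new(name: &str, args: &[&str]) -> Self {
        FunctionCall {
            name: name.to_string(),
            args: args.iter().map(|a| a.to_string()).collect(),
            spans: Spans::default(),
        }
    }

    /// Builder-style addition used only where the parser span is available.
    fn with_span(mut self, span: Span) -> Self {
        self.spans.0.push(span);
        self
    }
}
```

Keeping the span optional and builder-style means existing constructors and call sites do not need to change.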
Ideally the implementation should be as minimally invasive as possible: it shouldn't require changing tons of function calls and types unless absolutely necessary, and the public-facing API shouldn't change. The Diagnostic should be attached as close as possible to the creation of the wrapped DataFusionError (i.e. deep in the call stack), and every error should ideally have just one Diagnostic.
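A small sketch of why attaching at the creation site keeps the change non-invasive: the deepest function wraps the error once, and every caller above it keeps propagating with `?` or a plain return, with no signature changes. The types and function names (`SketchError`, `check_arg_count`, `plan_expr`, `plan_query`) are hypothetical stand-ins for this example, not DataFusion's.

```rust
// Stand-in types for illustration; not the real DataFusion definitions.
#[derive(Debug)]
struct Diagnostic {
    message: String,
}

#[derive(Debug)]
enum SketchError {
    Plan(String),
    Diagnostic(Box<Diagnostic>, Box<SketchError>),
}

impl SketchError {
    fn with_diagnostic(self, diagnostic: Diagnostic) -> Self {
        SketchError::Diagnostic(Box::new(diagnostic), Box::new(self))
    }

    /// Count how many diagnostics wrap the underlying error.
    fn diagnostic_count(&self) -> usize {
        match self {
            SketchError::Diagnostic(_, inner) => 1 + inner.diagnostic_count(),
            _ => 0,
        }
    }
}

/// Deepest level: the error is created here, so the diagnostic is attached here.
fn check_arg_count(func: &str, expected: usize, actual: usize) -> Result<(), SketchError> {
    if expected == actual {
        return Ok(());
    }
    let message = format!("{} expects {} argument(s), got {}", func, expected, actual);
    Err(SketchError::Plan(message.clone()).with_diagnostic(Diagnostic { message }))
}

/// Intermediate level: propagates with `?` and does not touch the diagnostic.
fn plan_expr(func: &str, args: &[&str]) -> Result<(), SketchError> {
    check_arg_count(func, 1, args.len())?;
    Ok(())
}

/// Top level: signature and body are unaware that diagnostics exist.
fn plan_query(func: &str, args: &[&str]) -> Result<(), SketchError> {
    plan_expr(func, args)
}
```

Because only `check_arg_count` wraps the error, the value that reaches the top carries exactly one diagnostic, which a test can assert with a helper like `diagnostic_count`.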
Describe alternatives you've considered
No response
Additional context
No response