Add support for constant expression evaluation in limit #9790

Merged
merged 5 commits into from
Mar 27, 2024
83 changes: 60 additions & 23 deletions datafusion/sql/src/query.rs
@@ -25,6 +25,7 @@ use datafusion_common::{
 };
 use datafusion_expr::{
     CreateMemoryTable, DdlStatement, Distinct, Expr, LogicalPlan, LogicalPlanBuilder,
+    Operator,
 };
 use sqlparser::ast::{
     Expr as SQLExpr, Offset as SQLOffset, OrderByExpr, Query, SetExpr, SetOperator,
@@ -221,37 +222,29 @@ impl<'a, S: ContextProvider> SqlToRel<'a, S> {
         }

         let skip = match skip {
-            Some(skip_expr) => match self.sql_to_expr(
-                skip_expr.value,
-                input.schema(),
-                &mut PlannerContext::new(),
-            )? {
-                Expr::Literal(ScalarValue::Int64(Some(s))) => {
-                    if s < 0 {
-                        return plan_err!("Offset must be >= 0, '{s}' was provided.");
-                    }
-                    Ok(s as usize)
-                }
-                _ => plan_err!("Unexpected expression in OFFSET clause"),
-            }?,
-            _ => 0,
-        };
+            Some(skip_expr) => {
+                let expr = self.sql_to_expr(
+                    skip_expr.value,
+                    input.schema(),
+                    &mut PlannerContext::new(),
+                )?;
+                let n = get_constant_result(&expr, "OFFSET")?;
+                convert_usize_with_check(n, "OFFSET")
+            }
+            _ => Ok(0),
+        }?;

         let fetch = match fetch {
             Some(limit_expr)
                 if limit_expr != sqlparser::ast::Expr::Value(Value::Null) =>
             {
-                let n = match self.sql_to_expr(
+                let expr = self.sql_to_expr(
                     limit_expr,
                     input.schema(),
                     &mut PlannerContext::new(),
-                )? {
-                    Expr::Literal(ScalarValue::Int64(Some(n))) if n >= 0 => {
-                        Ok(n as usize)
-                    }
-                    _ => plan_err!("LIMIT must not be negative"),
-                }?;
-                Some(n)
+                )?;
+                let n = get_constant_result(&expr, "LIMIT")?;
+                Some(convert_usize_with_check(n, "LIMIT")?)
             }
             _ => None,
         };
@@ -283,3 +276,47 @@ impl<'a, S: ContextProvider> SqlToRel<'a, S> {
         }
     }
 }

+/// Retrieves the constant result of an expression, evaluating it if possible.
+///
+/// This function takes an expression and an argument name as input and returns
+/// a `Result<i64>` indicating either the constant result of the expression or an
+/// error if the expression cannot be evaluated.
+///
+/// # Arguments
+///
+/// * `expr` - An `Expr` representing the expression to evaluate.
+/// * `arg_name` - The name of the argument for error messages.
+///
+/// # Returns
+///
+/// * `Result<i64>` - An `Ok` variant containing the constant result if evaluation is successful,
+/// or an `Err` variant containing an error message if evaluation fails.
+///
+/// <https://github.com/apache/arrow-datafusion/issues/9821> tracks a more general solution
+fn get_constant_result(expr: &Expr, arg_name: &str) -> Result<i64> {
+    match expr {
Contributor

We could use ConstEvaluator and see if the expression evaluates to a constant value.

Contributor Author

To use this helper, I imported datafusion-optimizer into the datafusion-sql crate. That dependency caused an error in the test_deps test, so I am retracting these changes for now.

Member

@jonahgao jonahgao Mar 26, 2024

Could we change the Limit logical plan to support arbitrary expressions?

pub struct Limit {
    pub skip: Expr,
    pub fetch: Option<Expr>,
    pub input: Arc<LogicalPlan>,
}

The SimplifyExpressions rule can automatically optimize them into constants. Some optimization rules such as PushDownLimit only run when the limit expression is a constant. We may need to add a cast for the limit expression when planning, only checking if it is a constant of type u64.

When creating the LimitExec physical plan, convert the limit expression into PhysicalExpr and evaluate it.
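
A minimal sketch of the idea in this comment, not DataFusion code: `ToyExpr`, `ToyLimit`, `simplify`, and `constant_rows` are hypothetical stand-ins for `Expr`, the proposed `Limit`, the `SimplifyExpressions` rule, and the constant check a rule such as `PushDownLimit` would need. It only illustrates that the plan node can carry an expression, that a later pass can fold it, and that rules needing a concrete row count can skip non-constant limits.

```rust
use std::convert::TryFrom;

/// Toy stand-in for a logical expression (hypothetical, not `datafusion_expr::Expr`).
#[derive(Debug, Clone, PartialEq)]
enum ToyExpr {
    Int(i64),
    Column(String),
    Add(Box<ToyExpr>, Box<ToyExpr>),
}

/// Hypothetical limit node that stores expressions instead of resolved numbers.
#[derive(Debug, Clone, PartialEq)]
struct ToyLimit {
    skip: ToyExpr,
    fetch: Option<ToyExpr>,
}

/// Stand-in for a simplification pass: folds `Int + Int`, leaves everything else.
fn simplify(expr: ToyExpr) -> ToyExpr {
    match expr {
        ToyExpr::Add(l, r) => match (simplify(*l), simplify(*r)) {
            (ToyExpr::Int(a), ToyExpr::Int(b)) => match a.checked_add(b) {
                Some(sum) => ToyExpr::Int(sum),
                // On overflow, keep the expression unfolded rather than guess.
                None => ToyExpr::Add(Box::new(ToyExpr::Int(a)), Box::new(ToyExpr::Int(b))),
            },
            (l, r) => ToyExpr::Add(Box::new(l), Box::new(r)),
        },
        other => other,
    }
}

/// A rule that needs a concrete row count only applies to a non-negative constant.
fn constant_rows(expr: &ToyExpr) -> Option<u64> {
    match expr {
        ToyExpr::Int(n) => u64::try_from(*n).ok(),
        _ => None,
    }
}

fn main() {
    let limit = ToyLimit {
        skip: ToyExpr::Int(0),
        fetch: Some(ToyExpr::Add(Box::new(ToyExpr::Int(1)), Box::new(ToyExpr::Int(1)))),
    };
    let fetch = simplify(limit.fetch.clone().unwrap());
    println!("skip = {:?}, fetch rows = {:?}", limit.skip, constant_rows(&fetch));
}
```

In this toy model, a limit that references a column stays unresolved after `simplify`, so `constant_rows` returns `None` and a constant-only rule would simply not fire.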

Contributor

> Could we change the Limit logical plan to support arbitrary expressions?

I agree this would be the "correct" way to support arbitrarily simplifiable expressions in the limit clause.

However, I suspect it might be a major change.

So I guess it comes down to how important the + / - case is. If it is really important, we can proceed with this PR; if the real use case is arbitrary expressions, it probably makes sense to do the more substantial change suggested by @jonahgao.

Contributor Author

I experimented with keeping Expr in the limit state. However, it is not a trivial change. It is also unclear when to do error checking and when to ignore errors while converting skip and fetch to their corresponding u64 versions. I suggest we merge this PR first and then add support for arbitrary expressions in another PR. (I also think we can postpone that feature until a use case comes up.)

Contributor

I filed #9821 to track this.

+        Expr::Literal(ScalarValue::Int64(Some(s))) => Ok(*s),
+        Expr::BinaryExpr(binary_expr) => {
+            let lhs = get_constant_result(&binary_expr.left, arg_name)?;
+            let rhs = get_constant_result(&binary_expr.right, arg_name)?;
+            let res = match binary_expr.op {
+                Operator::Plus => lhs + rhs,
+                Operator::Minus => lhs - rhs,
+                Operator::Multiply => lhs * rhs,
+                _ => return plan_err!("Unsupported operator for {arg_name} clause"),
+            };
+            Ok(res)
+        }
+        _ => plan_err!("Unexpected expression in {arg_name} clause"),
+    }
+}
+
+/// Converts an `i64` to `usize`, performing a boundary check.
+fn convert_usize_with_check(n: i64, arg_name: &str) -> Result<usize> {
+    if n < 0 {
+        plan_err!("{arg_name} must be >= 0, '{n}' was provided.")
+    } else {
+        Ok(n as usize)
+    }
+}
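
The helper above folds `+`, `-`, and `*` with plain `i64` arithmetic, which by default panics on overflow in debug builds and wraps in release builds, and `n as usize` can truncate on targets where `usize` is narrower than 64 bits. The following is a self-contained sketch of an overflow-aware variant, not what the PR merged: `Op`, `ToyExpr`, `fold_constant`, `to_usize`, and the overflow error text are hypothetical stand-ins so the example compiles without DataFusion.

```rust
use std::convert::TryFrom;

/// Toy operator and expression types (hypothetical, not `datafusion_expr`).
#[derive(Debug, Clone, Copy)]
enum Op {
    Plus,
    Minus,
    Multiply,
    Divide,
}

#[derive(Debug)]
enum ToyExpr {
    Int(i64),
    Column(&'static str),
    Binary(Box<ToyExpr>, Op, Box<ToyExpr>),
}

/// Recursively folds integer literals joined by +, -, and *,
/// reporting overflow as an error instead of panicking or wrapping.
fn fold_constant(expr: &ToyExpr, arg_name: &str) -> Result<i64, String> {
    match expr {
        ToyExpr::Int(n) => Ok(*n),
        ToyExpr::Binary(left, op, right) => {
            let lhs = fold_constant(left, arg_name)?;
            let rhs = fold_constant(right, arg_name)?;
            let res = match op {
                Op::Plus => lhs.checked_add(rhs),
                Op::Minus => lhs.checked_sub(rhs),
                Op::Multiply => lhs.checked_mul(rhs),
                Op::Divide => {
                    return Err(format!("Unsupported operator for {} clause", arg_name))
                }
            };
            // `None` from a checked operation means the result did not fit in i64.
            res.ok_or_else(|| format!("Overflow while evaluating {} clause", arg_name))
        }
        ToyExpr::Column(_) => Err(format!("Unexpected expression in {} clause", arg_name)),
    }
}

/// Converts to `usize`, rejecting negative values and values that do not fit.
fn to_usize(n: i64, arg_name: &str) -> Result<usize, String> {
    usize::try_from(n)
        .map_err(|_| format!("{} must be >= 0 and fit in usize, '{}' was provided.", arg_name, n))
}

fn main() {
    let expr = ToyExpr::Binary(
        Box::new(ToyExpr::Int(10)),
        Op::Multiply,
        Box::new(ToyExpr::Int(100)),
    );
    let rows = fold_constant(&expr, "LIMIT").and_then(|n| to_usize(n, "LIMIT"));
    println!("{:?}", rows);
}
```

Whether overflow should be a planning error or left to a more general expression evaluator is a design choice this sketch does not settle.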
24 changes: 23 additions & 1 deletion datafusion/sqllogictest/test_files/select.slt
@@ -550,9 +550,31 @@ select * from (select 1 a union all select 2) b order by a limit 1;
 1

 # select limit clause invalid
-statement error DataFusion error: Error during planning: LIMIT must not be negative
+statement error DataFusion error: Error during planning: LIMIT must be >= 0, '\-1' was provided\.
 select * from (select 1 a union all select 2) b order by a limit -1;

+# select limit with basic arithmetic
+query I
+select * from (select 1 a union all select 2) b order by a limit 1+1;
+----
+1
+2
+
+# select limit with basic arithmetic
+query I
+select * from (values (1)) LIMIT 10*100;
+----
+1
+
+# More complex expressions in the limit is not supported yet.
+# See issue: https://github.com/apache/arrow-datafusion/issues/9821
+statement error DataFusion error: Error during planning: Unsupported operator for LIMIT clause
+select * from (values (1)) LIMIT 100/10;
+
+# More complex expressions in the limit is not supported yet.
+statement error DataFusion error: Error during planning: Unexpected expression in LIMIT clause
+select * from (values (1)) LIMIT cast(column1 as tinyint);
+
 # select limit clause
 query I
 select * from (select 1 a union all select 2) b order by a limit null;