port range function and change gen_series logic #9352
let mut values = vec![];
let mut offsets = vec![0];
for (idx, stop) in stop_array.iter().enumerate() {
    let stop = stop.unwrap_or(0) + include_upper;
`generate_series(i64::MAX, i64::MAX)` will panic.
DataFusion CLI v36.0.0
❯ select generate_series(9223372036854775807, 9223372036854775807);
thread 'main' panicked at datafusion/functions-array/src/kernels.rs:296:20:
attempt to add with overflow
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
The same query succeeds in PostgreSQL and DuckDB.
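For illustration, here is a minimal sketch of how the addition on the panicking line could report overflow instead of aborting. The helper name `exclusive_upper` and the `String` error type are assumptions made for this example; they are not part of the PR or of DataFusion.

```rust
/// Hypothetical helper: shift an inclusive `stop` to an exclusive upper
/// bound, returning an error instead of panicking when the sum overflows.
fn exclusive_upper(stop: i64, include_upper: i64) -> Result<i64, String> {
    stop.checked_add(include_upper)
        .ok_or_else(|| format!("upper bound {} + {} overflows i64", stop, include_upper))
}

fn main() {
    // An ordinary bound is simply shifted by one.
    assert_eq!(exclusive_upper(5, 1), Ok(6));
    // i64::MAX + 1 does not fit in i64: an error comes back, no panic.
    assert!(exclusive_upper(i64::MAX, 1).is_err());
    // With include_upper == 0 (plain `range`), i64::MAX is fine.
    assert_eq!(exclusive_upper(i64::MAX, 0), Ok(i64::MAX));
}
```

This only turns the panic into an error. Making the query succeed the way PostgreSQL and DuckDB do would require avoiding the `+ include_upper` arithmetic altogether.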
I think we don't have i128Array yet, so probably this panic is unavoidable until we support it.
The result is also incorrect when the step is negative.
DataFusion CLI v36.0.0
❯ select generate_series(5,1,-1);
+----------------------------------------------+
| generate_series(Int64(5),Int64(1),Int64(-1)) |
+----------------------------------------------+
| [5, 4, 3] |
+----------------------------------------------+
1 row in set. Query took 0.005 seconds.
In DuckDB:
D select generate_series(5,1,-1);
┌───────────────────────────┐
│ generate_series(5, 1, -1) │
│ int64[] │
├───────────────────────────┤
│ [5, 4, 3, 2, 1] │
└───────────────────────────┘
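To make the expected semantics concrete, the following is a small sketch of a direction-aware series builder. The function `series` and its boolean `include_upper` parameter are hypothetical names for this example; the loop is one possible way to get the DuckDB result above, not the code in this PR.

```rust
/// Hypothetical sketch: build one series. `include_upper == true` gives
/// `generate_series` semantics (inclusive `stop`); `false` gives `range`
/// semantics (exclusive `stop`).
fn series(start: i64, stop: i64, step: i64, include_upper: bool) -> Result<Vec<i64>, String> {
    if step == 0 {
        return Err("step can't be 0".to_string());
    }
    let mut values = Vec::new();
    let mut current = start;
    loop {
        // The comparison direction follows the sign of the step.
        let in_bounds = if step > 0 { current < stop } else { current > stop };
        if !(in_bounds || (include_upper && current == stop)) {
            break;
        }
        values.push(current);
        // Stop cleanly if the next value would not fit in i64.
        match current.checked_add(step) {
            Some(next) => current = next,
            None => break,
        }
    }
    Ok(values)
}

fn main() {
    // Inclusive upper bound with a negative step.
    assert_eq!(series(5, 1, -1, true), Ok(vec![5, 4, 3, 2, 1]));
    // The same arguments with an exclusive bound stop one element earlier.
    assert_eq!(series(5, 1, -1, false), Ok(vec![5, 4, 3, 2]));
}
```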
> I think we don't have i128Array yet, so probably this panic is unavoidable until we support it.
I tried the following approach and it worked, but I haven't checked it carefully yet.
for (idx, stop) in stop_array.iter().enumerate() {
    let stop = stop.unwrap_or(0);
    let start = start_array.as_ref().map(|arr| arr.value(idx)).unwrap_or(0);
    let step = step_array.as_ref().map(|arr| arr.value(idx)).unwrap_or(1);
    if step == 0 {
        return exec_err!("step can't be 0 for function range(start [, stop, step]");
    }
    if step < 0 {
        // Decreasing range
        values.extend((stop + 1..start + 1).rev().step_by((-step) as usize));
    } else {
        // Increasing range
        values.extend((start..stop).step_by(step as usize));
    }
    // TODO: include_upper should be a boolean flag
    if include_upper > 0 {
        match values.last() {
            Some(&last) if last + step == stop => {
                values.push(stop);
            }
            None => {
                values.push(stop);
            }
            _ => {}
        }
    }
    offsets.push(values.len() as i32);
}
UPDATE: It still panics on the following queries:
select generate_series(9223372036854775807, 9223372036854775807, -1)
select generate_series(-9223372036854775807, -9223372036854775808, -2)
Additional checks might be needed regarding negative step values.
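In the snippet above, the remaining panics appear to come from `stop + 1` / `start + 1` when a bound is `i64::MAX`, and from `last + step` when the last value is near `i64::MIN`. One possible way around both, sketched here under the assumption that a standalone helper is acceptable, is to use `RangeInclusive` so that no bound is ever incremented. The name `inclusive_series` and the `String` error type are hypothetical.

```rust
use std::convert::TryFrom;

/// Hypothetical inclusive series built on `RangeInclusive`, so neither
/// bound needs a `+ 1` adjustment.
fn inclusive_series(start: i64, stop: i64, step: i64) -> Result<Vec<i64>, String> {
    if step == 0 {
        return Err("step can't be 0".to_string());
    }
    // |step| as usize; `unsigned_abs` is safe even when step == i64::MIN.
    let stride = usize::try_from(step.unsigned_abs())
        .map_err(|_| "step magnitude does not fit in usize".to_string())?;
    let values: Vec<i64> = if step > 0 {
        (start..=stop).step_by(stride).collect()
    } else {
        // Walk from `start` down to `stop`; empty when start < stop.
        (stop..=start).rev().step_by(stride).collect()
    };
    Ok(values)
}

fn main() {
    // The two queries that still panicked.
    assert_eq!(inclusive_series(i64::MAX, i64::MAX, -1), Ok(vec![i64::MAX]));
    assert_eq!(inclusive_series(i64::MIN + 1, i64::MIN, -2), Ok(vec![i64::MIN + 1]));
    // The negative-step case from earlier in the thread.
    assert_eq!(inclusive_series(5, 1, -1), Ok(vec![5, 4, 3, 2, 1]));
}
```

The sketch covers only the inclusive (`generate_series`) case; an exclusive `range` variant would need its own handling of the final element.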
Leaving the edge cases for future handling is okay with me. However, the behavior for negative step is incorrect and needs to be fixed.
Got it
# under the License.

query ?
SELECT range(5);
There are already tests in array.slt. If you want to move to a new file, don't forget them.
How about moving these tests into array.slt? There aren't many of them.
I will merge them into array.slt.
/// gen_range(3) => [0, 1, 2]
/// gen_range(1, 4) => [1, 2, 3]
/// gen_range(1, 7, 2) => [1, 3, 5]
pub fn gen_range(
Don't forget to delete the code in array_expression.rs.
make_udf_function!(
    Range,
    range,
    input diamilter,
I think it should be
- input diamilter,
+ start stop step,
And it will be expanded to
pub fn range(start: Expr, stop: Expr, step: Expr) -> Expr {
    Expr::ScalarFunction(
        ScalarFunction::new_udf(
            range_udf(),
            <[_]>::into_vec(
                #[rustc_box]
                ::alloc::boxed::Box::new([start, stop, step]),
            ),
        ),
    )
}
`cargo expand` can help verify this.
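To show why the identifier list matters, here is a toy macro in the same spirit: each identifier listed after the function name becomes one parameter of the generated function. Both `make_expr_fn!` and the `Expr` enum below are simplified stand-ins invented for this example; the real `make_udf_function!` takes more arguments and builds a `ScalarFunction`, as in the expansion above.

```rust
/// Toy stand-in for DataFusion's `Expr`; an assumption for illustration only.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Literal(i64),
    Call { name: &'static str, args: Vec<Expr> },
}

/// Hypothetical, heavily simplified cousin of `make_udf_function!`:
/// every identifier after the function name becomes one `Expr` parameter.
macro_rules! make_expr_fn {
    ($fn_name:ident, $($arg:ident)*) => {
        fn $fn_name($($arg: Expr),*) -> Expr {
            Expr::Call {
                name: stringify!($fn_name),
                args: vec![$($arg),*],
            }
        }
    };
}

// Generates `fn range(start: Expr, stop: Expr, step: Expr) -> Expr { ... }`.
make_expr_fn!(range, start stop step);

fn main() {
    let call = range(Expr::Literal(1), Expr::Literal(7), Expr::Literal(2));
    assert_eq!(
        call,
        Expr::Call {
            name: "range",
            args: vec![Expr::Literal(1), Expr::Literal(7), Expr::Literal(2)],
        }
    );
}
```

With `input diamilter` in place of `start stop step`, the same pattern would generate a two-parameter function whose parameter names do not describe the arguments.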
got it
make_udf_function!(
    GenSeries,
    gen_series,
    input diamilter,
- input diamilter,
+ start stop step,
Similar to above.
Looks good to me!
We might also need to update the documentation.
Sure, I will do it right now.
I've added a few suggestions about the documentation; the rest looks okay to me 👍
3832b88 to 3908bb7
LGTM
Thanks @Lordworms and @jonahgao!
@@ -2906,7 +2906,28 @@ empty(array)

### `generate_series`

- _Alias of [range](#range)._
+ Similar to the range function, but it includes the upper bound.
❤️
Which issue does this PR close?
Closes #9323
Closes #9351
Rationale for this change
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?