Kernels don't account for slicing of `RunArray`s

**Describe the bug**


Certain kernels when operating on `RunArray`s don't account for them potentially being sliced, meaning they can return incorrect results.

**To Reproduce**


See following tests:

https://github.com/apache/arrow-rs/blob/addf74d72d88684dabac21822c22f35e803135c5/arrow-cast/src/cast/mod.rs#L11748-L11763

https://github.com/apache/arrow-rs/blob/addf74d72d88684dabac21822c22f35e803135c5/arrow-select/src/concat.rs#L1718-L1741

https://github.com/apache/arrow-rs/blob/addf74d72d88684dabac21822c22f35e803135c5/arrow-select/src/filter.rs#L1357-L1375

https://github.com/apache/arrow-rs/blob/addf74d72d88684dabac21822c22f35e803135c5/arrow-select/src/interleave.rs#L1174-L1198

These are not necessarily exhaustive; they are just the ones I found. The only other kernel I checked was take, which seemed fine:

https://github.com/apache/arrow-rs/blob/addf74d72d88684dabac21822c22f35e803135c5/arrow-select/src/take.rs#L2441-L2460

**Expected behavior**


Kernels should account for slicing when operating on `RunArray`s.

**Additional context**


In fixing these kernels we should see if we can enhance the RunArray/RunEndBuffer APIs to make it easier to handle sliced arrays; for example in https://github.com/apache/datafusion/pull/18981 I found it a bit confusing to try understand what's going on, since there was logic to manually slice the values ourselves.

	#[test]
	#[should_panic = "assertion `left == right` failed\n left: ScalarBuffer([1, 1, 2])\n right: [2, 2, 3]"]
	// TODO: fix cast of RunArrays to account for sliced RunArray's
	// https://github.com/apache/arrow-rs/issues/9018
	fn test_sliced_run_end_encoded_to_primitive() {
	let run_ends = Int32Array::from(vec![2, 5, 6]);
	let values = Int32Array::from(vec![1, 2, 3]);
	// [1, 1, 2, 2, 2, 3]
	let run_array = RunArray::<Int32Type>::try_new(&run_ends, &values).unwrap();
	let run_array = run_array.slice(3, 3); // [2, 2, 3]
	let array_ref = Arc::new(run_array) as ArrayRef;

	let cast_result = cast(&array_ref, &DataType::Int64).unwrap();
	let result_run_array = cast_result.as_primitive::<Int64Type>();
	assert_eq!(result_run_array.values(), &[2, 2, 3]);
	}

	#[test]
	#[should_panic = "assertion `left == right` failed\n left: [20, 20, 40, 40, 40]\n right: [10, 10, 20, 20, 30, 40, 40, 40]"]
	// TODO: fix concat of RunArrays to account for sliced RunArray's
	// https://github.com/apache/arrow-rs/issues/9018
	fn test_concat_sliced_run_array() {
	// Slicing away first run in both arrays
	let run_ends1 = Int32Array::from(vec![2, 4]);
	let values1 = Int32Array::from(vec![10, 20]);
	let array1 = RunArray::try_new(&run_ends1, &values1).unwrap(); // [10, 10, 20, 20]
	let array1 = array1.slice(2, 2); // [20, 20]

	let run_ends2 = Int32Array::from(vec![1, 4]);
	let values2 = Int32Array::from(vec![30, 40]);
	let array2 = RunArray::try_new(&run_ends2, &values2).unwrap(); // [30, 40, 40, 40]
	let array2 = array2.slice(1, 3); // [40, 40, 40]

	let result = concat(&[&array1, &array2]).unwrap();
	let result = result.as_run::<Int32Type>();
	let result = result.downcast::<Int32Array>().unwrap();

	let expected = vec![20, 20, 40, 40, 40];
	let actual = result.into_iter().flatten().collect::<Vec<_>>();
	assert_eq!(expected, actual);
	}

	#[test]
	#[should_panic = "assertion `left == right` failed\n left: [-2, 9]\n right: [7, -2]"]
	// TODO: fix filter of RunArrays to account for sliced RunArray's
	// https://github.com/apache/arrow-rs/issues/9018
	fn test_filter_run_end_encoding_array_sliced() {
	let run_ends = Int64Array::from(vec![2, 3, 8]);
	let values = Int64Array::from(vec![7, -2, 9]);
	let a = RunArray::try_new(&run_ends, &values).unwrap(); // [7, 7, -2, 9, 9, 9, 9, 9]
	let a = a.slice(2, 3); // [-2, 9, 9]
	let b = BooleanArray::from(vec![true, false, true]);
	let result = filter(&a, &b).unwrap();

	let result = result.as_run::<Int64Type>();
	let result = result.downcast::<Int64Array>().unwrap();

	let expected = vec![-2, 9];
	let actual = result.into_iter().flatten().collect::<Vec<_>>();
	assert_eq!(expected, actual);
	}

	#[test]
	#[should_panic = "assertion `left == right` failed\n left: [1, 4, 2, 5, 6]\n right: [2, 5, 2, 5, 6]"]
	// TODO: fix interleave of RunArrays to account for sliced RunArray's
	// https://github.com/apache/arrow-rs/issues/9018
	fn test_interleave_run_end_encoded_sliced() {
	let mut builder = PrimitiveRunBuilder::<Int32Type, Int32Type>::new();
	builder.extend([1, 1, 2, 2, 2, 3].into_iter().map(Some));
	let a = builder.finish();
	let a = a.slice(2, 3); // [2, 2, 2]

	let mut builder = PrimitiveRunBuilder::<Int32Type, Int32Type>::new();
	builder.extend([4, 5, 5, 6, 6, 6].into_iter().map(Some));
	let b = builder.finish();
	let b = b.slice(1, 3); // [5, 5, 6]

	let indices = &[(0, 1), (1, 0), (0, 3), (1, 2), (1, 3)];
	let result = interleave(&[&a, &b], indices).unwrap();

	let result = result.as_run::<Int32Type>();
	let result = result.downcast::<Int32Array>().unwrap();

	let expected = vec![2, 5, 2, 5, 6];
	let actual = result.into_iter().flatten().collect::<Vec<_>>();
	assert_eq!(actual, expected);
	}

	#[test]
	fn test_take_runs_sliced() {
	let logical_array: Vec<i32> = vec![1, 1, 2, 2, 3, 3, 3, 4, 4, 5, 5, 6, 6];

	let mut builder = PrimitiveRunBuilder::<Int32Type, Int32Type>::new();
	builder.extend(logical_array.into_iter().map(Some));
	let run_array = builder.finish();

	let run_array = run_array.slice(4, 6); // [3, 3, 3, 4, 4, 5]

	let take_indices: PrimitiveArray<Int32Type> = vec![0, 5, 5, 1, 4].into_iter().collect();

	let result = take_run(&run_array, &take_indices).unwrap();
	let result = result.downcast::<Int32Array>().unwrap();

	let expected = vec![3, 5, 5, 3, 4];
	let actual = result.into_iter().flatten().collect::<Vec<_>>();

	assert_eq!(expected, actual);
	}

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Kernels don't account for slicing of `RunArray`s #9018

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Kernels don't account for slicing of RunArrays #9018

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions

Kernels don't account for slicing of `RunArray`s #9018