-
Notifications
You must be signed in to change notification settings - Fork 1.7k
Description
Describe the bug
(This was found by a security audit performed by InfluxData)
The public method finish in datafusion_functions::strings::StringArrayBuilder takes an argument, null_buffer, which is unconditionally assigned to an internal member; however, the contract for this data type requires that the three buffers (value_buffer, offsets_buffer, and nulls) are of the same length.
The argument to finish can be Option with a buffer of arbitrary length, but the length of it is not checked; therefore, the contract might be violated and subsequent usage of the resulting StringArray might lead to out-of-bounds reads or writes.
datafusion/datafusion/functions/src/strings.rs
Lines 338 to 348 in e8314ab
| pub fn finish(self, null_buffer: Option<NullBuffer>) -> LargeStringArray { | |
| let array_builder = ArrayDataBuilder::new(DataType::LargeUtf8) | |
| .len(self.offsets_buffer.len() / size_of::<i64>() - 1) | |
| .add_buffer(self.offsets_buffer.into()) | |
| .add_buffer(self.value_buffer.into()) | |
| .nulls(null_buffer); | |
| // SAFETY: all data that was appended was valid Large UTF8 and the values | |
| // and offsets were created correctly | |
| let array_data = unsafe { array_builder.build_unchecked() }; | |
| LargeStringArray::from(array_data) | |
| } |
I (@alamb ) analyzed the code and I do think there is a problem, but I do not think it is exploitable from SQL or other downstream applications. It would only affect someone using the StringArrayBuilder or LargeStringArrayBuilder APIs directly (though they are pub, see doc links)
The reason I don't think it is exploitable is that it is only called in in two locations:
- (called with
datafusion/datafusion/functions/src/string/concat.rs
Lines 217 to 218 in 3ee9b3d
let string_array = builder.finish(None); Noneaka no buffer) - called with
Ok(ColumnarValue::Array(Arc::new(builder.finish(sep.nulls())))) Some(..)buffer from the input array (that is the same size as the output array)
And the size of the null buffer is correct in both cases
To Reproduce
No response
Expected behavior
No response
Additional context
No response