Hi all, I noticed a performance issue when extracting a PyBytes or a PyByteArray object into a Vec<u8>.
This is an issue one can easily run into without realizing it. Here's a scenario: let's say we'd like to expose a simple checksum function:
#[pyfunction]
fn checksum(data: &[u8]) -> PyResult<u8> {
    let mut result = 0;
    for x in data {
        result ^= x;
    }
    Ok(result)
}
See how it performs against the equivalent Python implementation, processing 1 MB a hundred times:
2.65s call test_checksum.py::test_perf[py-bytes]
0.00s call test_checksum.py::test_perf[rs-bytes]
Looks really fast! However, it won't accept a bytearray as an argument:
TypeError: argument 'data': 'bytearray' object cannot be converted to 'PyBytes'
So we update our implementation to take a Vec<u8> instead:
#[pyfunction]
fn checksum(data: Vec<u8>) -> PyResult<u8> {
    let mut result = 0;
    for x in data {
        result ^= x;
    }
    Ok(result)
}
And now the results:
2.61s call test_checksum.py::test_perf[py-bytearray]
2.55s call test_checksum.py::test_perf[py-bytes]
1.92s call test_checksum.py::test_perf[rs-bytearray]
1.87s call test_checksum.py::test_perf[rs-bytes]
It performs roughly the same as Python, which makes sense if we look at the FromPyObject implementation for Vec<T>:
Lines 314 to 318 in bed4f9d
let mut v = Vec::with_capacity(seq.len().unwrap_or(0));
for item in seq.iter()? {
    v.push(item?.extract::<T>()?);
}
Ok(v)
The bytes/bytearray object is iterated, and each item (i.e. a Python integer) is extracted into a u8 separately.
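To make the difference concrete, here is a minimal stand-alone sketch in plain Rust, with no PyO3 involved. It assumes a slice of `i64` as a stand-in for the Python integers yielded when iterating a bytes object; `extract_per_item` and `extract_bulk` are hypothetical names used only for illustration, not PyO3 functions. The real per-item path is likely costlier than this model suggests, since each item there is a full Python object.

```rust
use std::convert::TryFrom;

/// Models the generic path: every element is converted and
/// range-checked on its own, then pushed into the vector.
fn extract_per_item(items: &[i64]) -> Result<Vec<u8>, String> {
    let mut v = Vec::with_capacity(items.len());
    for &item in items {
        let byte = u8::try_from(item)
            .map_err(|_| format!("{} is out of range for u8", item))?;
        v.push(byte);
    }
    Ok(v)
}

/// Models a bulk path: the underlying byte buffer is copied
/// in a single operation, with no per-element conversion.
fn extract_bulk(buffer: &[u8]) -> Vec<u8> {
    buffer.to_vec()
}
```

Both functions produce the same vector for valid input; the difference is that the first one does one conversion per element while the second does one copy for the whole buffer.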
This could be fixed by specializing the extraction logic for Vec<u8> so that it uses dedicated methods such as PyBytes::as_bytes().to_vec() and PyByteArray::to_vec(). Here's a possible patch:
https://gist.github.com/vxgmichel/367e01e8504cb9c9e700a22525e8b68d
With this patch applied, the performance is now similar to what we had with the &[u8] slice:
2.70s call test_checksum.py::test_perf[py-bytearray]
2.65s call test_checksum.py::test_perf[py-bytes]
0.00s call test_checksum.py::test_perf[rs-bytes]
0.00s call test_checksum.py::test_perf[rs-bytearray]
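For readers who don't want to open the gist, the general shape of such a specialization can be sketched in plain Rust. This is not the patch itself and does not use the PyO3 API: the `Object` enum is a hypothetical stand-in for a Python object, and `extract_vec_u8` is an illustrative name. The sketch only shows the dispatch idea, assuming byte buffers can be recognized up front and everything else falls back to the per-item path.

```rust
use std::convert::TryFrom;

/// Hypothetical stand-in for a Python object; not a PyO3 type.
enum Object {
    Bytes(Vec<u8>),
    ByteArray(Vec<u8>),
    Sequence(Vec<i64>),
}

/// Extracts a `Vec<u8>`, copying byte buffers directly and falling
/// back to per-item conversion for any other sequence.
fn extract_vec_u8(obj: &Object) -> Result<Vec<u8>, String> {
    match obj {
        // Fast path: a single copy of the underlying buffer.
        Object::Bytes(buf) | Object::ByteArray(buf) => Ok(buf.clone()),
        // Generic path: convert and range-check each item.
        Object::Sequence(items) => items
            .iter()
            .map(|&item| {
                u8::try_from(item)
                    .map_err(|_| format!("{} does not fit in a u8", item))
            })
            .collect(),
    }
}
```

The generic branch keeps the existing behavior for lists and other sequences, so only bytes-like inputs take the shortcut.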