
FlatMap, Flatten appear to optimize badly #87411

Closed
@adrian17

Description

Context: I recently saw a perf regression caused by the following change:

pub fn pixels_rgba(&self) -> Vec<u8> {
    let mut output = Vec::new();
    for p in &self.pixels {
        output.extend_from_slice(&[p.red(), p.green(), p.blue(), p.alpha()]);
    }
    output
}
->
pub fn pixels_rgba(&self) -> Vec<u8> {
    self.pixels.iter().flat_map(|p| [p.red(), p.green(), p.blue(), p.alpha()]).collect()
}
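To make the two versions above concrete, here is a self-contained sketch. The issue does not show the pixel type, so the Pixel and Image structs below are hypothetical stand-ins that only provide the same accessor names (red, green, blue, alpha). The point is that both functions produce identical bytes; only their speed differs.

```rust
// Hypothetical stand-in for the pixel type in the snippet above; the real
// type and its accessors are not shown in the issue.
#[derive(Clone, Copy)]
struct Pixel {
    r: u8,
    g: u8,
    b: u8,
    a: u8,
}

impl Pixel {
    fn red(&self) -> u8 { self.r }
    fn green(&self) -> u8 { self.g }
    fn blue(&self) -> u8 { self.b }
    fn alpha(&self) -> u8 { self.a }
}

// Hypothetical container holding the pixels, mirroring `self.pixels`.
struct Image {
    pixels: Vec<Pixel>,
}

impl Image {
    // Original version: explicit loop, the Vec grows as needed.
    fn pixels_rgba_loop(&self) -> Vec<u8> {
        let mut output = Vec::new();
        for p in &self.pixels {
            output.extend_from_slice(&[p.red(), p.green(), p.blue(), p.alpha()]);
        }
        output
    }

    // Changed version: flat_map over a [u8; 4] array per pixel.
    // Arrays implement IntoIterator by value since Rust 1.53.
    fn pixels_rgba_flat_map(&self) -> Vec<u8> {
        self.pixels
            .iter()
            .flat_map(|p| [p.red(), p.green(), p.blue(), p.alpha()])
            .collect()
    }
}

fn main() {
    let img = Image {
        pixels: vec![
            Pixel { r: 1, g: 2, b: 3, a: 4 },
            Pixel { r: 5, g: 6, b: 7, a: 8 },
        ],
    };
    // Both versions must yield the same RGBA bytes.
    assert_eq!(img.pixels_rgba_loop(), img.pixels_rgba_flat_map());
    println!("{:?}", img.pixels_rgba_flat_map());
}
```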

The change seemed reasonable, on the assumption that it would be more idiomatic and would guarantee that the output Vec is preallocated. Unfortunately, it made the function several times slower. The recently merged #87168 helped slightly, but the generated code is still much slower.
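For comparison, this sketch writes out the preallocation the change was hoping for explicitly, next to the flat_map version. It assumes, for simplicity, that each pixel is already a [u8; 4] array rather than the issue's pixel type. Vec::with_capacity reserves room for the whole output up front, whereas collect() can only size its allocations from the iterator's size_hint(), so how much it preallocates depends on what FlatMap reports in a given toolchain.

```rust
// Simplification (assumption): each pixel is already a [u8; 4] RGBA array,
// instead of the issue's pixel type with red()/green()/blue()/alpha().

// Reserves the exact final size up front, so the Vec never reallocates.
fn rgba_prealloc(pixels: &[[u8; 4]]) -> Vec<u8> {
    let mut output = Vec::with_capacity(pixels.len() * 4);
    for p in pixels {
        output.extend_from_slice(p);
    }
    output
}

// collect() sizes its allocations from the iterator's size_hint();
// whether that covers the whole output depends on how FlatMap
// reports its bounds in the toolchain being used.
fn rgba_flat_map(pixels: &[[u8; 4]]) -> Vec<u8> {
    pixels.iter().flat_map(|p| *p).collect()
}

fn main() {
    let pixels = vec![[10u8, 20, 30, 255]; 1000];

    // Show what the adapter promises before anything is collected.
    let iter = pixels.iter().flat_map(|p| *p);
    println!("flat_map size_hint: {:?}", iter.size_hint());

    let a = rgba_prealloc(&pixels);
    let b = rgba_flat_map(&pixels);
    assert_eq!(a, b);
    assert!(a.capacity() >= pixels.len() * 4);
}
```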

The same slowdown shows up in other code using flat_map or flatten, such as simply iterating over the flattened iterator, and it happens regardless of whether the inner type has a statically known size (an array) or not.
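The iteration case can be sketched the same way. These functions are illustrative, not the benchmark repo's code: a nested loop, a for loop over flatten(), and the same flatten() consumed through sum(), which goes through fold(). Flatten and FlatMap override fold() to walk each inner iterator in turn, so internal iteration is often (though not guaranteed to be) easier for the optimizer than repeated next() calls; only measuring settles it for a given case.

```rust
// Baseline: explicit nested loops over inner Vecs of run-time length.
fn sum_nested(rows: &[Vec<u32>]) -> u64 {
    let mut total = 0u64;
    for row in rows {
        for &x in row {
            total += u64::from(x);
        }
    }
    total
}

// External iteration over Flatten: each next() call must check whether the
// current inner iterator is exhausted before pulling the next row.
fn sum_flatten_for(rows: &[Vec<u32>]) -> u64 {
    let mut total = 0u64;
    for &x in rows.iter().flatten() {
        total += u64::from(x);
    }
    total
}

// Internal iteration: sum() is built on fold(), which Flatten overrides.
fn sum_flatten_fold(rows: &[Vec<u32>]) -> u64 {
    rows.iter().flatten().map(|&x| u64::from(x)).sum()
}

// The fixed-size case: inner items are [u32; 4] arrays, via flat_map.
fn sum_arrays_flat_map(rows: &[[u32; 4]]) -> u64 {
    rows.iter().flat_map(|a| *a).map(u64::from).sum()
}

fn main() {
    // Made-up data: 500 rows of 4 values, as Vecs and as arrays.
    let rows: Vec<Vec<u32>> = (0..500)
        .map(|i| (0..4).map(|j| i * 4 + j).collect())
        .collect();
    let arrays: Vec<[u32; 4]> = (0..500u32)
        .map(|i| [i * 4, i * 4 + 1, i * 4 + 2, i * 4 + 3])
        .collect();

    let expected = sum_nested(&rows);
    assert_eq!(sum_flatten_for(&rows), expected);
    assert_eq!(sum_flatten_fold(&rows), expected);
    assert_eq!(sum_arrays_flat_map(&arrays), expected);
    println!("sum = {}", expected);
}
```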

I put together an example benchmark at https://github.com/adrian17/flat_map_perf ; these are the results on my machine (with today's nightly rustc):


tests::bench_array_4x500000_collect_loop               1,269,560 ns/iter (+/- 146,537)
tests::bench_array_4x500000_collect_loop_with_prealloc 1,255,140 ns/iter (+/- 165,287)
tests::bench_array_4x500000_collect_with_flat_map      2,697,082 ns/iter (+/- 303,411)

tests::bench_array_4x500000_iteration_nested_loop        220,838 ns/iter (+/- 25,307)
tests::bench_array_4x500000_iteration_flat_map         3,029,744 ns/iter (+/- 463,749)


tests::bench_iter_4000x500_collect_loop                  243,537 ns/iter (+/- 34,574)
tests::bench_iter_4000x500_collect_loop_with_prealloc    243,246 ns/iter (+/- 34,197)
tests::bench_iter_4000x500_collect_with_flatten        3,521,586 ns/iter (+/- 597,755)

tests::bench_iter_4000x500_iteration_nested_loop         290,939 ns/iter (+/- 34,414)
tests::bench_iter_4000x500_iteration_flatten           2,099,386 ns/iter (+/- 512,732)


tests::bench_iter_4x500000_collect_loop                3,124,601 ns/iter (+/- 444,296)
tests::bench_iter_4x500000_collect_loop_with_prealloc  2,873,051 ns/iter (+/- 576,719)
tests::bench_iter_4x500000_collect_with_flatten        5,579,601 ns/iter (+/- 796,355)

tests::bench_iter_4x500000_iteration_nested_loop       2,118,351 ns/iter (+/- 396,325)
tests::bench_iter_4x500000_iteration_flatten           3,187,518 ns/iter (+/- 443,080)
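Judging by the "ns/iter (+/- ...)" format, the numbers above come from libtest's #[bench] harness, which requires nightly. As a rough stable-Rust alternative for spot checks, a hand-rolled timing loop like the one below could be used. It is a hypothetical harness, not the code in the linked repo, and much less rigorous than a proper benchmark framework (no warm-up, no statistics); std::hint::black_box requires Rust 1.66 or later. Reading "4x500000" as 500,000 four-byte arrays is also an assumption.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

// Runs `f` `iters` times (iters must be > 0) and returns the mean time per call.
// black_box discourages the optimizer from discarding the result.
fn time_per_iter<R>(iters: u32, mut f: impl FnMut() -> R) -> Duration {
    let start = Instant::now();
    for _ in 0..iters {
        black_box(f());
    }
    start.elapsed() / iters
}

// Loop-based collect, as in the "collect_loop" benchmarks.
fn collect_loop(data: &[[u8; 4]]) -> Vec<u8> {
    let mut out = Vec::new();
    for a in data {
        out.extend_from_slice(a);
    }
    out
}

// flat_map-based collect, as in the "collect_with_flat_map" benchmark.
fn collect_flat_map(data: &[[u8; 4]]) -> Vec<u8> {
    data.iter().flat_map(|a| *a).collect()
}

fn main() {
    // Assumed shape: 500,000 arrays of 4 bytes each.
    let data = vec![[1u8, 2, 3, 4]; 500_000];
    assert_eq!(collect_loop(&data), collect_flat_map(&data));

    // Build with --release; debug timings say nothing about optimization.
    let loop_t = time_per_iter(10, || collect_loop(black_box(data.as_slice())));
    let fm_t = time_per_iter(10, || collect_flat_map(black_box(data.as_slice())));
    println!("collect_loop:     {:?}/iter", loop_t);
    println!("collect_flat_map: {:?}/iter", fm_t);
}
```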

Metadata

Labels

I-slow (Issue: Problems and improvements with respect to performance of generated code.)
