Vectorized Paeth filtering (multiple pixels at the same time) #157

@IJzerbaard

Most PNG decoders (including the one in Wuffs, from what I can see) filter pixel by pixel, and may use SIMD for parallelism across the channels of a single pixel. However, it is possible to use SIMD to apply the Paeth filter to multiple pixels at once. Not 4 sequential pixels, which would be the neatest case for SIMD but is ruled out by the dependencies between them, but 4 anti-diagonal pixels. The pixels in an anti-diagonal group do not depend on each other (as the pixels in a horizontal group would), so the whole group can be filtered at the same time.

4 anti-diagonal pixels can be collected into a vector relatively efficiently by loading 4 rows of pixels, staggered so that the first row has an x-offset of 3 pixels relative to the last row, and then transposing those 4 rows (similar to _MM_TRANSPOSE4_PS, but with integer vectors). That produces 4 such groups of 4 anti-diagonal pixels for the price of 8 shuffles. After applying the filter, another 8 shuffles are needed to un-transpose them, plus some additional shuffles to update the "top" (aka B) vector between columns. All that shuffling is not free, but in my tests it was well worth doing.
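To make the dependency argument concrete, here is a minimal scalar sketch in C. It is not SIMD and not Wuffs code. It assumes 1 byte per pixel and a group of exactly 4 rows, and the names `unfilter_rowwise` and `unfilter_antidiagonal4` are made up for illustration. It reconstructs Paeth-filtered rows once in the usual row-by-row order and once in staggered anti-diagonal order. In the second version, the 4 "lanes" handled in one step only read pixels written in earlier steps, which is the property a vectorized version would rely on.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Paeth predictor as defined by the PNG specification. */
static uint8_t paeth_predict(uint8_t a, uint8_t b, uint8_t c) {
    int p = (int)a + (int)b - (int)c;
    int pa = abs(p - (int)a);
    int pb = abs(p - (int)b);
    int pc = abs(p - (int)c);
    if (pa <= pb && pa <= pc) return a;
    if (pb <= pc) return b;
    return c;
}

/* Reconstructed pixel at (row, x). Row -1 is the already-decoded row above
   the group; anything left of column 0 reads as zero. */
static uint8_t pixel_at(const uint8_t *img, const uint8_t *above,
                        int width, int row, int x) {
    assert(x < width);
    if (x < 0) return 0;
    if (row < 0) return above[x];
    return img[row * width + x];
}

/* Undo the Paeth filter for a single pixel, in place. */
static void unfilter_pixel(uint8_t *img, const uint8_t *above,
                           int width, int row, int x) {
    uint8_t a = pixel_at(img, above, width, row, x - 1);     /* left */
    uint8_t b = pixel_at(img, above, width, row - 1, x);     /* top */
    uint8_t c = pixel_at(img, above, width, row - 1, x - 1); /* top-left */
    img[row * width + x] =
        (uint8_t)(img[row * width + x] + paeth_predict(a, b, c));
}

/* Reference order: top to bottom, left to right. */
void unfilter_rowwise(uint8_t *img, const uint8_t *above, int width, int rows) {
    for (int row = 0; row < rows; row++)
        for (int x = 0; x < width; x++)
            unfilter_pixel(img, above, width, row, x);
}

/* Staggered order for a group of 4 rows: at each step, lane k handles
   pixel (row k, x = step - k), so row 0 runs 3 pixels ahead of row 3.
   The 4 lanes of one step are independent of each other, so the inner
   loop is the part that could become one SIMD operation. */
void unfilter_antidiagonal4(uint8_t *img, const uint8_t *above, int width) {
    for (int step = 0; step < width + 3; step++) {
        for (int lane = 0; lane < 4; lane++) {
            int x = step - lane;
            if (x >= 0 && x < width)
                unfilter_pixel(img, above, width, lane, x);
        }
    }
}
```

Running both functions on copies of the same filtered data should give identical output. The sketch says nothing about speed; it only shows that the reordering is valid.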

Some diagrams:

Image

Image

Filtering 4 or 8 rows at the same time (8 rows at once exposes more ILP) does not fit very naturally with the fact that PNG allows a different filter for each row, but it is still possible whenever the Paeth filter is used for 4 or 8 adjacent rows. The Avg filter can be implemented in a similar manner.
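As a rough sketch of how a decoder might detect those cases, the C snippet below counts how many consecutive scanlines start with the Paeth filter-type byte (value 4 in the PNG specification) and picks a group size from that. It assumes a non-interlaced image whose inflated data is laid out as one filter-type byte followed by `row_bytes` bytes per scanline. The function names and the fallback to 1 row at a time are illustrative assumptions, not anything Wuffs does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { PNG_FILTER_PAETH = 4 }; /* filter-type value from the PNG specification */

/* Count consecutive Paeth-filtered scanlines starting at first_row,
   stopping at max_rows or at the end of the image. */
size_t paeth_run_length(const uint8_t *data, size_t row_bytes,
                        size_t num_rows, size_t first_row, size_t max_rows) {
    assert(first_row <= num_rows);
    size_t scanline = row_bytes + 1; /* filter-type byte + pixel bytes */
    size_t run = 0;
    while (run < max_rows && first_row + run < num_rows &&
           data[(first_row + run) * scanline] == PNG_FILTER_PAETH) {
        run++;
    }
    return run;
}

/* Hypothetical policy: use the widest group the run allows,
   otherwise fall back to the ordinary one-row-at-a-time path. */
size_t rows_per_group(size_t run) {
    if (run >= 8) return 8;
    if (run >= 4) return 4;
    return 1;
}
```

A decoder loop could call `rows_per_group(paeth_run_length(data, row_bytes, num_rows, row, 8))` at each position and advance `row` by the result, using the multi-row path only when the result is 4 or 8.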

If the Wuffs project is interested in this approach, I'm willing to help integrate it into Wuffs, but I'm really not familiar with the language.
