RISC-V EVL tail folding #123069

Open
1 of 3 issues completed
@lukel97

Description

On the spacemit-x60, GCC 14 is ~24% faster on the 525.x264_r SPEC CPU 2017 benchmark than a recent build of Clang.

A big chunk of this difference is due to GCC tail folding its loops with VL, whereas LLVM doesn't by default.

Because LLVM doesn't tail fold its loops, it generates both a vectorized body and a scalar epilogue. The vectorized body only executes when the trip count is at least VF; otherwise only the scalar epilogue runs.

On 525.x264_r, there are some very hot functions (e.g. get_ref) which never meet the minimum trip count, so the vector code is never run. Tail folding avoids this issue and allows us to run the vectorized body every time.
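
To make the difference concrete, here is a rough scalar C model of the two loop shapes. It is only a sketch and not what the loop vectorizer actually emits: `VF`, `loop_stats`, `add_with_epilogue` and `add_tail_folded` are hypothetical names made up for illustration, the inner `j` loops stand in for a single vector instruction, and the `vl` computation is a simplified stand-in for what `vsetvli` would return.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical vectorization factor: elements handled per vector iteration. */
enum { VF = 8 };

/* Counts how often each part of the loop executes. */
typedef struct {
    size_t vector_iters; /* iterations of the vectorized body */
    size_t scalar_iters; /* iterations of the scalar epilogue */
} loop_stats;

/* No tail folding: the vector body needs at least VF remaining elements,
   and whatever is left over goes through the scalar epilogue. */
loop_stats add_with_epilogue(int *dst, const int *a, const int *b, size_t n) {
    assert(n == 0 || (dst && a && b));
    loop_stats s = {0, 0};
    size_t i = 0;
    for (; i + VF <= n; i += VF) {
        for (size_t j = 0; j < VF; j++) /* models one full-width vector op */
            dst[i + j] = a[i + j] + b[i + j];
        s.vector_iters++;
    }
    for (; i < n; i++) { /* scalar epilogue */
        dst[i] = a[i] + b[i];
        s.scalar_iters++;
    }
    return s;
}

/* Tail folded with VL: every iteration asks for up to VF elements and
   processes however many remain, so no scalar epilogue is needed. */
loop_stats add_tail_folded(int *dst, const int *a, const int *b, size_t n) {
    assert(n == 0 || (dst && a && b));
    loop_stats s = {0, 0};
    for (size_t i = 0; i < n;) {
        size_t remaining = n - i;
        size_t vl = remaining < VF ? remaining : VF; /* simplified vsetvli */
        for (size_t j = 0; j < vl; j++) /* models one vector op with VL = vl */
            dst[i + j] = a[i + j] + b[i + j];
        s.vector_iters++;
        i += vl;
    }
    return s;
}
```

With `n = 5` and `VF = 8`, `add_with_epilogue` never enters its vector body and does all 5 elements in the scalar epilogue, which is the situation described above for get_ref. `add_tail_folded` handles the same 5 elements in a single vector iteration with `vl = 5`.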

There are likely other performance benefits to be had with tail folding with VL, so it seems worthwhile exploring.

"EVL tail folding" (LLVM's vector-predication terminology for VL tail folding) can be enabled from Clang with -mllvm -prefer-predicate-over-epilogue=predicate-else-scalar-epilogue -mllvm -force-tail-folding-style=data-with-evl. It initially landed in #76172, but it isn't enabled by default yet because support for it is not fully complete, both in the loop vectorizer and elsewhere in the RISC-V backend.

This issue aims to track what work is needed across the LLVM project to bring it up to a stable state, at which point we can evaluate its performance to see if it should be enabled by default.

It's not a complete list and only contains the tasks that I've noticed so far. Please feel free to edit and add to it!
I presume we will find more things that need to be addressed as time goes on.
