Description
While collecting latency data for integer vector instructions on the Banana Pi BPI-F3 (SpacemiT X60, RVV 1.0), I noticed an unusual pattern I couldn’t explain.
For most instructions, latency scales linearly with LMUL. However, the following instructions consistently take 1 cycle longer than that linear trend predicts, and only when LMUL = m4:
- vadd.vi, vadd.vv, vadd.vx
- vmacc.vv, vmacc.vx, vmadd.vv, vmadd.vx
- vmaxu.vv, vmaxu.vx, vmax.vv, vmax.vx
- vminu.vv, vminu.vx, vmin.vv, vmin.vx
- vmulhsu.vv, vmulhsu.vx, vmulhu.vv, vmulhu.vx
- vmulh.vv, vmulh.vx, vmul.vv, vmul.vx
- vnmsac.vv, vnmsac.vx, vnmsub.vv, vnmsub.vx
- vrsub.vi, vrsub.vx, vsub.vv, vsub.vx
This behavior is consistent and reproducible. For these instructions, all other LMUL values follow the linear trend, and the extra cycle at m4 does not depend on SEW.
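To make the "off the linear trend" check concrete, here is a minimal Python sketch of how such a deviation could be flagged. It is not the script used to produce the dataset. The function names `lmul_residuals` and `off_trend`, the 0.5-cycle tolerance, and the example latencies are all hypothetical. It also assumes the smallest and largest LMUL measurements are well-behaved and uses the straight line through them as the expected trend.

```python
def lmul_residuals(latency_by_lmul):
    """Return measured-minus-expected latency (cycles) for each LMUL.

    Expected values lie on the straight line through the smallest and
    largest LMUL measurements (assumption: those two are well-behaved).
    """
    lmuls = sorted(latency_by_lmul)
    lo, hi = lmuls[0], lmuls[-1]
    slope = (latency_by_lmul[hi] - latency_by_lmul[lo]) / (hi - lo)
    return {
        lmul: latency_by_lmul[lmul] - (latency_by_lmul[lo] + slope * (lmul - lo))
        for lmul in lmuls
    }


def off_trend(latency_by_lmul, tolerance=0.5):
    """Return only the LMUL values whose residual exceeds the tolerance."""
    return {
        lmul: residual
        for lmul, residual in lmul_residuals(latency_by_lmul).items()
        if abs(residual) > tolerance
    }


# Made-up placeholder latencies (cycles) keyed by LMUL; NOT from the dataset.
example = {1: 4, 2: 8, 4: 17, 8: 32}
print(off_trend(example))
```

With these placeholder numbers, only m4 is reported, sitting one cycle above the line. Running the same check per instruction and per SEW on the real measurements would show whether the extra cycle really is confined to m4.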
I’ve uploaded the full latency dataset here for reference:
https://docs.google.com/spreadsheets/d/1u2LF8Uux0BS2_U9zJsG6DE1zUsoPxBd_zH31ze1313o/edit?gid=0#gid=0
If anyone has insight into what might be causing this, especially whether it’s a known hardware characteristic or a measurement artifact, I’d appreciate hearing it.
Tagging @zqb-all in case you have any insight or documentation that might help clarify this.