Release v2.2.0 · pulp-platform/ara

Fix typo in the build instructions of the README
Fix Gnuplot installation on GitHub's CI
The number of elements requested by the Store Unit and the Element Requester now depends on both the requested EEW and the previous EEW of the vector register in use
When the VRF is written and EMUL > 1, the EEW of all affected registers is updated
Memory operations can change EMUL when EEW != VSEW
The LSU now correctly handles bursts with a saturated length of 256 beats
AXI transactions on the channel opposite to the one currently in use are started only after the previous transactions have completed
Fix the number of elements to be requested for a vslidedown instruction
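
The EEW/EMUL items above rely on the RVV relation EMUL = (EEW / SEW) * LMUL. As a minimal sketch of that bookkeeping (not Ara's RTL), the Python below computes EMUL and updates a per-register EEW table when a write covers a register group. The names `emul`, `registers_touched`, and `record_write` are hypothetical, and the sketch does not check the spec's valid EMUL range of 1/8 to 8.

```python
from fractions import Fraction


def emul(eew: int, sew: int, lmul: Fraction) -> Fraction:
    """Effective LMUL of a memory operand: EMUL = (EEW / SEW) * LMUL."""
    return Fraction(eew, sew) * lmul


def registers_touched(base_reg: int, emul_value: Fraction) -> list[int]:
    """Registers in the group starting at base_reg.

    A fractional EMUL still occupies (part of) one register.
    """
    count = max(1, int(emul_value))
    return list(range(base_reg, base_reg + count))


def record_write(eew_table: dict[int, int], base_reg: int,
                 eew: int, emul_value: Fraction) -> None:
    """Hypothetical tracker: store the EEW for every register in the group."""
    for reg in registers_touched(base_reg, emul_value):
        eew_table[reg] = eew


# Example: a 64-bit load while SEW = 32 and LMUL = 1 gives EMUL = 2,
# so the write covers two registers and both get their EEW updated.
table: dict[int, int] = {}
record_write(table, base_reg=8, eew=64, emul_value=emul(64, 32, Fraction(1)))
```

In this example the table ends up with an EEW of 64 recorded for registers 8 and 9, which is the behaviour described for writes with EMUL > 1.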

benchmarks app to benchmark Ara
CI task to create roofline plots of imatmul and fmatmul, available as artifacts
Vector floating-point compare instructions (vmfeq, vmfne, vmflt, vmfle, vmfgt, vmfge)
Vector single-width floating-point/integer type-convert instructions (vfcvt.xu.f, vfcvt.x.f, vfcvt.rtz.xu.f, vfcvt.rtz.x.f, vfcvt.f.xu, vfcvt.f.x)
Vector widening floating-point/integer type-convert instructions (vfwcvt.xu.f, vfwcvt.x.f, vfwcvt.rtz.xu.f, vfwcvt.rtz.x.f, vfwcvt.f.xu, vfwcvt.f.x, vfwcvt.f.f)
Vector narrowing floating-point/integer type-convert instructions (vfncvt.xu.f, vfncvt.x.f, vfncvt.rtz.xu.f, vfncvt.rtz.x.f, vfncvt.f.xu, vfncvt.f.x, vfncvt.f.f)
Vector whole-register move instruction vmv<nr>
Vector whole-register load/store vl1r, vs1r
Vector load/store mask vle1, vse1
Whole-register instructions are also executed when vtype.vl == 0
Makefile option (trace=1) to generate waveform traces when running simulations with Verilator
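
For readers unfamiliar with the roofline plots mentioned above, the sketch below shows the standard roofline bound: attainable performance is the minimum of the compute peak and the memory bandwidth multiplied by the arithmetic intensity. It is an illustration only, not the CI script. The function names and the numeric values are hypothetical, and the matmul intensity assumes each of the three n x n matrices crosses the memory interface exactly once.

```python
def attainable_performance(peak_ops_per_cycle: float,
                           bytes_per_cycle: float,
                           ops_per_byte: float) -> float:
    """Roofline bound: limited by compute or by memory traffic."""
    return min(peak_ops_per_cycle, bytes_per_cycle * ops_per_byte)


def matmul_intensity(n: int, elem_bytes: int) -> float:
    """Arithmetic intensity of an n x n matmul (assumption: 2*n^3 operations,
    and A, B, C each move through memory once, i.e. 3*n^2 elements)."""
    ops = 2 * n ** 3
    traffic_bytes = 3 * n ** 2 * elem_bytes
    return ops / traffic_bytes


# Hypothetical machine: 8 ops/cycle peak, 16 bytes/cycle of memory bandwidth.
PEAK = 8.0
BANDWIDTH = 16.0

for n in (4, 12, 64):
    intensity = matmul_intensity(n, elem_bytes=8)
    bound = attainable_performance(PEAK, BANDWIDTH, intensity)
    print(f"n={n}: intensity={intensity:.3f} ops/B, bound={bound:.3f} ops/cycle")
```

Small problem sizes fall on the bandwidth-limited slope of the roofline, while larger ones reach the flat compute-bound region.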

Add spill register at the lane edge, to cut the timing-critical interface between the Mask unit and the VFUs
Increase the latency of the 16-bit multiplier from 0 to 1 cycle to cut an in-lane timing-critical path
Widen CVA6's cache lines
Implement back-to-back accelerator instruction issue mechanism on CVA6
Use https protocol when cloning DTC from main Makefile
Use https protocol for newlib-cygwin in .gitmodules
Cut a timing-critical path from Addrgen to Sequencer (starting an AXI transaction now takes 1 more cycle)
Cut a timing-critical path in the VSTU, in the calculation of the pointer to the VRF word received from the lanes
Create ara_system wrapper containing Ara, Ariane, and an AXI mux, instantiated from within Ara's SoC
Retime address calculation of the addrgen
Push MASKU operand muxing from the lanes to the Mask Unit
Reduce CVA6's default cache size
Update Verilator to v4.214
Update bender to v0.23.1
