question of vfredusum #294

AD738560581 · 2024-03-20T03:30:58Z

Hello @mp-17 @suehtamacv
When I test the vfredusum instruction with the following case, the RTL result is 40a81878, while the Spike result is 40a81879. However, when I used a floating-point calculator to simulate the calculation steps of vfredusum, I got the same result as the RTL. Does this mean that the nodes of the addition tree in the RTL need to maintain accuracy?
Thanks
VSET(4, e32, m1);
VLOAD_32(v11, 0x3fc001e6, 0x3fa01fff, 0x3fa01fff, 0x3fa01fff);
VLOAD_32(v14, 0x0, 0x0, 0x0, 0x0);
VLOAD_32(v1, 0x4, 0x4, 0x4, 0x4);
asm volatile("vfredusum.vs v1, v11, v14");

jin8495 · 2024-03-20T10:24:14Z

Hi, AD738560581.

That is because Spike computes vfredusum with in-order addition, the same as vfredosum.
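
The one-bit difference can be reproduced on a host machine. The sketch below is only an illustration, not Spike or RTL code: it reads the vs2 values of the test case as the binary32 patterns 0x3fc001e6 and 0x3fa01fff, and it assumes a tree of the shape (e0 + e1) + (e2 + e3) with the scalar added last, which may not match the real adder tree. The helper names are made up for this example, and the host is assumed to use round-to-nearest-even.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret a 32-bit pattern as an IEEE 754 binary32 value. */
static float f32_from_bits(uint32_t bits) {
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Return the raw bit pattern of a binary32 value. */
static uint32_t f32_to_bits(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}

/* In-order sum (((s + e0) + e1) + e2) + e3, the order vfredosum requires.
 * volatile forces every partial sum to be rounded to binary32. */
static uint32_t ordered_sum_bits(uint32_t scalar, const uint32_t elems[4]) {
    volatile float acc = f32_from_bits(scalar);
    for (int i = 0; i < 4; i++) {
        acc = acc + f32_from_bits(elems[i]);
    }
    return f32_to_bits(acc);
}

/* Tree sum ((e0 + e1) + (e2 + e3)) + s. The tree shape is an assumption. */
static uint32_t tree_sum_bits(uint32_t scalar, const uint32_t elems[4]) {
    volatile float lo = f32_from_bits(elems[0]) + f32_from_bits(elems[1]);
    volatile float hi = f32_from_bits(elems[2]) + f32_from_bits(elems[3]);
    volatile float sum = lo + hi;
    volatile float res = sum + f32_from_bits(scalar);
    return f32_to_bits(res);
}

/* Values from the test case: vs2 = v11, vs1[0] = v14[0] = +0.0. */
static void check_issue_values(void) {
    const uint32_t v11[4] = {0x3fc001e6u, 0x3fa01fffu, 0x3fa01fffu, 0x3fa01fffu};
    assert(ordered_sum_bits(0x0u, v11) == 0x40a81879u); /* Spike's value */
    assert(tree_sum_bits(0x0u, v11) == 0x40a81878u);    /* RTL's value */
}
```

Calling check_issue_values() passes under these assumptions: the in-order sum rounds at three intermediate steps, while the tree computes e2 + e3 exactly, so the two orders land on neighbouring binary32 values.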

AD738560581 · 2024-03-21T06:01:22Z

Thanks, your answer completely resolved my doubts. However, there is another problem. According to the RVV 1.0 spec, "If no elements are active, no additions are performed, so the scalar in vs1[0] is simply copied to the destination register, without canonicalizing NaN values and without setting any exception flags". So if the v0 register is all zero and vm=0, should vfredosum.vs v3, v1, v2, v0.t give v3[0] = v2[0], since vs1 is v2 here? However, in the case below, I found that v3[0] is zero. @mp-17 @suehtamacv @jin8495

VSET(4, e32, m1);
VLOAD_32(v1, 0x80000000, 0x80800000, 0x80000000, 0x80000000);
VLOAD_32(v2, 0x80000000, 0x80000000, 0x80000000, 0x80000000);
VLOAD_32(v3, 0x1, 0x2, 0x3, 0x4);
VLOAD_32(v0, 0xf, 0x0, 0x0, 0x0);
asm volatile("vfredosum.vs v3, v1, v2, v0.t");

AD738560581 · 2024-03-21T07:01:41Z

This seems to be an issue with -0.0 data. The vfredosum instruction adds each element of vs2 to vs1[0] in turn. In addition, masked-off elements of vs2 are replaced with +0.0, so a vs1[0] of -0.0 is added to +0.0 and the result is +0.0. So, if there are no active elements and vs1[0] is -0.0, the result will be +0.0. The cause is that ntr_val in vmfpu.sv, the value that replaces inactive elements, is +0.0 by default. I changed ntr_val for VFREDU/OSUM to 0x8000_0000_8000_0000 for EW32, and the case passes.
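
The effect of the neutral value can be checked with a small host-side model. This is a sketch under assumptions, not the vmfpu.sv logic: masked_ordered_sum_bits and its parameters are hypothetical names, the mask is passed as a plain bit field, and the host is assumed to use round-to-nearest-even, where x + (-0.0) returns x for every x but (-0.0) + (+0.0) gives +0.0.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret a 32-bit pattern as an IEEE 754 binary32 value. */
static float f32_from_bits(uint32_t bits) {
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Return the raw bit pattern of a binary32 value. */
static uint32_t f32_to_bits(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}

/* Ordered reduction in which every inactive element is replaced by
 * `neutral` before being added (hypothetical model of the behaviour
 * described above). Bit i of `mask` set means element i is active. */
static uint32_t masked_ordered_sum_bits(uint32_t scalar, const uint32_t *elems,
                                        int vl, uint32_t mask, uint32_t neutral) {
    volatile float acc = f32_from_bits(scalar);
    for (int i = 0; i < vl; i++) {
        int active = (int)((mask >> i) & 1u);
        acc = acc + f32_from_bits(active ? elems[i] : neutral);
    }
    return f32_to_bits(acc);
}
```

With scalar 0x80000000 (-0.0) and mask 0, a neutral value of 0x00000000 returns 0x00000000, while a neutral value of 0x80000000 returns 0x80000000, which is what the spec expects for vs1[0] = -0.0. The model does not cover the NaN clause of the spec: adding any neutral value to a signaling NaN would still quiet it and raise the invalid flag, so a plain copy of vs1[0] would be needed for that case.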
