question of vfredusum #294
Hi, AD738560581. This is because Spike computes vfredusum with in-order addition, the same as vfredosum.
Thanks, your answer resolved my question completely. However, there is another problem. According to the RVV 1.0 spec, "If no elements are active, no additions are performed, so the scalar in vs1[0] is simply copied to the destination register, without canonicalizing NaN values and without setting any exception flags". Does this mean that if the v0 register is zero while vm=0, vfredosum.vs v3, v1, v2, v0.t gives v3[0] = v1[0]? However, in the case below, I found that v3[0] is zero. @mp-17 @suehtamacv @jin8495
VSET(4, e32, m1);
This seems to be an issue with -0.0 data. The vfredosum instruction adds vs1[0] and the elements of vs2 one at a time. In addition, each masked-off element of vs2 is replaced with +0.0, so a vs1[0] of -0.0 is added to +0.0 and the result is +0.0. So, if there are no active elements and vs1[0] is -0.0, the result is +0.0. The cause is that inactive elements are replaced with +0.0, the default ntr_val in vmfpu.sv. I changed ntr_val for VFREDU/OSUM to 0x8000_0000_8000_0000 for EW32, and the test case now passes.
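The signed-zero effect described above can be reproduced on a host machine. The following is a minimal sketch in C, assuming an IEEE 754 host in the default round-to-nearest-even mode. It is not the RTL or the Spike code: the function names `redosum_skip_inactive` and `redosum_neutral` are hypothetical, and the second one only models the "replace inactive elements with a neutral value" approach described in the comment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret a binary32 bit pattern as a float, and back. */
static float f32_from_bits(uint32_t bits) {
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

static uint32_t f32_to_bits(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}

/* Hypothetical model of the quoted spec text: inactive elements are
 * skipped, and with no active element vs1[0] is copied bit for bit. */
static uint32_t redosum_skip_inactive(uint32_t vs1_0, const uint32_t *vs2,
                                      const bool *mask, int vl) {
    assert(vl >= 0);
    bool any_active = false;
    volatile float acc = f32_from_bits(vs1_0); /* volatile: round every step */
    for (int i = 0; i < vl; i++) {
        if (mask[i]) {
            acc = acc + f32_from_bits(vs2[i]);
            any_active = true;
        }
    }
    return any_active ? f32_to_bits(acc) : vs1_0;
}

/* Hypothetical model of the neutral-value approach: every lane is added,
 * but inactive lanes contribute `neutral` instead of their vs2 element. */
static uint32_t redosum_neutral(uint32_t vs1_0, const uint32_t *vs2,
                                const bool *mask, int vl, uint32_t neutral) {
    assert(vl >= 0);
    volatile float acc = f32_from_bits(vs1_0);
    for (int i = 0; i < vl; i++) {
        uint32_t operand = mask[i] ? vs2[i] : neutral;
        acc = acc + f32_from_bits(operand);
    }
    return f32_to_bits(acc);
}
```

With all four lanes masked off and vs1[0] = 0x80000000 (-0.0), `redosum_skip_inactive` returns 0x80000000, `redosum_neutral` with a neutral value of 0x00000000 returns 0x00000000, and `redosum_neutral` with a neutral value of 0x80000000 returns 0x80000000. Under round-to-nearest, -0.0 leaves both +0.0 and -0.0 unchanged when added, which is why it works as the neutral value here. Note that the neutral-value model still performs additions, so unlike the bit-for-bit copy it would not preserve a signaling NaN in vs1[0].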
Hello @mp-17 @suehtamacv,
When I use the following case to test the vfredusum instruction, the RTL result is 40a81878, while the Spike result is 40a81879. However, when I used a floating-point calculator to reproduce the calculation of the vfredusum instruction, the result was the same as the RTL's. Does this mean that the nodes of the addition tree in the RTL need to preserve precision?
Thanks
VSET(4, e32, m1);
VLOAD_32(v11, 0x3fc001e6, 0x3fa01fff, 0x3fa01fff, 0x3fa01fff);
VLOAD_32(v14, 0x0, 0x0, 0x0, 0x0);
VLOAD_32(v1, 0x4, 0x4, 0x4, 0x4);
asm volatile("vfredusum.vs v1, v11, v14");
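The one-bit difference in this test case can be reproduced on a host machine by summing the same four binary32 values in two different orders. The following is a minimal sketch in C, assuming an IEEE 754 host in the default round-to-nearest-even mode. The function names `sum_in_order` and `sum_tree4` are hypothetical, and the pairing used in `sum_tree4` is only one possible tree shape, not necessarily the one the RTL implements.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret a binary32 bit pattern as a float, and back. */
static float f32_from_bits(uint32_t bits) {
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

static uint32_t f32_to_bits(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}

/* In-order sum: (((scalar + v[0]) + v[1]) + v[2]) + ...
 * The volatile accumulator forces rounding to binary32 after every add. */
static uint32_t sum_in_order(uint32_t scalar, const uint32_t *v, int n) {
    assert(n >= 0);
    volatile float acc = f32_from_bits(scalar);
    for (int i = 0; i < n; i++) {
        acc = acc + f32_from_bits(v[i]);
    }
    return f32_to_bits(acc);
}

/* Pairwise (tree) sum of exactly four elements:
 * ((v[0] + v[1]) + (v[2] + v[3])) + scalar.
 * Assumption: this pairing is illustrative only. */
static uint32_t sum_tree4(uint32_t scalar, const uint32_t *v) {
    volatile float lo = f32_from_bits(v[0]) + f32_from_bits(v[1]);
    volatile float hi = f32_from_bits(v[2]) + f32_from_bits(v[3]);
    volatile float vec = lo + hi;
    volatile float res = vec + f32_from_bits(scalar);
    return f32_to_bits(res);
}
```

For the inputs above (vector 0x3fc001e6, 0x3fa01fff, 0x3fa01fff, 0x3fa01fff and scalar 0x0), `sum_in_order` returns 0x40a81879 and `sum_tree4` returns 0x40a81878. Both orders round after every addition, but the intermediate sums hit different rounding ties, so the final results differ by one unit in the last place. Because vfredusum is an unordered sum, a difference of this kind between two implementations does not by itself indicate a bug in either one.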