RISC-V
Welcome to the Aidget RISC-V wiki!
A simple hello-world project that can be used to check whether the cross-compilation toolchain works.
```
$ adb shell "./aidget_riscv"
Hello World!
```
A small memcpy experiment. Note that the vsetvli, load, and store instructions differ between RVV v0.7 and v1.0; this project currently uses v0.7.
```
$ adb shell "./aidget_riscv"
memcpy_test: 0 1 2 3 4 5 6 7 8 9
```
Instruction | v0.7 | v1.0 | Notes |
---|---|---|---|
vsetvli | vsetvli t0, a2, e8, m8 | vsetvli t0, a2, e8, m8, ta, ma | 8-bit elements (e8), groups of 8 registers (m8) |
load | vlb.v v0, (a1) | vle8.v v0, (a1) | Load bytes |
store | vsb.v v0, (a3) | vse8.v v0, (a3) | Store bytes |
```
ta  # Tail agnostic
tu  # Tail undisturbed
ma  # Mask agnostic
mu  # Mask undisturbed
```
Before v0.9, if these flags are not specified on vsetvli, they default to mask undisturbed / tail undisturbed (mu, tu).
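To make the tail policy concrete, here is a minimal scalar C model of an element-wise add on one register group. The function name vadd_model and its parameters are hypothetical; it is only a sketch of the semantics, not real vector code. Under tu, elements at index vl and above keep their old values. Under ta, the v1.0 spec allows the hardware either to keep them or to overwrite them with all 1s; this model picks the all-1s option to show why software must not rely on the tail.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: vd[i] = vs2[i] + vs1[i] for the vl active elements
 * of a register group that holds vlmax int32 elements. */
static void vadd_model(int32_t *vd, const int32_t *vs2, const int32_t *vs1,
                       size_t vl, size_t vlmax, int tail_agnostic)
{
    assert(vl <= vlmax);
    for (size_t i = 0; i < vl; i++)
        vd[i] = vs2[i] + vs1[i];        /* body elements are always written */

    if (tail_agnostic) {
        /* ta: hardware may keep the tail or fill it with all 1s;
         * this model chooses all 1s (-1 as int32). */
        for (size_t i = vl; i < vlmax; i++)
            vd[i] = -1;
    }
    /* tu: vd[vl..vlmax-1] are left untouched. */
}
```

With vl = 2 and vlmax = 4, the last two destination elements stay unchanged under tu and become -1 in this ta model.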
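```
vsetvli t0, a2, e8
```
This example is the first use of the vsetvli instruction. a2 holds the length n, and vsetvli writes vl = min(a2, VLMAX) to t0. The trace below takes VLMAX = 8:
- 1st pass: a2 = 10 --> t0 = 8 --> a2 = 2
- 2nd pass: a2 = 2 --> t0 = 2 --> a2 = 0 --> ret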
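The strip-mining loop traced above can be sketched as a scalar C model. The names vsetvl_model and memcpy_stripmined are hypothetical, vlmax = 8 only mirrors the trace (the real VLMAX depends on VLEN, SEW and LMUL), and copying vl bytes with memcpy stands in for the vlb.v / vsb.v pair.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Model of vsetvli: vl = min(AVL, VLMAX). */
static size_t vsetvl_model(size_t avl, size_t vlmax)
{
    return avl < vlmax ? avl : vlmax;
}

/* Strip-mined byte copy. Records each vl in trace (if non-NULL)
 * and returns the number of loop passes. */
static size_t memcpy_stripmined(unsigned char *dst, const unsigned char *src,
                                size_t n, size_t vlmax,
                                size_t *trace, size_t trace_cap)
{
    assert(vlmax > 0);
    size_t passes = 0;
    while (n > 0) {
        size_t vl = vsetvl_model(n, vlmax); /* vsetvli t0, a2, e8 */
        memcpy(dst, src, vl);               /* vlb.v + vsb.v on vl bytes */
        if (trace != NULL && passes < trace_cap)
            trace[passes] = vl;
        src += vl;                          /* advance source pointer */
        dst += vl;                          /* advance destination pointer */
        n -= vl;                            /* a2 = a2 - t0 */
        passes++;
    }
    return passes;
}
```

Calling memcpy_stripmined(dst, src, 10, 8, trace, 4) reproduces the two passes above: vl = 8, then vl = 2.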
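SAXPY (single-precision A·X Plus Y, also glossed as "Scalar Alpha X Plus Y") is a function in the Basic Linear Algebra Subprograms (BLAS) package and a computation commonly run on vector processors.
It computes y = αx + y, where α is a scalar and x and y are vectors.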
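As a reference point for the vector version, a plain scalar SAXPY in C looks like this. The name saxpy_ref is hypothetical; the argument order (n, a, x, y) is an assumption chosen so that, under the standard RISC-V calling convention, the arguments land in a0, fa0, a1 and a2 as in the walkthrough below.

```c
#include <assert.h>
#include <stddef.h>

/* Scalar reference: y[i] = a * x[i] + y[i] for i in [0, n). */
static void saxpy_ref(size_t n, float a, const float *x, float *y)
{
    assert(n == 0 || (x != NULL && y != NULL));
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}
```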
```
$ adb shell "./aidget_riscv"
saxpy_test: 2.1 4.2 6.3 8.4 10.5 12.6 14.7 16.8 18.9 21.0
```
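```
vsetvli a4, a0, e32, m8
```
This example uses vsetvli again, this time with m8 (LMUL = 8), which groups 8 consecutive vector registers so that each instruction operates on the whole group; a0 holds the length n.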
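How many elements one vsetvli can cover follows from VLMAX = LMUL × VLEN / SEW, and vl = min(AVL, VLMAX). A minimal C sketch of that arithmetic, with hypothetical helper names; VLEN is implementation-specific, and 128 bits is used in the usage note only as an example.

```c
#include <assert.h>
#include <stddef.h>

/* VLMAX = LMUL * VLEN / SEW, for integer LMUL (m1, m2, m4, m8). */
static size_t vlmax_for(size_t vlen_bits, size_t sew_bits, size_t lmul)
{
    assert(sew_bits > 0 && vlen_bits % sew_bits == 0);
    return lmul * vlen_bits / sew_bits;
}

/* vl chosen by vsetvli for an application vector length avl. */
static size_t vl_for(size_t avl, size_t vlen_bits, size_t sew_bits,
                     size_t lmul)
{
    size_t vlmax = vlmax_for(vlen_bits, sew_bits, lmul);
    return avl < vlmax ? avl : vlmax;
}
```

If VLEN were 128, e32 with m8 would give VLMAX = 32, so n = 10 would fit in a single pass, while e32 with m1 would give VLMAX = 4. The VLMAX = 8 used in the walkthrough below is therefore best read as an illustration.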
Inputs: n = 10 --> a0 = 10, α = 2.0 --> fa0 = 2.0; a1 points at x[0] and a2 points at y[0].
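```asm
saxpy:
    vsetvli   a4, a0, e32, m8   # a4 = vl = min(10, VLMAX) = 8 (assuming VLMAX = 8)
    vlw.v     v0, (a1)          # load x[0..7] into group v0-v7; next pass: x[8..]
    sub       a0, a0, a4        # a0 = a0 - a4 = 10 - 8 = 2
    slli      a4, a4, 2         # a4 = a4 << 2 = 8 * 4 = 32 bytes (a float is 4 bytes)
    add       a1, a1, a4        # a1 pointed at x[0]; now points at x[8]
    vlw.v     v8, (a2)          # load y[0..7] into v8-v15
    vfmacc.vf v8, fa0, v0       # (v8-v15) = fa0 * (v0-v7) + (v8-v15)
    vsw.v     v8, (a2)          # store the result back to y[0..7]
    add       a2, a2, a4        # a2 pointed at y[0]; now points at y[8]
    bnez      a0, saxpy         # repeat while elements remain
    ret
```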
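The loop above can be mirrored by a scalar C model, with each statement annotated by the instruction it stands for. saxpy_stripmined is a hypothetical name and vlmax is a parameter standing in for the hardware VLMAX; this is a sketch of the control flow, not the project's code. Advancing a float pointer by vl moves it by vl * 4 bytes, which is what slli a4, a4, 2 computes.

```c
#include <assert.h>
#include <stddef.h>

/* Strip-mined SAXPY model; returns the number of loop passes. */
static size_t saxpy_stripmined(size_t n, float a, const float *x, float *y,
                               size_t vlmax)
{
    assert(vlmax > 0);
    size_t passes = 0;
    while (n > 0) {                         /* bnez a0, saxpy */
        size_t vl = n < vlmax ? n : vlmax;  /* vsetvli a4, a0, e32, m8 */
        for (size_t i = 0; i < vl; i++)     /* vlw.v, vfmacc.vf, vsw.v */
            y[i] = a * x[i] + y[i];
        n -= vl;                            /* sub a0, a0, a4 */
        x += vl;                            /* add a1, a1, a4 (vl * 4 bytes) */
        y += vl;                            /* add a2, a2, a4 */
        passes++;
    }
    return passes;
}
```

With n = 10 and vlmax = 8 this takes two passes (8 elements, then 2), matching the register trace in the comments above.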
A small memory bandwidth test:
```
$ adb shell "./aidget_riscv"
memory_bandwidth_test:
0: memcpy bandwidth (read and write)
AVG Method: MEMCPY Elapsed: 0.09464 MiB: 100.00000 Copy: 1056.639 MiB/s
AVG Method: DUMB Elapsed: 0.60153 MiB: 100.00000 Copy: 166.243 MiB/s
AVG Method: MCBLOCK Elapsed: 0.09546 MiB: 100.00000 Copy: 1047.582 MiB/s
1: flw bandwidth (read)
--> flw Memory read bandwidth is 2.732 GB/s
2: vlw bandwidth (read)
--> vlw[m8] Memory read bandwidth is 1.155 GB/s
--> vlw[m4] Memory read bandwidth is 1.276 GB/s
--> vlw[m2] Memory read bandwidth is 0.347 GB/s
--> vlw[m1] Memory read bandwidth is 0.320 GB/s
```
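The "Copy" figure in the log is the amount copied divided by the elapsed time, e.g. 100 MiB / 0.09464 s ≈ 1056.6 MiB/s. The test program's source is not shown here, so the following is only a sketch in the spirit of the MEMCPY method, assuming POSIX clock_gettime with CLOCK_MONOTONIC is available; the function names are hypothetical.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Seconds between two CLOCK_MONOTONIC timestamps. */
static double elapsed_s(struct timespec t0, struct timespec t1)
{
    return (double)(t1.tv_sec - t0.tv_sec)
         + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
}

/* Copies `mib` MiB with memcpy `reps` times and returns the average
 * bandwidth in MiB/s, or a negative value on failure. */
static double memcpy_mib_per_s(size_t mib, int reps)
{
    size_t bytes = mib * 1024 * 1024;
    char *src = malloc(bytes);
    char *dst = malloc(bytes);
    if (src == NULL || dst == NULL || reps <= 0) {
        free(src);
        free(dst);
        return -1.0;
    }
    memset(src, 0x5a, bytes);  /* touch pages so faults are not timed */
    memset(dst, 0, bytes);

    double total = 0.0;
    for (int r = 0; r < reps; r++) {
        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        memcpy(dst, src, bytes);
        clock_gettime(CLOCK_MONOTONIC, &t1);
        total += elapsed_s(t0, t1);
    }
    assert(memcmp(dst, src, bytes) == 0);  /* sanity check: copy happened */
    free(src);
    free(dst);

    double avg = total / reps;
    return avg > 0.0 ? (double)mib / avg : -1.0;  /* MiB / seconds */
}
```

Timing each repetition separately and averaging matches the "AVG" lines in the log; on a board you would call it with the same 100 MiB size to compare against the MEMCPY figure.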