Commit 2924ef4

Evaluated the existing design and workflow; fixed some bugs in the software TR/TC decision for each layer.
1 parent ee5ce94 commit 2924ef4

7 files changed: +71876 -4 lines changed

README.md

Lines changed: 25 additions & 0 deletions
@@ -69,6 +69,31 @@ The performance comparison in the two cases is shown in the following table:
 |Performance(GOP/s) |25.98 |30.15 | 36.13 | 6.63 |11.81| 13.08|
 |Power Efficiency(GOP/s/W) | 4.20 | 6.02 | ? | ? |? | ?|
 
+# New Evaluation
+Further tests of the existing design, with more details for other researchers (2023.11.06). Tools: Vivado and Vivado HLS 2019.2.
+The Linux app is compiled with -static -lm, in release mode with -O2 optimization.
+
+Platform:
+
+EdgeBoard(ZU3EG): 1.2GHz A53 4 cores + 4GiB DDR4 + FPGA
+
+| ID | DataType | hls_target_clk | Tn/Tm/Tr/Tc/II_CONV/II_POOL | DSP | BRAM | LUT | FF | Freq (MHz) | Dev |
+| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
+| A | FT32 | 3.0 | 4/28/26/32/3/3 | 259(72%) | 90.5(42%) | 31983(45%) | 57683(41%) | 200 | EdgeBoard(ZU3EG) |
+
+| Performance | |
+| --- | --- |
+| CNN models | YOLO v2 |
+| Board | ZU3EG |
+| Acc-Clock (MHz) | 200 |
+| Precision | FT32 |
+| Power (cpu idle + static fpga + dynamic cpu & fpga, W) | 6.63 + 0.55 + 1.82 |
+| Operations (GOP) | 29.472 |
+| Latency* (s) | 2.255 |
+| Performance (GOP/s) | 13.069 |
+| Power Efficiency (GOP/s/W) | 5.514 |
+*Latency does not include the post-processing stage on the CPU (e.g., the last region layer and the image-saving procedure). Power Efficiency counts only the static and dynamic power of the FPGA and CPU (0.55 W + 1.82 W), not the CPU idle power. CPU power could be reduced further by disabling unused modules and buses.
+
 # Result
 ![image1](https://github.com/dhm2013724/yolov2_xilinx_fpga/blob/150MHzTn4Tm32Tr26Tc26Cin4Cout2/pynq/result2.jpg)
 
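The two derived rows of the new performance table follow directly from the measured ones. The sketch below is only a worked check of that arithmetic; the function names `throughput_gops` and `power_efficiency` are hypothetical and do not come from the repository, and it assumes the efficiency figure uses only the 0.55 W and 1.82 W power terms.

```c
#include <assert.h>

/* Throughput in GOP/s: total operations divided by measured latency. */
static double throughput_gops(double operations_gop, double latency_s)
{
    assert(latency_s > 0.0);
    return operations_gop / latency_s;
}

/* Power efficiency in GOP/s/W.
 * Assumption: only static FPGA power and dynamic CPU+FPGA power are
 * counted, so the CPU idle power is left out of the denominator. */
static double power_efficiency(double gops, double static_fpga_w,
                               double dynamic_w)
{
    double counted_w = static_fpga_w + dynamic_w;
    assert(counted_w > 0.0);
    return gops / counted_w;
}
```

With the table values, `throughput_gops(29.472, 2.255)` gives about 13.07 GOP/s, and dividing that by 0.55 + 1.82 = 2.37 W gives about 5.51 GOP/s/W, which matches the reported 13.069 and 5.514 up to rounding.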

hls/src_float32/bias.bin

42 KB
Binary file not shown.

hls/src_float32/yolov2_acc_sim.h

Lines changed: 8 additions & 2 deletions
@@ -153,7 +153,10 @@ void yolov2_hls_ps(network *net, float *input)
 // TR = MIN(output_h,TR);
 // TC = MIN(((OnChipIB_Width-l.size)/l.stride+1),Tc);
 // TC = MIN(output_w,TC);
-TC = MIN(((IB_HxW-l.size)/l.stride+1),output_w);
+
+assert((IB_HxW/l.size)>=l.size);
+TC = MIN(((IB_HxW/l.size-l.size)/l.stride+1),output_w);
+TC = MIN(TrxTc, TC);
 TCol = (TC-1)*l.stride + l.size;
 TR = MIN(((IB_HxW/TCol-l.size)/l.stride+1),output_h);//keep Kernel_stride>=1
 TR = MIN(TR, TrxTc/TC);
@@ -197,7 +200,10 @@ void yolov2_hls_ps(network *net, float *input)
 // TC = MIN(((OnChipIB_Width-l.size)/l.stride+1),Tc);
 // TR = MIN(output_h,TR);
 // TC = MIN(output_w,TC);
-TC = MIN(((IB_HxW-l.size)/l.stride+1),output_w);
+
+assert((IB_HxW/l.size)>=l.size);
+TC = MIN(((IB_HxW/l.size-l.size)/l.stride+1),output_w);
+TC = MIN(TrxTc, TC);
 TCol = (TC-1)*l.stride + l.size;
 TR = MIN(((IB_HxW/TCol-l.size)/l.stride+1),output_h);//keep Kernel_stride>=1
 TR = MIN(TR, TrxTc/TC);
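
To see why the TC bound changed, the following standalone sketch reproduces both versions of the tile decision outside the accelerator code. It is an illustration under stated assumptions, not the repository's implementation: `tile_t`, `choose_tile_old`, and `choose_tile` are hypothetical names, and the buffer sizes used in the usage note are made up for the example rather than taken from the design.

```c
#include <assert.h>

#define MIN(a, b) (((a) < (b)) ? (a) : (b))

/* Hypothetical result container; not a type from the repository. */
typedef struct {
    int tc;   /* output columns per tile */
    int tcol; /* input columns needed to produce tc output columns */
    int tr;   /* output rows per tile */
} tile_t;

/* Rule before the commit: TC is bounded as if one input row could
 * occupy the whole input buffer of ib_hxw elements. */
static tile_t choose_tile_old(int ib_hxw, int trxtc, int ksize, int stride,
                              int out_w, int out_h)
{
    tile_t t;
    t.tc = MIN((ib_hxw - ksize) / stride + 1, out_w);
    t.tcol = (t.tc - 1) * stride + ksize;
    t.tr = MIN((ib_hxw / t.tcol - ksize) / stride + 1, out_h);
    t.tr = MIN(t.tr, trxtc / t.tc);
    return t;
}

/* Rule after the commit: reserve room for at least ksize input rows
 * before sizing TC, and cap TC by the output buffer size trxtc. */
static tile_t choose_tile(int ib_hxw, int trxtc, int ksize, int stride,
                          int out_w, int out_h)
{
    tile_t t;
    assert(ib_hxw / ksize >= ksize);
    t.tc = MIN((ib_hxw / ksize - ksize) / stride + 1, out_w);
    t.tc = MIN(trxtc, t.tc);
    t.tcol = (t.tc - 1) * stride + ksize;
    t.tr = MIN((ib_hxw / t.tcol - ksize) / stride + 1, out_h);
    t.tr = MIN(t.tr, trxtc / t.tc);
    return t;
}
```

For an assumed 1024-element input buffer, an 832-element output buffer, a 3x3 kernel with stride 1, and a 416x416 output, the old rule picks TC = 416 and then TR = 0, because only two input rows of width 418 fit in the buffer while the kernel needs three. The new rule picks TC = 339, TCol = 341, and TR = 1, so every tile still covers at least one output row.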

hls/src_float32_fusion/bias.bin

42 KB
Binary file not shown.
