add smart-peripheral if, im2col acc, and Verifheep flow #581

Merged
merged 243 commits into from
Oct 1, 2024
bedfda5
2D DMA with strides & without padding
Apr 22, 2024
518cc70
DMA stride computation fixes
Apr 23, 2024
e17d809
Started padding support, added output strides
Apr 24, 2024
2a88ec2
Added padding support & performance estimation
Apr 29, 2024
89dd815
First stable point: padding & performance estimation working
Apr 30, 2024
ab70937
Started development DMA HAL
May 1, 2024
0d696e2
Optimized register sizes & continued development of DMA HALs
May 2, 2024
72b2334
Development of DMA HALs & new example_2d_dma
May 6, 2024
fecb4a0
Added LLs example in example_2d_dma
May 6, 2024
671b775
Fixed non-word padding issue caused by read_ptr_valid logic
May 8, 2024
505fa38
Merge remote-tracking branch 'origin/main' into DMA_2D_smart
May 8, 2024
10c6635
Added out of bounds check 2D
May 8, 2024
1b6daba
Clean up for PR review
May 15, 2024
8b539bb
Merge branch 'main' into DMA_2D_smart
May 15, 2024
69f03a7
Added transposition & final cleanup
May 17, 2024
89689ac
First commit. core_v_mini_mcu_pkg.sv.tpl modificaitons
May 17, 2024
78ba652
Reduce test data size
May 21, 2024
48e4b81
Clean up
TommiTerza May 21, 2024
aa7cc98
Developed dma subsystem sv, HALs & example
May 21, 2024
b48afe5
Mini clean up
TommiTerza May 22, 2024
4bb0c19
second mini clean up
TommiTerza May 22, 2024
96ad87b
Added multichannel interrupt handling & new test on 4 channels
May 22, 2024
d552130
Merge branch 'esl-epfl:main' into DMA_multiCH
TommiTerza May 23, 2024
eafd82c
Merge branch 'DMA_2D_smart' into DMA_multiCH
May 23, 2024
9800b63
First development of the smart DMA
May 23, 2024
bcbf6a1
Small example_DMA fix
May 23, 2024
754af46
to squash
May 23, 2024
b20b6d0
Fixes
May 23, 2024
a1d863e
to be squashed
May 23, 2024
bd12dba
PR fixes
May 23, 2024
e42ecf4
fixes
May 23, 2024
06d0c56
Started doc
May 23, 2024
a1ae48e
Merge branch 'DMA_multiCH' into DMA_smart
May 23, 2024
86b5900
Started development of AOPB
May 23, 2024
322d2b9
Modified .tpl to expose external AOPB ports
May 24, 2024
b23586c
Fixes on example_2d
May 24, 2024
a75f5bc
Fixes example_2d
May 24, 2024
fd42174
Development AOPB FIFOs
May 24, 2024
2e56402
Fixed ao_peripheral sv & started development im2col in C using 2D DMA
May 29, 2024
f2b1a93
Added pseudo-rvalid logic to avoid mem accesses during padding, fixed…
May 31, 2024
b1abf01
Added pseudo-rvalid logic to avoid mem accesses during padding, fixed…
May 31, 2024
dc83d37
Fixed example_spi_write
Jun 3, 2024
66cad90
Merge branch 'DMA_2D_smart' into DMA_multiCH
Jun 3, 2024
0b54737
Development new test spi
Jun 3, 2024
6a9d8d2
Merge remote-tracking branch 'origin/main' into DMA_multiCH
Jun 3, 2024
aaa47e6
Fixed test spi & multich
Jun 4, 2024
f47a270
Merge branch 'main' into DMA_multiCH
Jun 4, 2024
d994571
small fixes
Jun 4, 2024
2c851c9
small fix
Jun 4, 2024
846afa7
fixes
Jun 4, 2024
eda0189
Modified dma gen
Jun 4, 2024
77a4306
fixes
Jun 5, 2024
795c787
Merge branch 'DMA_multiCH' into DMA_smart
Jun 5, 2024
e8e0703
Developed im2col w/ 2d DMA & fixes DMA HALs
Jun 5, 2024
25d4e21
Small fixes
Jun 5, 2024
de95b9c
Merge branch 'DMA_multiCH' into DMA_smart
Jun 5, 2024
9c3c0d9
small fix
Jun 5, 2024
119f0b2
Modified example & added window_ifr
Jun 6, 2024
c1278ef
Added window int case in example_dma_multichannel & DMA HAL fix
Jun 6, 2024
c48c2cc
fixes
Jun 6, 2024
c02f279
Fixes
Jun 6, 2024
ab761e7
Fixes
Jun 6, 2024
8015bbe
1 channel case fixes
Jun 7, 2024
72bf347
Work in progress optimized im2col
Jun 7, 2024
f1d271f
Work in progress optimized im2col 2
Jun 7, 2024
126ed5d
Started SV development of im2col_spc, added dma queue for multichanne…
Jun 10, 2024
e1e7bed
Added FSMs for parameter computations & loading transactions
Jun 11, 2024
f5e6dd2
Merge branch 'DMA_multiCH' into DMA_smart
Jun 12, 2024
268e65d
Started development im2col spc test in example_im2col
Jun 12, 2024
ee93bba
Continued dev
Jun 13, 2024
159407e
Fixes
Jun 13, 2024
b86f4c1
Cont'd
Jun 14, 2024
2dffbf3
Added optional priority interrupt mechanism
Jun 14, 2024
4164200
Cont'd
Jun 14, 2024
46bde0d
cont'd
Jun 18, 2024
5acf9dd
Merge branch 'DMA_multiCH' into DMA_smart
Jun 18, 2024
77bdf89
First stable point
Jun 19, 2024
3913b31
Fixed ch > 1 case, performance evaluation
Jun 19, 2024
cdb1d8c
Turned NUM_CH_SPC into register & fixes
Jun 20, 2024
7e42a41
Added verifHEEP.py script
Jun 20, 2024
d37ee68
Finished ver script & started im2col driver
Jun 21, 2024
0612ab9
Started HAL development
Jun 24, 2024
7b9388a
Started HAL development
Jun 24, 2024
9b94127
continued
Jun 24, 2024
ab01194
Modified contacts
Jun 25, 2024
f9dac3a
Merge branch 'main' into DMA_multiCH
Jun 25, 2024
f5e236d
Finished merging
Jun 25, 2024
28a440c
Fixes
Jun 25, 2024
98a52d7
Fixes
Jun 25, 2024
b588e08
Added plotter script for perf comparison
Jun 25, 2024
6d2974b
Development multimaster DMA subsystem
Jun 26, 2024
df6b367
Trying to fix freertos
Jun 27, 2024
c32b237
Trying to fix freertos 2
Jun 27, 2024
19c7b9e
Added multimaster xbar in dma subsystem
Jun 27, 2024
c6a134a
Merge branch 'main' into DMA_multiCH
Jun 28, 2024
0fd12a1
Fix w25q
Jun 28, 2024
c7470db
Fixed dma multichannel example
Jun 28, 2024
6f48ef5
added channel masks
Jun 28, 2024
2927feb
Added 4 channels in CI
Jun 28, 2024
027cc74
Fixed DMA SDK
Jul 1, 2024
4bdacd5
Merge branch 'main' into DMA_multiCH
Jul 1, 2024
af40dfb
Fixed power manager
Jul 1, 2024
21b3d97
Fixed channel masks system
Jul 1, 2024
0991d0f
Added individual DMA CH FIFO size control
Jul 1, 2024
277f9d9
Fixing examples for OpenHW compiler, to be continued
Jul 1, 2024
58786c5
Optimizing register sizes
Jul 2, 2024
06d3f94
Fixing example_dma...
Jul 2, 2024
0228462
Fixed example_dma
Jul 3, 2024
b7ef064
Optimizing
Jul 3, 2024
af958cb
Fixing multimaster configurations
Jul 5, 2024
a6d2780
Fixing
Jul 5, 2024
da28ee6
Fixes
TommiTerza Jul 8, 2024
7264e94
Merge branch 'main' into DMA_multiCH
TommiTerza Jul 8, 2024
18b656b
Added PLIC interrupt
TommiTerza Jul 9, 2024
b4249d3
Cleanup for merging
TommiTerza Jul 9, 2024
c615cb0
Cleanup for merging 2
TommiTerza Jul 9, 2024
dc7789a
Fixes
TommiTerza Jul 9, 2024
e878330
Removed useless files
TommiTerza Jul 9, 2024
9677de2
Decommented custom fifo sizes
TommiTerza Jul 9, 2024
7ba3556
fixes
TommiTerza Jul 9, 2024
b562b8a
Fixed a bug in the spc and improved verifHEEP.py
TommiTerza Jul 9, 2024
4866175
Merge remote-tracking branch 'origin/main' into DMA_multiCH
TommiTerza Jul 10, 2024
354e38b
Test changes
TommiTerza Jul 10, 2024
21946cc
Added individual external triggers for DMA channels & external dma_stop
TommiTerza Jul 10, 2024
c9a469d
Merge branch 'DMA_multiCH' of https://github.com/TommiTerza/x-heep in…
TommiTerza Jul 10, 2024
1ee6f80
Added OPT flag to test_all.sh to run questasim-sim-opt
TommiTerza Jul 10, 2024
c46a758
fix
TommiTerza Jul 10, 2024
d64cb31
fix
TommiTerza Jul 10, 2024
5ab018f
fix
TommiTerza Jul 10, 2024
e7be889
fix
TommiTerza Jul 10, 2024
b6c1013
fix
TommiTerza Jul 10, 2024
82e2ce1
Fixed trigger system for multiple channels
TommiTerza Jul 11, 2024
fcb8a82
fix
TommiTerza Jul 11, 2024
a9a7b39
Converted AOPB to reginterface
TommiTerza Jul 11, 2024
ba33d3f
Merge branch 'DMA_smart' of https://github.com/TommiTerza/x-heep into…
TommiTerza Jul 11, 2024
f8a39d3
mods
TommiTerza Jul 12, 2024
c575aa4
Fixes
TommiTerza Jul 12, 2024
a53a798
ext_dma_stop signal name update
TommiTerza Jul 12, 2024
d7bef7f
started dev verifHEEP.py on pynq-z2
TommiTerza Jul 12, 2024
7c83098
Merge branch 'DMA_multiCH' into DMA_smart
TommiTerza Jul 12, 2024
9ee10b1
Finalized merging & development verifHEEP_pynqz2.py
TommiTerza Jul 12, 2024
e8a3548
Added im2col spc support to xilinx_core_v_mini_mcu_wrapper
TommiTerza Jul 15, 2024
79c21d1
Fixes
TommiTerza Jul 15, 2024
3edfa4b
Fixes
TommiTerza Jul 15, 2024
d4467d9
Converted AOPB to reginterface
TommiTerza Jul 15, 2024
df96995
Fixing fpga testbench
TommiTerza Jul 16, 2024
c3d1089
Fixing timing issues with im2col spc
TommiTerza Jul 17, 2024
ec6dff0
Trying to reduce CP...
TommiTerza Jul 18, 2024
81a02dd
Still fixing...
TommiTerza Jul 18, 2024
7a257fc
Added pipeline registers is parameter computation
TommiTerza Jul 19, 2024
bb89fe7
fixes
TommiTerza Jul 22, 2024
f903e67
Optimizations
TommiTerza Jul 22, 2024
7efb3a4
Fixes to im2col fsm param logic
TommiTerza Jul 23, 2024
0b85027
Fixes
TommiTerza Jul 23, 2024
5246d50
fixes
TommiTerza Jul 23, 2024
09575e7
Fixed multiple im2col spc calls errors
TommiTerza Jul 25, 2024
1915536
Fixed multiple im2col spc calls errors
TommiTerza Jul 25, 2024
7c42e26
fixes
TommiTerza Jul 25, 2024
7d7b1ad
First stable version of verifHEEP!
TommiTerza Jul 26, 2024
780e0fc
Fix DMA padding
TommiTerza Jul 26, 2024
351889e
Merge branch 'main' into DMA_smart
TommiTerza Aug 6, 2024
b2fd280
mods
TommiTerza Aug 6, 2024
307d2c5
Merge branch 'DMA_smart' of https://github.com/TommiTerza/x-heep into…
TommiTerza Aug 6, 2024
5871b95
completed merging
TommiTerza Aug 6, 2024
88e8cbe
Rework of the DMA structure
TommiTerza Aug 8, 2024
64166fe
fixes
TommiTerza Aug 8, 2024
e03487d
fix
TommiTerza Aug 9, 2024
8d4f10d
gen new test for questasim sim
TommiTerza Aug 9, 2024
fb0b1e1
fix
TommiTerza Aug 9, 2024
5fe694d
Verifheep update
TommiTerza Aug 13, 2024
9df7d2d
Verifheep update 2
TommiTerza Aug 13, 2024
ff6385c
doc update
TommiTerza Aug 13, 2024
cf3dc9f
Merge branch 'main' into DMA_smart
TommiTerza Aug 13, 2024
1c1ce8c
Fix dma counters, removed byte units
TommiTerza Aug 14, 2024
1254394
Reorganized verifheep
TommiTerza Aug 19, 2024
7c58eb2
Forgot to add
TommiTerza Aug 19, 2024
589d4bd
fix
TommiTerza Aug 19, 2024
7a7a176
fix
TommiTerza Aug 19, 2024
8075143
fix
TommiTerza Aug 19, 2024
a465e2f
Added clock gating
TommiTerza Aug 20, 2024
3e37ea4
fix
TommiTerza Aug 21, 2024
bfc1b7e
fix
TommiTerza Aug 21, 2024
1ea0a69
Reran mcu-gen
TommiTerza Aug 21, 2024
688a94a
PR fix
TommiTerza Aug 25, 2024
4893ac9
fix
TommiTerza Aug 25, 2024
575ffde
fix
TommiTerza Aug 25, 2024
e9ebb52
fix
TommiTerza Aug 26, 2024
7227d53
fix
TommiTerza Aug 26, 2024
a2d4899
mcu gen small fix
TommiTerza Aug 26, 2024
13646ba
Trying to fix vendor
TommiTerza Aug 26, 2024
04627b5
Solved lintoff bug
TommiTerza Aug 26, 2024
1722ea3
Added chw - hwc conversion example
TommiTerza Aug 27, 2024
ca6cb09
Modified AO_SPC parameter
TommiTerza Sep 2, 2024
490a402
fix
TommiTerza Sep 2, 2024
8884fbf
Added im2col spc HAL & fixes
TommiTerza Sep 5, 2024
e5f7101
fix
TommiTerza Sep 5, 2024
e4ac6f1
fix
TommiTerza Sep 5, 2024
bfa8137
Merge branch 'esl-epfl:main' into DMA_smart
TommiTerza Sep 5, 2024
84b01b6
fix
TommiTerza Sep 6, 2024
a3b660d
Merge branch 'DMA_smart' of https://github.com/TommiTerza/x-heep into…
TommiTerza Sep 6, 2024
c6d2edc
Merge branch 'esl-epfl:main' into DMA_smart
TommiTerza Sep 9, 2024
e89f742
Use arrays
LuigiGiuffrida98 Sep 9, 2024
22064a9
Fix typo
LuigiGiuffrida98 Sep 9, 2024
11f3e91
Removed im2col spc dependencies from rtl-fpga target
TommiTerza Sep 9, 2024
527438a
v1 modifications to im2col
TommiTerza Sep 9, 2024
f2bf5bd
Fixes
TommiTerza Sep 9, 2024
86a5ad4
Fixes
TommiTerza Sep 10, 2024
cf4b81e
fix
TommiTerza Sep 10, 2024
e4ba80e
Updated the im2col verification scritp
TommiTerza Sep 10, 2024
a7e17bd
fix
TommiTerza Sep 11, 2024
a3803fe
Solved bug in dma padding fsm
TommiTerza Sep 11, 2024
f13b1d9
reduced im2col test lenght for CI
TommiTerza Sep 11, 2024
cec7445
im2col spc bugs fixes
TommiTerza Sep 16, 2024
a9365c8
fixes
TommiTerza Sep 16, 2024
a5dba92
added multi datatyep im2col
TommiTerza Sep 17, 2024
5441884
fix
TommiTerza Sep 17, 2024
cde62c0
fix
TommiTerza Sep 18, 2024
f40754e
fix
TommiTerza Sep 18, 2024
fb9fa99
fix
TommiTerza Sep 18, 2024
f969630
Merge branch 'DMA_smart' of https://github.com/TommiTerza/x-heep into…
TommiTerza Sep 18, 2024
091a55d
fix
TommiTerza Sep 18, 2024
fff2aac
fix
TommiTerza Sep 20, 2024
708d732
Merge remote-tracking branch 'origin/main' into DMA_smart
TommiTerza Sep 20, 2024
b8402e1
fix
TommiTerza Sep 20, 2024
019800a
fix
TommiTerza Sep 20, 2024
0bc9487
fix power manager CI
TommiTerza Sep 23, 2024
a28ff8c
fix
TommiTerza Sep 23, 2024
2021d4b
final fixes
TommiTerza Sep 24, 2024
0160d48
doc fix
TommiTerza Sep 27, 2024
2ecdc63
doc fix
TommiTerza Sep 27, 2024
fccb2b0
Merge branch 'esl-epfl:main' into DMA_smart
TommiTerza Sep 27, 2024
93ad3a8
fix
TommiTerza Sep 27, 2024
b196ec0
fix
TommiTerza Sep 27, 2024
cb13cc3
fix
TommiTerza Sep 27, 2024
1412a12
fix
TommiTerza Sep 27, 2024
0f66fba
fix
TommiTerza Sep 27, 2024
ee42b13
Merge branch 'esl-epfl:main' into DMA_smart
TommiTerza Sep 27, 2024
7cef8a0
fix
TommiTerza Sep 27, 2024
fb41c1b
Merge branch 'DMA_smart' of https://github.com/TommiTerza/x-heep into…
TommiTerza Sep 27, 2024
31d4eb0
fix
TommiTerza Sep 28, 2024
bbef477
fix
TommiTerza Sep 30, 2024
f0cce86
fix
TommiTerza Oct 1, 2024
581b35b
Fix typo
LuigiGiuffrida98 Oct 1, 2024
Still fixing...
TommiTerza committed Jul 18, 2024
commit 81a02dd22f7c7d54022a839a6e59ca81584dd602
66 changes: 58 additions & 8 deletions docs/source/Peripherals/DMA.md
@@ -1,18 +1,71 @@

# DMA

## Introduction

The **Direct Memory Access (DMA)** peripheral allows data transfers with reduced CPU interaction.

It can perform *transactions* of data between peripherals and memory, or memory-to-memory (as a `memcpy` would).

The CPU is required to configure the transaction, but once the transaction is launched the CPU is free to go to sleep, process the incoming data or do anything else.


This unit is capable of performing complex tasks that can significantly impact the performance and power consumption of memory-intensive applications.
It can be configured to perform *1D* or *2D* transactions and it can apply **zero padding** and perform **transpositions** on-the-fly, reducing the overhead of matrix operations.

The DMA **Hardware Abstraction Layer (HAL)** facilitates the configuration of transactions from the user's application. Furthermore, it adds a layer of safety checks to reduce the risk of faulty memory accesses, data overwrites or infinite loops.
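
As a rough software model of what a 2D transaction with zero padding and transposition produces, consider the Python sketch below. The function name `dma_2d_copy` and its parameters are illustrative assumptions, not part of the DMA HAL, and the order in which padding and transposition are applied is assumed here rather than taken from the hardware.

```python
def dma_2d_copy(src, rows, cols, pad_top=0, pad_bottom=0,
                pad_left=0, pad_right=0, transpose=False):
    """Behavioral sketch of a 2D copy (hypothetical helper, not the HAL API).

    src is a flat, row-major list holding rows * cols elements.
    Returns the destination as a list of rows.
    """
    out_rows = rows + pad_top + pad_bottom
    out_cols = cols + pad_left + pad_right

    # Start from an all-zero destination: the padding region stays at zero.
    dst = [[0] * out_cols for _ in range(out_rows)]

    # Copy the source matrix into the un-padded window.
    for r in range(rows):
        for c in range(cols):
            dst[r + pad_top][c + pad_left] = src[r * cols + c]

    # Assumption: the transposition is applied to the padded result.
    if transpose:
        dst = [list(column) for column in zip(*dst)]
    return dst
```

In this model a 2x2 matrix with one column of left padding becomes a 2x3 matrix, and its transposed version a 3x2 matrix, without any additional pass over the data by the CPU.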



## Structural description

INSERT DMA SCHEMATIC

#### DMA channels layout

The DMA subsystem is composed of a parametric number of control units called *channels*.
Each channel can be configured, by the CPU or by an external controller, to perform a *transaction*, independently from the state of other channels.

The N channels are connected to an N-to-M bus that exposes M master ports on the system bus. Multiple channels can thus perform transactions in parallel, a feature that enables memory-intensive applications to greatly increase their throughput.

There are several possible ways to connect N channels to the system bus through M master ports.
For example, with N = 4 and M = 2, two possible solutions are:
- CH0, CH1 connected to port 0 & CH2, CH3 connected to port 1
- CH0, CH1, CH2 connected to port 0 & CH3 connected to port 1

To select one of these configurations, the user has to set the `num_channels_per_master_port` parameter in `mcu_cfg.hjson`, which defines the _maximum_ number of channels per master port.

The first configuration of the previous example has 2 channels per master port, so a channels per master port ratio of 2.
On the other hand, the second solution has a ratio of 3: the first 3 channels are connected to port 0, while the remaining channel is connected to the remaining port 1.

While the first solution is a general-purpose, balanced configuration, the second might be better suited to applications that need a low-latency channel for high-priority tasks.

This mechanism lets the user adapt the DMA subsystem to their requirements, both in terms of area and performance.
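
The mapping described in this section can be sketched in Python as follows. This is only a model under stated assumptions: the function name `map_channels_to_ports` is hypothetical, and the policy (fill ports in order, up to the maximum ratio, while leaving at least one channel for every remaining port) is inferred from the N = 4, M = 2 example, not from the generator scripts.

```python
def map_channels_to_ports(num_channels, num_ports, max_per_port):
    """Assign channel indices to master ports (hypothetical helper).

    max_per_port plays the role of num_channels_per_master_port.
    Assumed policy: ports are filled in order with up to max_per_port
    channels, keeping at least one channel for each port still to fill.
    """
    if num_channels < num_ports:
        raise ValueError("every master port needs at least one channel")
    if max_per_port * num_ports < num_channels:
        raise ValueError("ratio too small to connect all channels")

    ports = []
    next_channel = 0
    for port in range(num_ports):
        ports_left = num_ports - port - 1
        remaining = num_channels - next_channel
        take = min(max_per_port, remaining - ports_left)
        ports.append(list(range(next_channel, next_channel + take)))
        next_channel += take
    return ports
```

With N = 4 and M = 2, a ratio of 2 gives the balanced layout and a ratio of 3 gives the layout where CH3 has a port of its own.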

#### Data FIFOs configuration

Each DMA channel uses a FIFO to buffer the data to be written, which is crucial for mitigating the combined delays from the system bus and the DMA subsystem bus.
The size of this FIFO is parametric and is, by default, the same across all channels.

Some applications can benefit from a larger FIFO because it allows for more values to be buffered in situations where the bus is heavily utilized or the target peripheral, such as the SPI, is too slow.
On the other hand, other applications do not require a large FIFO and can save area by reducing its size.
A hybrid system, where some channels have large FIFO sizes and others have smaller ones, could benefit both these types of applications.

It is possible to specify the size of each DMA channel FIFO in `dma_subsystem.sv`. These are the steps to follow to take advantage of this feature:

- Uncomment `//define EN_SET_FIFO_CH_SIZE;` to enable the mechanism
- Adjust the parameters _L_, _M_ and _S_. They define the size of a large, medium and small FIFO.
- Modify the parameter `typedef enum {L, M, S} fifo_ch_size_t;` to assign individual sizes to the FIFOs. The number of elements must reflect the number of DMA channels.
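
To get a feeling for why the FIFO depth matters, the toy Python model below counts how many cycles a channel is stalled while pushing a burst into a FIFO drained by a slow consumer. It is a sketch under simplifying assumptions (one push attempt per cycle, one pop every `drain_period` cycles); the function name and parameters are hypothetical and the numbers do not correspond to the actual _L_, _M_ and _S_ values.

```python
def count_stall_cycles(depth, burst_len, drain_period):
    """Count cycles in which a push is blocked by a full FIFO (toy model)."""
    occupancy = 0
    pushed = 0
    stalls = 0
    cycle = 0
    while pushed < burst_len:
        # The slow consumer pops one word every drain_period cycles.
        if cycle % drain_period == 0 and occupancy > 0:
            occupancy -= 1
        # The channel tries to push one word per cycle.
        if occupancy < depth:
            occupancy += 1
            pushed += 1
        else:
            stalls += 1
        cycle += 1
    return stalls
```

In this model a FIFO at least as deep as the burst absorbs it without stalling, while smaller FIFOs stall progressively more, which is the area versus performance trade-off the per-channel sizes are meant to expose.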

#### Triggers

The DMA can be used to perform either memory-to-memory or peripheral-to-memory operations.
In the latter case, the peripheral often responds too slowly to keep up with the system clock.
For example, the SPI transmits data with a period of 30 clock cycles.
This difference in response time creates the need for a communication channel between the DMA subsystem and the peripheral, one that can suspend DMA operations according to the peripheral state. These signals are called _triggers_.

They can be used both when the peripheral writes data using the DMA and when the DMA reads data from the peripheral.
The DMA can be configured to consider triggers by enabling the correct _slot_ in SW, using the DMA HAL.
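
The effect of a trigger can be illustrated with a small Python model. Everything here is an assumption made for illustration: the function name, the idea that the peripheral is ready exactly once every `peripheral_period` cycles, and the simplification that a word sent while the peripheral is not ready is lost.

```python
def run_transfer(n_words, peripheral_period, use_trigger):
    """Return (cycles_elapsed, words_accepted) for a toy peripheral transfer."""
    sent = 0
    accepted = 0
    cycle = 0
    while sent < n_words:
        cycle += 1
        peripheral_ready = (cycle % peripheral_period == 0)
        if use_trigger and not peripheral_ready:
            continue  # the DMA is suspended until the trigger fires
        sent += 1
        if peripheral_ready:
            accepted += 1  # assumed: words sent when not ready are lost
    return cycle, accepted
```

With a period of 30 cycles, as in the SPI example, the triggered transfer of 4 words takes 120 cycles and every word is accepted, whereas the untriggered one finishes in 4 cycles but no word reaches the peripheral in this model.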

## Previous Definitions

@@ -22,16 +75,13 @@ The implementation of this software layer introduced some concepts that need to

### Transaction

A transaction is an operation to be performed by the DMA. It implies copying bytes from a source pointer into a destination pointer. The transaction configuration can be loaded into the DMA registers once it has been cross-checked and it only starts when the size along the *first dimension* of the transaction is written in its corresponding register. The transaction is finished once the DMA has sent all its bytes (which does not necessarily mean they have been received by the final destination).

Transactions cannot be stopped once they were launched.
A transaction is an operation to be performed by the DMA. It implies copying bytes from a source pointer into a destination pointer. The transaction configuration can be loaded into the DMA registers once it has been cross-checked and it only starts when the size along the *first dimension* of the transaction is written in its corresponding register. The transaction is finished once the DMA has sent all its bytes (which does not necessarily mean they have been received by the final destination) or when the external stop signal is asserted.

While a transaction is running, new transactions can be validated, but not launched or loaded into the DMA.
While a transaction is running, new transactions can be validated, loaded and launched, but not into the same DMA channel.

Transactions can be re-launched automatically in `circular mode`.

Once the transaction has finished, a status bit is changed (that can be monitored through polling) and a fast interrupt is triggered.
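
The transaction lifecycle with multiple channels can be summarized by the toy Python model below. The class and method names are hypothetical and do not mirror the HAL; the sketch only encodes the rules stated in this section: a transaction starts when the first-dimension size is written, a busy channel cannot accept a new transaction, other channels remain usable, and circular mode relaunches automatically.

```python
class DmaChannelModel:
    """Toy model of one DMA channel's transaction lifecycle (not the HAL)."""

    def __init__(self):
        self.config = None
        self.running = False
        self.done = False

    def load(self, config):
        if self.running:
            raise RuntimeError("channel busy: cannot load a new transaction")
        self.config = dict(config)
        self.done = False

    def launch(self, size_d1):
        # Writing the size along the first dimension starts the transaction.
        if self.config is None:
            raise RuntimeError("no transaction loaded")
        if self.running:
            raise RuntimeError("channel busy: cannot launch again")
        self.config["size_d1"] = size_d1
        self.running = True

    def finish(self):
        # Reached when all bytes are sent or the external stop is asserted.
        if not self.running:
            raise RuntimeError("no transaction running")
        self.done = True  # status bit that software can poll
        # Assumed: in circular mode the transaction restarts on its own.
        self.running = bool(self.config.get("circular", False))
```

Two instances of this model behave like two channels: while the first one is running, the second can still be loaded and launched, and only a new load on the first one is rejected.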



### Source and destination
2 changes: 1 addition & 1 deletion hw/fpga/constraints/pynq-z2/constraints.xdc
@@ -1 +1 @@
create_clock -add -name sys_clk_pin -period 8.00 -waveform {0 5} [get_ports {clk_i}];
create_clock -add -name sys_clk_pin -period 10.00 -waveform {0 5} [get_ports {clk_i}];
7 changes: 1 addition & 6 deletions hw/ip/dma_subsystem/rtl/dma_NtoM_xbar.sv
@@ -36,12 +36,7 @@ module dma_NtoM_xbar #(
import obi_pkg::*;
import core_v_mini_mcu_pkg::*;

initial begin
$display("Elemento 0 di DMA_XBAR_MASTERS: %0d", DMA_XBAR_MASTERS[0]);
$display("Elemento 1 di DMA_XBAR_MASTERS: %0d", DMA_XBAR_MASTERS[1]);
$display("Elemento 2 di DMA_XBAR_MASTERS: %0d", DMA_XBAR_MASTERS[2]);
end
// Generazione delle istanze xbar_varlat_n_to_one
/* Generation of the crossbars */
generate
xbar_varlat_n_to_one #(
.XBAR_NMASTER(core_v_mini_mcu_pkg::DMA_XBAR_MASTERS[0])
8 changes: 4 additions & 4 deletions hw/ip_examples/im2col_spc/rtl/im2col_spc.sv
@@ -740,7 +740,7 @@ module im2col_spc
/* Free channel finder */
always_comb begin : proc_comb_free_channel
dma_free_channel = 0;
for (int i = 0; i < 32; i = i + 1) begin
for (int i = 0; i < DMA_CH_NUM; i = i + 1) begin
if (dma_if_channels[i] == 1'b0 && dma_ch_en_mask[i] == 1'b1) begin
dma_free_channel = i[(DMA_CH_NUM==1)?0 : ($clog2(DMA_CH_NUM)-1):0];
break;
@@ -752,20 +752,20 @@ module im2col_spc
always_ff @(posedge clk_i, negedge rst_ni) begin : proc_ff_control_unit
if (!rst_ni) begin
dma_trans_free_channel <= 0;
for (int i = 0; i < 32; i = i + 1) begin
for (int i = 0; i < DMA_CH_NUM; i = i + 1) begin
dma_if_channels[i] <= 1'b0;
dma_ch_first_write[i] <= 1'b0;
end
end else begin
/* Reset the first write flags when the im2col spc is done */
if (im2col_fsms_done == 1'b1) begin
for (int i = 0; i < 32; i = i + 1) begin
for (int i = 0; i < DMA_CH_NUM; i = i + 1) begin
dma_ch_first_write[i] <= 1'b0;
end
end

/* If an occupied channel asserts a done signal, free it up */
for (int i = 0; i < 32; i = i + 1) begin
for (int i = 0; i < DMA_CH_NUM; i = i + 1) begin
if (dma_if_channels[i] == 1'b1 && dma_done_i[i] == 1'b1) begin
dma_if_channels[i] <= 1'b0;
end
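
For readers less familiar with the RTL, the free channel finder in this file behaves like the Python sketch below: it returns the lowest-indexed channel that is both not in use and enabled by the mask. The function name is hypothetical, and the default of 0 when no channel qualifies mirrors the `dma_free_channel = 0` initialization, which means that a result of 0 does not by itself prove that channel 0 is free.

```python
def find_free_channel(in_use, enable_mask):
    """Model of the free channel finder (priority to the lowest index).

    in_use and enable_mask are sequences of booleans, one entry per
    DMA channel, i.e. their length plays the role of DMA_CH_NUM.
    """
    for index, (busy, enabled) in enumerate(zip(in_use, enable_mask)):
        if not busy and enabled:
            return index
    return 0  # same default value as in the RTL
```

Iterating over the actual number of channels, as the loops in this commit now do with `DMA_CH_NUM` in place of the constant 32, keeps the index within the bounds of the per-channel signals.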
90 changes: 59 additions & 31 deletions hw/ip_examples/im2col_spc/rtl/im2col_spc_param_fsm.sv
@@ -38,7 +38,8 @@ module im2col_spc_param_fsm
enum {
IDLE,
ZEROS_COND_EVAL,
N_ZEROS_COMP,
N_ZEROS_COMP_1,
N_ZEROS_COMP_2,
OUT_PTR_UPDATE,
IM_OFFSET_UPDATE,
START_DMA_RUN
@@ -73,18 +74,15 @@ module im2col_spc_param_fsm
logic [31:0] right_zero_cond;
logic [31:0] bottom_zero_cond;
logic [31:0] n_zeros_left_std;
logic [31:0] n_zeros_left_1plus;
logic [31:0] n_zeros_top_std;
logic [31:0] n_zeros_top_1plus;
logic [31:0] n_zeros_right_std;
logic [31:0] n_zeros_right_1plus;
logic [31:0] n_zeros_bottom_std;
logic [31:0] n_zeros_bottom_1plus;
logic output_data_ptr_en;
logic im_offset_en;
logic batch_inc_en;
logic batch_rst;
logic zeros_en;
logic zeros_phase1_en;
logic zeros_phase2_en;
logic zeros_rst;
logic zeros_eval_en;
logic zeros_eval_rst;
@@ -132,7 +130,8 @@ module im2col_spc_param_fsm
batch_rst = 1'b1;
output_data_ptr_en = 1'b0;
im_offset_en = 1'b0;
zeros_en = 1'b0;
zeros_phase1_en = 1'b0;
zeros_phase2_en = 1'b0;
zeros_rst = 1'b1;
zeros_eval_en = 1'b0;
zeros_eval_rst = 1'b1;
@@ -150,29 +149,46 @@ module im2col_spc_param_fsm
batch_rst = 1'b0;
output_data_ptr_en = 1'b0;
im_offset_en = 1'b0;
zeros_en = 1'b0;
zeros_phase1_en = 1'b0;
zeros_phase2_en = 1'b0;
zeros_rst = 1'b0;
zeros_eval_en = 1'b1;
zeros_eval_rst = 1'b0;
output_data_ptr_rst = 1'b0;
param_state_q = N_ZEROS_COMP;
param_state_q = N_ZEROS_COMP_1;
end

N_ZEROS_COMP: begin
N_ZEROS_COMP_1: begin
fifo_push = 1'b0;
batch_inc_en = 1'b0;
batch_rst = 1'b0;
output_data_ptr_en = 1'b0;
im_offset_en = 1'b0;
zeros_en = 1'b1;
zeros_phase1_en = 1'b1;
zeros_phase2_en = 1'b0;
zeros_rst = 1'b0;
output_data_ptr_rst = 1'b0;
zeros_eval_en = 1'b0;
zeros_eval_rst = 1'b0;
param_state_q = N_ZEROS_COMP_2;
end

N_ZEROS_COMP_2: begin
fifo_push = 1'b0;
batch_inc_en = 1'b0;
batch_rst = 1'b0;
output_data_ptr_en = 1'b0;
im_offset_en = 1'b0;
zeros_phase1_en = 1'b0;
zeros_phase2_en = 1'b1;
zeros_rst = 1'b0;
output_data_ptr_rst = 1'b0;
zeros_eval_en = 1'b0;
zeros_eval_rst = 1'b0;
if (fifo_full == 1'b0) begin
param_state_q = START_DMA_RUN;
end else begin
param_state_q = N_ZEROS_COMP;
param_state_q = N_ZEROS_COMP_2;
end
end

@@ -182,7 +198,8 @@ module im2col_spc_param_fsm
batch_rst = 1'b0;
output_data_ptr_en = 1'b1;
im_offset_en = 1'b0;
zeros_en = 1'b0;
zeros_phase1_en = 1'b0;
zeros_phase2_en = 1'b0;
zeros_rst = 1'b0;
output_data_ptr_rst = 1'b0;
zeros_eval_en = 1'b0;
@@ -200,7 +217,8 @@ module im2col_spc_param_fsm
batch_rst = 1'b1;
output_data_ptr_en = 1'b0;
im_offset_en = 1'b1;
zeros_en = 1'b0;
zeros_phase1_en = 1'b0;
zeros_phase2_en = 1'b0;
zeros_rst = 1'b0;
output_data_ptr_rst = 1'b0;
zeros_eval_en = 1'b0;
@@ -218,7 +236,8 @@ module im2col_spc_param_fsm
batch_rst = 1'b0;
output_data_ptr_en = 1'b0;
im_offset_en = 1'b0;
zeros_en = 1'b0;
zeros_phase1_en = 1'b0;
zeros_phase2_en = 1'b0;
zeros_rst = 1'b0;
output_data_ptr_rst = 1'b0;
zeros_eval_en = 1'b0;
@@ -229,48 +248,63 @@ module im2col_spc_param_fsm
end

/* Number of zeros computation */
always_ff @(posedge clk_i, negedge rst_ni) begin : proc_ff_zeros_comp
always_ff @(posedge clk_i, negedge rst_ni) begin : proc_ff_zeros_phase1_comp
if (!rst_ni) begin
fw_min_w_offset <= '0;
fh_min_h_offset <= '0;
end else begin
if (zeros_phase1_en == 1'b1) begin
fw_min_w_offset <= {24'h0, reg2hw.fw.q} - 1 - {16'h0, w_offset};
fh_min_h_offset <= {24'h0, reg2hw.fh.q} - 1 - {16'h0, h_offset};
end else if (zeros_rst == 1'b1) begin
fw_min_w_offset <= '0;
fh_min_h_offset <= '0;
end
end
end

always_ff @(posedge clk_i, negedge rst_ni) begin : proc_ff_zeros_phase2_comp
if (!rst_ni) begin
n_zeros_left <= '0;
n_zeros_right <= '0;
n_zeros_top <= '0;
n_zeros_bottom <= '0;
end else begin
if (zeros_en == 1'b1) begin
if (zeros_phase2_en == 1'b1) begin
/* Left zeros computation */
if (w_offset >= {10'h0, reg2hw.pad_left.q}) begin
n_zeros_left <= 0;
end else if (left_zero_cond == 0) begin
end else if (|left_zero_cond == 1'b0) begin
n_zeros_left <= n_zeros_left_std; // n_zeros_left = LEFT_PAD - w_offset;
end else begin
n_zeros_left <= n_zeros_left_1plus;
n_zeros_left <= n_zeros_left_std + 1;
end

/* Top zeros computation */
if (h_offset >= {10'h0, reg2hw.pad_top.q}) begin
n_zeros_top <= 0;
end else if (top_zero_cond == 0) begin
end else if (|top_zero_cond == 1'b0) begin
n_zeros_top <= n_zeros_top_std; // n_zeros_top = TOP_PAD - h_offset;
end else begin
n_zeros_top <= n_zeros_top_1plus;
n_zeros_top <= n_zeros_top_std + 1;
end

/* Right zeros computation */
if (fw_min_w_offset >= reg2hw.pad_right.q || reg2hw.adpt_pad_right.q == 0) begin
n_zeros_right <= 0;
end else if (right_zero_cond == 0) begin
end else if (|right_zero_cond == 1'b0) begin
n_zeros_right <= n_zeros_right_std;
end else begin
n_zeros_right <= n_zeros_right_1plus;
n_zeros_right <= n_zeros_right_std + 1;
end

/* Bottom zeros computation */
if (fh_min_h_offset >= reg2hw.pad_bottom.q || reg2hw.adpt_pad_bottom.q == 0) begin
n_zeros_bottom <= 0;
end else if (bottom_zero_cond == 0) begin
end else if (|bottom_zero_cond == 1'b0) begin
n_zeros_bottom <= n_zeros_bottom_std;
end else begin
n_zeros_bottom <= n_zeros_bottom_1plus;
n_zeros_bottom <= n_zeros_bottom_std + 1;
end
end else if (zeros_rst == 1'b1) begin
n_zeros_left <= '0;
@@ -402,18 +436,12 @@ module im2col_spc_param_fsm

/* Signal assignments */

assign fw_min_w_offset = {24'h0, reg2hw.fw.q} - 1 - {16'h0, w_offset}; // fw_minus_w_offset = FW - 1 - w_offset;
assign fh_min_h_offset = {24'h0, reg2hw.fh.q} - 1 - {16'h0, h_offset};
assign im_row = {16'h0, h_offset} - {26'h0, reg2hw.pad_top.q}; // im_row = h_offset - TOP_PAD;
assign im_col = {16'h0, w_offset} - {26'h0, reg2hw.pad_left.q}; // im_col = w_offset - LEFT_PAD;
assign n_zeros_left_std = ({26'h0, reg2hw.pad_left.q} - {16'h0, w_offset}) / {24'h0, reg2hw.strides_d1.q};
assign n_zeros_left_1plus = n_zeros_left_std + 1;
assign n_zeros_top_std = ({26'h0, reg2hw.pad_top.q} - {16'h0, h_offset}) / {24'h0, reg2hw.strides_d2.q};
assign n_zeros_top_1plus = n_zeros_top_std + 1;
assign n_zeros_right_std = (reg2hw.adpt_pad_right.q - fw_min_w_offset) / {24'h0, reg2hw.strides_d1.q};
assign n_zeros_right_1plus = n_zeros_right_std + 1;
assign n_zeros_bottom_std = (reg2hw.adpt_pad_bottom.q - fh_min_h_offset) / {24'h0, reg2hw.strides_d2.q};
assign n_zeros_bottom_1plus = n_zeros_bottom_std + 1;
assign size_transfer_1d = {16'h0, reg2hw.n_patches_w.q} - n_zeros_left - n_zeros_right;
assign size_transfer_2d = {16'h0, reg2hw.n_patches_h.q} - n_zeros_top - n_zeros_bottom;
assign index = (({24'h0, batch_counter} * {24'h0, reg2hw.num_ch.q} + im_c) * reg2hw.ih.q + (im_row + n_zeros_top * {24'h0, reg2hw.strides_d2.q})) * reg2hw.iw.q + im_col + n_zeros_left * {24'h0, reg2hw.strides_d1.q};
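
The zero-count arithmetic in this file can be read as a ceiling division, sketched below in Python. This is an interpretation, not the RTL: the `*_zero_cond` signals are not shown in this diff and are assumed here to carry the remainder of the division, the function names are hypothetical, and Python integers do not wrap like the 32-bit unsigned signals do.

```python
def zeros_before(pad, offset, stride):
    """Zeros on the left/top side: ceil((pad - offset) / stride), or 0."""
    if offset >= pad:
        return 0
    std, remainder = divmod(pad - offset, stride)  # std ~ n_zeros_*_std
    return std + 1 if remainder else std


def zeros_after(filter_dim, offset, pad, adpt_pad, stride):
    """Zeros on the right/bottom side, computed in two steps like the RTL."""
    # Phase 1 (N_ZEROS_COMP_1): registered as fw_min_w_offset / fh_min_h_offset.
    dim_min_offset = filter_dim - 1 - offset
    # Phase 2 (N_ZEROS_COMP_2): comparison and division on the registered value.
    if dim_min_offset >= pad or adpt_pad == 0:
        return 0
    std, remainder = divmod(adpt_pad - dim_min_offset, stride)
    return std + 1 if remainder else std


def transfer_size(n_patches, zeros_leading, zeros_trailing):
    """Mirror of size_transfer_1d / size_transfer_2d."""
    return n_patches - zeros_leading - zeros_trailing
```

Splitting the computation into two registered phases, as this commit does, shortens the combinational path between the offset subtraction and the division at the cost of one extra FSM cycle per parameter computation.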