A unified AXI4 cache system designed for integration into Intel CXL Type 3 FPGA designs. The system provides a complete data path from an AXI4 slave interface (receiving requests from the CXL IP) through an address-routed cache subsystem to an AXI4 master interface (connecting to the memory controller).
graph LR
%% Styling
classDef ip fill:#e1f5fe,stroke:#01579b,stroke-width:2px;
classDef user fill:#fff9c4,stroke:#fbc02d,stroke-width:2px,stroke-dasharray: 5 5;
classDef logic fill:#e8f5e9,stroke:#2e7d32,stroke-width:2px;
classDef mem fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px;
%% External IPs
CXL["CXL IP (T3PH)"]:::ip
MC["Memory Controller (mc_top)"]:::mem
%% Your System Wrapper
subgraph AFU_System [AFU Cache System Wrapper]
direction LR
%% Components
Adapter[AXI Cache Adapter]:::logic
Cache[Cache Memory]:::logic
BridgePT[Simple-to-AXI Bridge]:::logic
BridgeMiss[Simple-to-AXI Bridge]:::logic
Arbiter[2:1 AXI Interconnect]:::logic
%% Internal Connections
Adapter -- "Hit (Read)" --> Cache
Cache -- "Data" --> Adapter
Adapter -- "Passthrough (addr >= BASE)" --> BridgePT
Cache -- "Miss (Line Refill)" --> BridgeMiss
BridgePT -- "AXI Master" --> Arbiter
BridgeMiss -- "AXI Master" --> Arbiter
end
%% External Connections
CXL --> |"AXI Slave (ip2hdm)"| Adapter
Arbiter --> |"AXI Master (afu2mc)"| MC
tlb/
├── hdl/ # Synthesizable RTL
│ ├── afu_cache_system.sv # Top-level wrapper module
│ ├── axi_cache_adapter.sv # AXI4 slave with address-based routing
│ ├── cache.sv # 4-way set-associative cache
│ ├── ff_array.sv # SRAM-style flip-flop array
│ ├── mem_arbiter_stub.sv # 2:1 memory arbiter (simulation stub)
│ └── simple_to_axi_master.sv # Simple protocol to AXI4 master bridge
│
├── hvl/ # Verification
│ ├── testbench.sv # Main testbench with AXI4 drivers
│ ├── axi_ram.sv # AXI4 slave memory wrapper
│ ├── memory.sv # Simple memory model
│ └── axi_crossbar/ # AXI4 crossbar IP (reference)
│
├── sim_afu_cache.sh # GUI simulation script
├── sim_afu_cache_cli.sh # Command-line simulation script
└── sim_cache.sh # Legacy simulation script
Top-level wrapper that instantiates and connects all subsystem components. Presents clean AXI4 interfaces for integration into larger systems.
Parameters:
| Parameter | Default | Description |
|---|---|---|
| `C_AXI_DATA_WIDTH` | 512 | AXI data bus width in bits |
| `C_AXI_ADDR_WIDTH` | 64 | AXI address bus width in bits |
| `C_AXI_ID_WIDTH` | 8 | AXI transaction ID width in bits |
| `PASSTHROUGH_BASE` | `0x0000_0002_0000_0000` | Address routing threshold (8GB) |
Interfaces:
- `s_axi_*` - AXI4 slave interface (input from CXL IP or testbench)
- `m_axi_*` - AXI4 master interface (output to memory controller)
Converts AXI4 transactions to a simple request/response protocol and routes requests based on address.
Address Routing:
- Addresses below `PASSTHROUGH_BASE`: Routed to cache (cached memory region)
- Addresses at or above `PASSTHROUGH_BASE`: Routed to passthrough (uncached region)
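
The routing rule can be sketched as a small behavioral model. This is a Python illustration, not the RTL in `axi_cache_adapter.sv`; the function name `route` is hypothetical, and the threshold is the default `PASSTHROUGH_BASE` from the parameter table.

```python
# Behavioral sketch of the address routing rule (not the RTL itself).
# PASSTHROUGH_BASE uses the default value from the parameter table.
PASSTHROUGH_BASE = 0x0000_0002_0000_0000


def route(addr: int, base: int = PASSTHROUGH_BASE) -> str:
    """Return the path a request takes: 'cache' below base, 'passthrough' at or above it."""
    return "cache" if addr < base else "passthrough"


if __name__ == "__main__":
    for a in (0x0, 0x1_FFFF_FFC0, 0x2_0000_0000):
        print(hex(a), route(a))
```

The comparison is strict on the cache side, so an address exactly equal to `PASSTHROUGH_BASE` takes the passthrough path.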
AXI4 Features:
- 512-bit single-beat transactions (no burst assembly required)
- 8-bit transaction ID pass-through (`AWID` returns as `BID`, `ARID` returns as `RID`)
- Separate write and read state machines with write priority on memory port
Output Interfaces:
- `cache_mem_*` - Simple protocol to cache module
- `pt_mem_*` - Simple protocol to passthrough path
4-way set-associative write-back cache with pseudo-LRU replacement policy.
Specifications:
- 16 sets, 4 ways per set
- 64-byte cache lines (512 bits)
- 55-bit tags (for 64-bit address space)
- Write-back policy with dirty bit tracking
- Pseudo-LRU replacement (3-bit tree per set)
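
As a rough illustration of how an address maps onto this geometry, the sketch below splits an address into tag, set index, and line offset in Python. The field layout (offset in the low bits, index above it, tag on top) is the conventional one and is assumed here; `cache.sv` may differ. With this plain split a 64-bit address leaves 54 tag bits, so the 55-bit tag listed above presumably reflects a detail of the RTL that this sketch does not capture.

```python
# Conventional tag/index/offset split for the listed geometry.
# The layout is an assumption; it is not taken from cache.sv.
LINE_BYTES = 64   # 64-byte cache lines
NUM_SETS = 16     # 16 sets
NUM_WAYS = 4      # 4 ways per set

OFFSET_BITS = LINE_BYTES.bit_length() - 1  # log2(64) = 6
INDEX_BITS = NUM_SETS.bit_length() - 1     # log2(16) = 4


def split_address(addr: int):
    """Return (tag, set_index, line_offset) for a byte address."""
    offset = addr & (LINE_BYTES - 1)
    index = (addr >> OFFSET_BITS) & (NUM_SETS - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset


# Total data capacity implied by the specifications, in bytes.
CAPACITY_BYTES = NUM_SETS * NUM_WAYS * LINE_BYTES
```

Two addresses that differ only in the low 6 bits fall in the same line, and addresses 1 KiB apart (16 sets times 64 bytes) map to the same set.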
Interfaces:
- `mem_*` - CPU/adapter side (512-bit data)
- `pmem_*` - Physical memory side (to arbiter)
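
The 3-bit tree pseudo-LRU policy listed in the specifications can be sketched as follows. This is a generic tree-PLRU model in Python; the bit encoding (which value points to which side) is an assumption and may not match the convention used in `cache.sv`.

```python
# Generic 4-way tree-PLRU sketch. The encoding is an assumption:
#   b0 = 0 -> victim is in ways 0/1, b0 = 1 -> victim is in ways 2/3
#   b1 selects between way 0 (0) and way 1 (1)
#   b2 selects between way 2 (0) and way 3 (1)
def touch(bits, way):
    """Return updated (b0, b1, b2) after an access to `way`, pointing away from it."""
    b0, b1, b2 = bits
    if way < 2:
        b0 = 1          # next victim should come from the other half
        b1 = 1 - way    # and, within this half, from the other way
    else:
        b0 = 0
        b2 = 3 - way
    return (b0, b1, b2)


def victim(bits):
    """Return the way the tree currently points at for replacement."""
    b0, b1, b2 = bits
    return b1 if b0 == 0 else 2 + b2
```

After touching ways 0, 1, 2, 3 in order, the tree points back at way 0. Touching way 0 again then selects way 2 rather than way 1, which shows how the tree approximates true LRU instead of tracking it exactly.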
Simple 2:1 memory arbiter that serializes requests from the cache miss path and passthrough path onto a single output.
Arbitration Policy: Fixed priority
- Port 0 (cache miss): Higher priority
- Port 1 (passthrough): Lower priority
This is a behavioral stub for simulation. For synthesis, replace with an Intel AXI Interconnect generated from Platform Designer.
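
The fixed-priority policy amounts to a single decision per arbitration cycle. A minimal Python sketch follows; the function name and the boolean request inputs are hypothetical, and handshaking and request holding in `mem_arbiter_stub.sv` are not modeled.

```python
# Fixed-priority 2:1 grant decision (behavioral sketch only).
# Port 0 = cache miss path (higher priority), port 1 = passthrough path.
def arbitrate(miss_req: bool, pt_req: bool):
    """Return the granted port number, or None when neither port is requesting."""
    if miss_req:
        return 0
    if pt_req:
        return 1
    return None
```

With fixed priority, the passthrough port is only served when the cache miss port is idle, so sustained miss traffic can delay passthrough requests.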
Converts the internal simple request/response protocol to AXI4 master transactions.
Protocol Conversion:
- `mem_read`/`mem_write` strobes initiate AXI transactions
- Single-beat 512-bit transfers (`AWLEN`/`ARLEN` = 0)
- `AWSIZE`/`ARSIZE` = 6 (64-byte transfers)
- Sequential transaction ID generation for downstream tracking
- `mem_resp` asserts when AXI transaction completes
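
To make the address-channel settings concrete, the sketch below builds the fields of one request in Python. The class and method names are hypothetical, and wrapping the sequential ID at 8 bits is an assumption based on the 8-bit ID width; the actual ID behavior is defined by `simple_to_axi_master.sv`.

```python
# Address-channel fields for one bridge request (illustrative sketch).
from dataclasses import dataclass


@dataclass(frozen=True)
class AxiAddrRequest:
    """Hypothetical container for the AW/AR fields the bridge drives."""
    addr: int
    axid: int
    axlen: int = 0    # AWLEN/ARLEN = 0 -> one beat
    axsize: int = 6   # AWSIZE/ARSIZE = 6 -> 2**6 = 64 bytes per beat

    @property
    def total_bytes(self) -> int:
        # AXI4 encodes beats as LEN + 1 and bytes per beat as 2**SIZE.
        return (self.axlen + 1) * (1 << self.axsize)


class IdGenerator:
    """Sequential IDs; wrap-around at id_width bits is an assumption."""

    def __init__(self, id_width: int = 8):
        self._mask = (1 << id_width) - 1
        self._next = 0

    def next_id(self) -> int:
        value = self._next
        self._next = (self._next + 1) & self._mask
        return value
```

Each request therefore moves exactly one 64-byte cache line, which matches the 512-bit data bus width.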
AXI4 slave wrapper around the simple memory model for simulation testing.
Purpose: Provides proper AXI4 handshaking to validate the simple_to_axi_master bridge logic. Ensures the DUT speaks real AXI4 end-to-end.
The simple protocol is used internally between modules. It is not AXI-compliant, but it is simple to implement in RTL.
| Signal | Width | Direction | Description |
|---|---|---|---|
| `address` | 64 | Output | Byte-aligned address |
| `read` | 1 | Output | Read request strobe |
| `write` | 1 | Output | Write request strobe |
| `wdata` | 512 | Output | Write data |
| `rdata` | 512 | Input | Read data (valid with `resp`) |
| `resp` | 1 | Input | Response strobe (single cycle) |
Standard AXI4 signals with the following configuration:
| Property | Value |
|---|---|
| Data Width | 512 bits |
| Address Width | 64 bits |
| ID Width | 8 bits |
| Burst Type | Single-beat (LEN = 0) |
| Burst Size | 64 bytes (SIZE = 6) |
- ModelSim (tested with Intel FPGA Edition 2020.1)
- ModelSim binary path configured in simulation scripts

    cd /path/to/tlb
    ./sim_afu_cache_cli.sh

Compiles all modules and runs the testbench to completion, printing results to stdout.

    cd /path/to/tlb
    ./sim_afu_cache.sh

Launches ModelSim with waveform viewer for interactive debugging.
The testbench executes the following test sequences:
| Test | Description |
|---|---|
| `single_read` | Basic cache miss and load |
| `single_read_hit` | Cache hit verification |
| `multiple_reads` | Sequential reads across cache lines |
| `single_write` | Basic write operation |
| `write_then_read` | Write followed by read verification |
| `way_associativity` | Fill all 4 ways of a set |
| `striped_writes` | Multiple writes followed by reads |
| `plru_simple` | PLRU eviction policy verification |
| `random_short` | 20 random read/write operations |
| `flush_cache` | Force writebacks by reading new addresses |
| `memory_consistency` | Compare DUT memory against golden model |
All tests compare results against a golden memory model to verify correctness.
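
The golden-model idea can be sketched in a few lines of Python. This is an illustration of the checking approach, not the model in `testbench.sv`; the class name is hypothetical, and returning zero for lines that were never written is an assumption.

```python
# Line-granular golden memory sketch (not the SystemVerilog testbench model).
LINE_BYTES = 64
LINE_MASK = (1 << 512) - 1  # one 512-bit cache line


class GoldenMemory:
    def __init__(self):
        self._lines = {}

    @staticmethod
    def _line_addr(addr: int) -> int:
        # Lower 6 address bits are ignored, as for the cache itself.
        return addr & ~(LINE_BYTES - 1)

    def write(self, addr: int, data: int) -> None:
        self._lines[self._line_addr(addr)] = data & LINE_MASK

    def read(self, addr: int) -> int:
        # Unwritten lines read as zero (assumption).
        return self._lines.get(self._line_addr(addr), 0)


def check_read(golden: GoldenMemory, addr: int, dut_data: int) -> bool:
    """Return True when the DUT read data matches the golden model."""
    return golden.read(addr) == dut_data
```

Every write issued to the DUT is mirrored into the model, and every DUT read is compared against `read` at the same address.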
This system is designed for integration into the AFU (Accelerator Functional Unit) of an Intel CXL Type 3 design example. The AFU sits between:
- Upstream: CXL IP (T3PH) providing AXI4 requests from the host
- Downstream: Memory controller (`mc_top`) managing DDR4
When integrating, rename signals to match Intel CXL IP conventions:
| This Project | Intel CXL Convention |
|---|---|
| `s_axi_*` | `ip2hdm_*` (IP to HDM) |
| `m_axi_*` | `afu2mc_*` (AFU to MC) |
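
The prefix mapping in the table can be applied mechanically. The Python helper below is a hypothetical sketch that only swaps the prefix and keeps the rest of the name unchanged; the exact port names in the Intel design example may differ beyond the prefix, so check them against the generated IP before relying on this.

```python
# Prefix-only rename following the table above (hypothetical helper).
PREFIX_MAP = {
    "s_axi_": "ip2hdm_",  # slave side, from the CXL IP
    "m_axi_": "afu2mc_",  # master side, to the memory controller
}


def rename_signal(name: str) -> str:
    """Swap a known prefix; return the name unchanged if no prefix matches."""
    for old, new in PREFIX_MAP.items():
        if name.startswith(old):
            return new + name[len(old):]
    return name
```

Signals without a listed prefix, such as clocks and resets, pass through untouched.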
- Arbiter Replacement: Replace `mem_arbiter_stub` with an Intel AXI Interconnect generated from Platform Designer. Generate HDL with simulation files enabled to maintain consistent behavior between simulation and synthesis.
- Timing Closure: The current design prioritizes simplicity over performance. Add pipeline stages if timing closure requires it.
- Address Map: Adjust `PASSTHROUGH_BASE` to match the actual FPGA memory map. For a 16GB device, typical values might be:
  - Cached region: `0x0` to `0x1_FFFF_FFFF` (8GB)
  - Passthrough region: `0x2_0000_0000` to `0x3_FFFF_FFFF` (8GB)
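
The example address map can be checked with a little arithmetic. The Python sketch below uses only the ranges quoted above, and the helper name is hypothetical.

```python
# Sanity check of the example 16GB address map.
GIB = 1 << 30


def region_size(first: int, last: int) -> int:
    """Size in bytes of an inclusive address range."""
    return last - first + 1


cached_bytes = region_size(0x0, 0x1_FFFF_FFFF)
passthrough_bytes = region_size(0x2_0000_0000, 0x3_FFFF_FFFF)

# The passthrough region starts where the cached region ends,
# which is the value to use for PASSTHROUGH_BASE.
passthrough_base = cached_bytes
```

When adapting the map to a different device, the same check confirms that the two regions are contiguous and together cover the full device capacity.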
- Cache expects 64-byte aligned addresses (lower 6 address bits are ignored)
- Single outstanding transaction per path (no pipelining)
- Write operations take priority over reads when contending for memory port
- AXI transaction IDs are passed through, not reordered
| File | Status | Notes |
|---|---|---|
| `axi_cache_adapter.sv` | Modified | Updated to 512-bit, added AXI ID support |
| `axi_cache_adapter_old.sv` | Archived | Original 64-bit burst version |
| `testbench.sv` | Modified | Updated for 512-bit AXI4 |
| `testbench_old.sv` | Removed | Original testbench |
- ARM AMBA AXI Protocol Specification (AXI4)
- Intel CXL IP User Guide
- Intel Agilex FPGA Design Guidelines