AXI4 Cache System for CXL Type 3 AFU

A unified AXI4 cache system designed for integration into Intel CXL Type 3 FPGA designs. The system provides a complete data path from an AXI4 slave interface (receiving requests from the CXL IP) through an address-routed cache subsystem to an AXI4 master interface (connecting to the memory controller).

Architecture Overview

graph LR
    %% Styling
    classDef ip fill:#e1f5fe,stroke:#01579b,stroke-width:2px;
    classDef user fill:#fff9c4,stroke:#fbc02d,stroke-width:2px,stroke-dasharray: 5 5;
    classDef logic fill:#e8f5e9,stroke:#2e7d32,stroke-width:2px;
    classDef mem fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px;

    %% External IPs
    CXL["CXL IP (T3PH)"]:::ip
    MC["Memory Controller (mc_top)"]:::mem

    %% Your System Wrapper
    subgraph AFU_System [AFU Cache System Wrapper]
        direction LR
        
        %% Components
        Adapter[AXI Cache Adapter]:::logic
        Cache[Cache Memory]:::logic
        BridgePT[Simple-to-AXI Bridge]:::logic
        BridgeMiss[Simple-to-AXI Bridge]:::logic
        Arbiter[2:1 AXI Interconnect]:::logic

        %% Internal Connections
        Adapter -- "Hit (Read)" --> Cache
        Cache -- "Data" --> Adapter
        
        Adapter -- "Passthrough (addr >= BASE)" --> BridgePT
        Cache -- "Miss (Line Refill)" --> BridgeMiss
        
        BridgePT -- "AXI Master" --> Arbiter
        BridgeMiss -- "AXI Master" --> Arbiter
    end

    %% External Connections
    CXL --> |"AXI Slave (ip2hdm)"| Adapter
    Arbiter --> |"AXI Master (afu2mc)"| MC

Directory Structure

tlb/
├── hdl/                          # Synthesizable RTL
│   ├── afu_cache_system.sv       # Top-level wrapper module
│   ├── axi_cache_adapter.sv      # AXI4 slave with address-based routing
│   ├── cache.sv                  # 4-way set-associative cache
│   ├── ff_array.sv               # SRAM-style flip-flop array
│   ├── mem_arbiter_stub.sv       # 2:1 memory arbiter (simulation stub)
│   └── simple_to_axi_master.sv   # Simple protocol to AXI4 master bridge
│
├── hvl/                          # Verification
│   ├── testbench.sv              # Main testbench with AXI4 drivers
│   ├── axi_ram.sv                # AXI4 slave memory wrapper
│   ├── memory.sv                 # Simple memory model
│   └── axi_crossbar/             # AXI4 crossbar IP (reference)
│
├── sim_afu_cache.sh              # GUI simulation script
├── sim_afu_cache_cli.sh          # Command-line simulation script
└── sim_cache.sh                  # Legacy simulation script

Module Descriptions

afu_cache_system

Top-level wrapper that instantiates and connects all subsystem components. Exposes one AXI4 slave interface and one AXI4 master interface for integration into larger systems.

Parameters:

Parameter	Default	Description
`C_AXI_DATA_WIDTH`	512	AXI data bus width in bits
`C_AXI_ADDR_WIDTH`	64	AXI address bus width in bits
`C_AXI_ID_WIDTH`	8	AXI transaction ID width in bits
`PASSTHROUGH_BASE`	`0x0000_0002_0000_0000`	Address routing threshold (8GB)

Interfaces:

s_axi_* - AXI4 slave interface (input from CXL IP or testbench)
m_axi_* - AXI4 master interface (output to memory controller)

axi_cache_adapter

Converts AXI4 transactions to a simple request/response protocol and routes requests based on address.

Address Routing:

Addresses below PASSTHROUGH_BASE: Routed to cache (cached memory region)
Addresses at or above PASSTHROUGH_BASE: Routed to passthrough (uncached region)
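
The routing rule reduces to a single comparison against the threshold. A minimal Python sketch of that rule, written as a behavioral reference model rather than a copy of the RTL in axi_cache_adapter.sv (the function name `route` is hypothetical):

```python
PASSTHROUGH_BASE = 0x0000_0002_0000_0000  # default threshold from the parameter table


def route(addr: int, passthrough_base: int = PASSTHROUGH_BASE) -> str:
    """Return which internal path a request address would take."""
    # Strictly below the threshold: cached region.
    # At or above the threshold: uncached passthrough region.
    return "cache" if addr < passthrough_base else "passthrough"


if __name__ == "__main__":
    for addr in (0x0, 0x1_FFFF_FFC0, 0x2_0000_0000, 0x3_0000_0040):
        print(f"{addr:#014x} -> {route(addr)}")
```

The boundary address itself (`PASSTHROUGH_BASE`) belongs to the passthrough region, which is the case most worth checking in simulation.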

AXI4 Features:

512-bit single-beat transactions (no burst assembly required)
8-bit transaction ID pass-through (AWID returns as BID, ARID returns as RID)
Separate write and read state machines with write priority on memory port

Output Interfaces:

cache_mem_* - Simple protocol to cache module
pt_mem_* - Simple protocol to passthrough path

cache

4-way set-associative write-back cache with pseudo-LRU replacement policy.

Specifications:

16 sets, 4 ways per set
64-byte cache lines (512 bits)
55-bit tags (for 64-bit address space)
Write-back policy with dirty bit tracking
Pseudo-LRU replacement (3-bit tree per set)
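
The geometry above implies how an address splits into tag, set index, and byte offset. The Python sketch below assumes the conventional layout (offset in the lowest bits, then index, then tag); the actual bit positions in cache.sv may differ, and `split_address` is a hypothetical helper. With this conventional split a 64-bit address leaves 54 tag bits rather than the 55 listed above, so the RTL may size the tag differently; cache.sv is the authority.

```python
from typing import Tuple

LINE_BYTES = 64   # 64-byte cache lines
NUM_SETS = 16     # 16 sets
NUM_WAYS = 4      # 4 ways per set
ADDR_BITS = 64

OFFSET_BITS = LINE_BYTES.bit_length() - 1   # log2(64) = 6
INDEX_BITS = NUM_SETS.bit_length() - 1      # log2(16) = 4
TAG_BITS = ADDR_BITS - INDEX_BITS - OFFSET_BITS  # assumed conventional split


def split_address(addr: int) -> Tuple[int, int, int]:
    """Split a byte address into (tag, set_index, byte_offset)."""
    offset = addr & (LINE_BYTES - 1)
    index = (addr >> OFFSET_BITS) & (NUM_SETS - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset


if __name__ == "__main__":
    print("capacity in bytes:", NUM_SETS * NUM_WAYS * LINE_BYTES)
    print("tag/index/offset bits:", TAG_BITS, INDEX_BITS, OFFSET_BITS)
    print("split of 0x7FF:", split_address(0x7FF))
```

The total data capacity follows directly from the same numbers: 16 sets x 4 ways x 64 bytes = 4096 bytes.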

Interfaces:

mem_* - CPU/adapter side (512-bit data)
pmem_* - Physical memory side (to arbiter)
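
The 3-bit tree pseudo-LRU policy listed in the specifications can be modeled in a few lines. This is a sketch under an assumed bit encoding (each bit points toward the side that holds the next victim); cache.sv may use a different but equivalent encoding, and the class name `TreePLRU4` is hypothetical.

```python
class TreePLRU4:
    """3-bit tree pseudo-LRU for one 4-way set.

    bits[0]: 0 -> next victim is in ways {0, 1}, 1 -> in ways {2, 3}
    bits[1]: victim within {0, 1} (0 -> way 0, 1 -> way 1)
    bits[2]: victim within {2, 3} (0 -> way 2, 1 -> way 3)
    """

    def __init__(self) -> None:
        self.bits = [0, 0, 0]

    def touch(self, way: int) -> None:
        """Record an access by pointing the tree away from the accessed way."""
        if way < 2:
            self.bits[0] = 1             # next victim comes from the right pair
            self.bits[1] = 1 - way       # within the left pair, mark the other way
        else:
            self.bits[0] = 0             # next victim comes from the left pair
            self.bits[2] = 1 - (way - 2)

    def victim(self) -> int:
        """Return the way the tree currently marks for replacement."""
        if self.bits[0] == 0:
            return self.bits[1]
        return 2 + self.bits[2]


if __name__ == "__main__":
    plru = TreePLRU4()
    for way in range(4):      # fill ways 0..3 in order
        plru.touch(way)
    print("victim after filling 0..3:", plru.victim())
    plru.touch(0)             # re-use way 0
    print("victim after touching way 0 again:", plru.victim())
```

The second result illustrates why the policy is only pseudo-LRU: after the re-use of way 0, true LRU would evict way 1, while the tree selects a way from the other pair.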

mem_arbiter_stub

Simple 2:1 memory arbiter that serializes requests from the cache miss path and passthrough path onto a single output.

Arbitration Policy: Fixed priority

Port 0 (cache miss): Higher priority
Port 1 (passthrough): Lower priority
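
As a behavioral sketch, the fixed-priority decision is a short function. The name `arbitrate` is hypothetical, and the sketch covers only the grant decision for one cycle; any state that holds a grant until the response returns is omitted.

```python
from typing import Optional


def arbitrate(miss_req: bool, pt_req: bool) -> Optional[int]:
    """Return the granted port (0 = cache miss, 1 = passthrough), or None if idle."""
    if miss_req:      # port 0 always wins when it is requesting
        return 0
    if pt_req:        # port 1 is served only when port 0 is idle
        return 1
    return None


if __name__ == "__main__":
    for miss_req, pt_req in ((False, False), (False, True), (True, False), (True, True)):
        print(f"miss={miss_req!s:5} pt={pt_req!s:5} -> grant {arbitrate(miss_req, pt_req)}")
```

With fixed priority, sustained cache-miss traffic can delay passthrough requests indefinitely, which is worth keeping in mind when choosing the arbitration mode of the replacement interconnect.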

This is a behavioral stub for simulation. For synthesis, replace with an Intel AXI Interconnect generated from Platform Designer.

simple_to_axi_master

Converts the internal simple request/response protocol to AXI4 master transactions.

Protocol Conversion:

mem_read / mem_write strobes initiate AXI transactions
Single-beat 512-bit transfers (AWLEN/ARLEN = 0)
AWSIZE/ARSIZE = 6 (64-byte transfers)
Sequential transaction ID generation for downstream tracking
mem_resp asserts when AXI transaction completes
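
The field values above follow from the AXI4 encodings: the length field holds beats minus one, and the size field holds log2 of the bytes per beat. The Python sketch below checks that arithmetic and models sequential ID generation. The class names are hypothetical, and starting at ID 0 and wrapping at the 8-bit ID width are assumptions, since the description only says the IDs are sequential.

```python
from dataclasses import dataclass


@dataclass
class AxiAddrBeat:
    """Subset of AXI4 address-channel fields used for one request."""
    addr: int
    axid: int
    axlen: int = 0    # beats - 1, so 0 means a single beat
    axsize: int = 6   # bytes per beat = 2**axsize = 64

    def total_bytes(self) -> int:
        """Bytes moved by the whole transaction."""
        return (self.axlen + 1) * (1 << self.axsize)


class IdGenerator:
    """Sequential transaction IDs that wrap at the ID width (assumed behavior)."""

    def __init__(self, id_width: int = 8) -> None:
        self.mask = (1 << id_width) - 1
        self.next_id = 0

    def take(self) -> int:
        value = self.next_id
        self.next_id = (self.next_id + 1) & self.mask
        return value


if __name__ == "__main__":
    ids = IdGenerator()
    beat = AxiAddrBeat(addr=0x1000, axid=ids.take())
    print("bytes per transaction:", beat.total_bytes())
    print("next IDs:", [ids.take() for _ in range(3)])
```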

axi_ram (Verification)

AXI4 slave wrapper around the simple memory model for simulation testing.

Purpose: Provides proper AXI4 handshaking to validate the simple_to_axi_master bridge logic. Ensures the DUT speaks real AXI4 end-to-end.

Signal Interfaces

Simple Memory Protocol

Used internally between modules. Not AXI-compliant but simple for RTL implementation. Signal directions are given from the requester's side.

Signal	Width	Direction	Description
`address`	64	Output	Byte address (the cache ignores the lower 6 bits)
`read`	1	Output	Read request strobe
`write`	1	Output	Write request strobe
`wdata`	512	Output	Write data
`rdata`	512	Input	Read data (valid with `resp`)
`resp`	1	Input	Response strobe (single cycle)
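
To make the handshake concrete, here is a small Python model of the responder side of this protocol, called once per clock cycle. It is a sketch with several assumptions that the table does not state: the response arrives in the same call (real memory adds latency), `read` and `write` are never asserted together, and unwritten lines read as zero. The class name `SimpleMemoryModel` is hypothetical.

```python
LINE_BYTES = 64


class SimpleMemoryModel:
    """Responder side of the simple protocol, one call per clock cycle."""

    def __init__(self) -> None:
        self.lines = {}  # line-aligned address -> 512-bit value stored as an int

    def cycle(self, address: int, read: bool, write: bool, wdata: int = 0):
        """Return (resp, rdata) for this cycle."""
        if read and write:
            # Assumption: the requester never asserts both strobes at once.
            raise ValueError("read and write must not be asserted together")
        line = address & ~(LINE_BYTES - 1)   # drop the 6 offset bits
        if write:
            self.lines[line] = wdata & ((1 << 512) - 1)
            return True, 0
        if read:
            return True, self.lines.get(line, 0)
        return False, 0                      # idle cycle: no response


if __name__ == "__main__":
    mem = SimpleMemoryModel()
    print(mem.cycle(0x40, read=False, write=True, wdata=0xABCD))
    print(mem.cycle(0x7F, read=True, write=False))   # same line as 0x40
    print(mem.cycle(0x00, read=False, write=False))
```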

AXI4 Interface

Standard AXI4 signals with the following configuration:

Property	Value
Data Width	512 bits
Address Width	64 bits
ID Width	8 bits
Burst Length	1 beat (`AWLEN`/`ARLEN` = 0)
Transfer Size	64 bytes per beat (`AWSIZE`/`ARSIZE` = 6)

Running Simulation

Prerequisites

ModelSim (tested with Intel FPGA Edition 2020.1)
ModelSim binary path configured in simulation scripts

Command-Line Simulation

cd /path/to/tlb
./sim_afu_cache_cli.sh

Compiles all modules and runs the testbench to completion, printing results to stdout.

GUI Simulation

cd /path/to/tlb
./sim_afu_cache.sh

Launches ModelSim with waveform viewer for interactive debugging.

Test Coverage

The testbench executes the following test sequences:

Test	Description
`single_read`	Basic cache miss and load
`single_read_hit`	Cache hit verification
`multiple_reads`	Sequential reads across cache lines
`single_write`	Basic write operation
`write_then_read`	Write followed by read verification
`way_associativity`	Fill all 4 ways of a set
`striped_writes`	Multiple writes followed by reads
`plru_simple`	PLRU eviction policy verification
`random_short`	20 random read/write operations
`flush_cache`	Force writebacks by reading new addresses
`memory_consistency`	Compare DUT memory against golden model

All tests compare results against a golden memory model to verify correctness.
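
Tests such as `way_associativity` and `plru_simple` need several addresses that land in the same set. The Python sketch below shows one way to generate them. It is not taken from testbench.sv, it assumes the set index occupies address bits [9:6], and `same_set_addresses` is a hypothetical helper.

```python
LINE_BYTES = 64
NUM_SETS = 16
NUM_WAYS = 4
SET_STRIDE = LINE_BYTES * NUM_SETS  # 1024 bytes between lines that share a set


def same_set_addresses(set_index: int, count: int) -> list:
    """Return `count` line-aligned addresses that all map to `set_index`."""
    if not 0 <= set_index < NUM_SETS:
        raise ValueError("set_index out of range")
    base = set_index * LINE_BYTES
    return [base + k * SET_STRIDE for k in range(count)]


if __name__ == "__main__":
    # NUM_WAYS addresses fill the set; one more forces an eviction.
    for addr in same_set_addresses(3, NUM_WAYS + 1):
        print(hex(addr))
```

Reading or writing the first four addresses fills every way of set 3, and the fifth access must evict one of them, which exercises both the replacement policy and, for dirty lines, the write-back path.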

CXL Integration Notes

Target Environment

This system is designed for integration into the AFU (Accelerator Functional Unit) of an Intel CXL Type 3 design example. The AFU sits between:

Upstream: CXL IP (T3PH) providing AXI4 requests from the host
Downstream: Memory controller (mc_top) managing DDR4

Signal Naming Conventions

When integrating, rename signals to match Intel CXL IP conventions:

This Project	Intel CXL Convention
`s_axi_*`	`ip2hdm_*` (IP to HDM)
`m_axi_*`	`afu2mc_*` (AFU to MC)

Synthesis Considerations

Arbiter Replacement: Replace mem_arbiter_stub with an Intel AXI Interconnect generated from Platform Designer. Generate HDL with simulation files enabled to maintain consistent behavior between simulation and synthesis.
Timing Closure: The current design prioritizes simplicity over performance. Add pipeline stages if timing closure requires it.
Address Map: Adjust PASSTHROUGH_BASE to match the actual FPGA memory map. For a 16GB device, typical values might be:
- Cached region: 0x0 to 0x1_FFFF_FFFF (8GB)
- Passthrough region: 0x2_0000_0000 to 0x3_FFFF_FFFF (8GB)
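
The region boundaries for a given device size and threshold can be checked with a short calculation. This Python sketch reproduces the 16GB example above; `address_map` is a hypothetical helper, and the sizes are treated as binary gigabytes (2^30 bytes).

```python
GIB = 1 << 30


def address_map(device_bytes: int, passthrough_base: int) -> dict:
    """Return inclusive (start, end) byte ranges for the two regions."""
    if not 0 < passthrough_base < device_bytes:
        raise ValueError("passthrough_base must fall inside the device")
    return {
        "cached": (0, passthrough_base - 1),
        "passthrough": (passthrough_base, device_bytes - 1),
    }


if __name__ == "__main__":
    regions = address_map(device_bytes=16 * GIB, passthrough_base=8 * GIB)
    for name, (start, end) in regions.items():
        size_gib = (end - start + 1) // GIB
        print(f"{name:12} {start:#x} to {end:#x} ({size_gib} GiB)")
```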

Design Constraints

Cache expects 64-byte aligned addresses (lower 6 address bits are ignored)
Single outstanding transaction per path (no pipelining)
Write operations take priority over reads when contending for memory port
AXI transaction IDs are passed through, not reordered

File History

File	Status	Notes
`axi_cache_adapter.sv`	Modified	Updated to 512-bit, added AXI ID support
`axi_cache_adapter_old.sv`	Archived	Original 64-bit burst version
`testbench.sv`	Modified	Updated for 512-bit AXI4
`testbench_old.sv`	Removed	Original testbench

References

ARM AMBA AXI Protocol Specification (AXI4)
Intel CXL IP User Guide
Intel Agilex FPGA Design Guidelines
