Neil-Rayu/CXL_Cache_TLB

AXI4 Cache System for CXL Type 3 AFU

A unified AXI4 cache system designed for integration into Intel CXL Type 3 FPGA designs. The system provides a complete data path from an AXI4 slave interface (receiving requests from the CXL IP) through an address-routed cache subsystem to an AXI4 master interface (connecting to the memory controller).

Architecture Overview

graph LR
    %% Styling
    classDef ip fill:#e1f5fe,stroke:#01579b,stroke-width:2px;
    classDef user fill:#fff9c4,stroke:#fbc02d,stroke-width:2px,stroke-dasharray: 5 5;
    classDef logic fill:#e8f5e9,stroke:#2e7d32,stroke-width:2px;
    classDef mem fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px;

    %% External IPs
    CXL["CXL IP (T3PH)"]:::ip
    MC["Memory Controller (mc_top)"]:::mem

    %% Your System Wrapper
    subgraph AFU_System [AFU Cache System Wrapper]
        direction LR
        
        %% Components
        Adapter[AXI Cache Adapter]:::logic
        Cache[Cache Memory]:::logic
        BridgePT[Simple-to-AXI Bridge]:::logic
        BridgeMiss[Simple-to-AXI Bridge]:::logic
        Arbiter[2:1 AXI Interconnect]:::logic

        %% Internal Connections
        Adapter -- "Hit (Read)" --> Cache
        Cache -- "Data" --> Adapter
        
        Adapter -- "Passthrough (addr >= BASE)" --> BridgePT
        Cache -- "Miss (Line Refill)" --> BridgeMiss
        
        BridgePT -- "AXI Master" --> Arbiter
        BridgeMiss -- "AXI Master" --> Arbiter
    end

    %% External Connections
    CXL --> |"AXI Slave (ip2hdm)"| Adapter
    Arbiter --> |"AXI Master (afu2mc)"| MC

Directory Structure

tlb/
├── hdl/                          # Synthesizable RTL
│   ├── afu_cache_system.sv       # Top-level wrapper module
│   ├── axi_cache_adapter.sv      # AXI4 slave with address-based routing
│   ├── cache.sv                  # 4-way set-associative cache
│   ├── ff_array.sv               # SRAM-style flip-flop array
│   ├── mem_arbiter_stub.sv       # 2:1 memory arbiter (simulation stub)
│   └── simple_to_axi_master.sv   # Simple protocol to AXI4 master bridge
│
├── hvl/                          # Verification
│   ├── testbench.sv              # Main testbench with AXI4 drivers
│   ├── axi_ram.sv                # AXI4 slave memory wrapper
│   ├── memory.sv                 # Simple memory model
│   └── axi_crossbar/             # AXI4 crossbar IP (reference)
│
├── sim_afu_cache.sh              # GUI simulation script
├── sim_afu_cache_cli.sh          # Command-line simulation script
└── sim_cache.sh                  # Legacy simulation script

Module Descriptions

afu_cache_system

Top-level wrapper that instantiates and connects all subsystem components. Presents clean AXI4 interfaces for integration into larger systems.

Parameters:

| Parameter        | Default               | Description                        |
|------------------|-----------------------|------------------------------------|
| C_AXI_DATA_WIDTH | 512                   | AXI data bus width in bits         |
| C_AXI_ADDR_WIDTH | 64                    | AXI address bus width in bits      |
| C_AXI_ID_WIDTH   | 8                     | AXI transaction ID width in bits   |
| PASSTHROUGH_BASE | 0x0000_0002_0000_0000 | Address routing threshold (8GB)    |

Interfaces:

  • s_axi_* - AXI4 slave interface (input from CXL IP or testbench)
  • m_axi_* - AXI4 master interface (output to memory controller)

axi_cache_adapter

Converts AXI4 transactions to a simple request/response protocol and routes requests based on address.

Address Routing:

  • Addresses below PASSTHROUGH_BASE: Routed to cache (cached memory region)
  • Addresses at or above PASSTHROUGH_BASE: Routed to passthrough (uncached region)
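
The routing rule above reduces to a single comparison against PASSTHROUGH_BASE. The following is a minimal Python reference sketch of that decision, not the RTL itself. The function name `route` and the returned labels are illustrative assumptions; the default threshold is the documented PASSTHROUGH_BASE value.

```python
# Behavioral sketch of the adapter's address routing (not the RTL).
# `route` and the "cache"/"passthrough" labels are hypothetical names.
PASSTHROUGH_BASE = 0x0000_0002_0000_0000  # documented default (8GB)


def route(addr: int, passthrough_base: int = PASSTHROUGH_BASE) -> str:
    """Return the path a request address would take."""
    if addr < 0:
        raise ValueError("address must be non-negative")
    # Below the threshold: cached region. At or above: uncached passthrough.
    return "cache" if addr < passthrough_base else "passthrough"
```

Note that the threshold address itself belongs to the passthrough region, so `route(PASSTHROUGH_BASE)` returns `"passthrough"`.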

AXI4 Features:

  • 512-bit single-beat transactions (no burst assembly required)
  • 8-bit transaction ID pass-through (AWID returns as BID, ARID returns as RID)
  • Separate write and read state machines with write priority on memory port

Output Interfaces:

  • cache_mem_* - Simple protocol to cache module
  • pt_mem_* - Simple protocol to passthrough path

cache

4-way set-associative write-back cache with pseudo-LRU replacement policy.

Specifications:

  • 16 sets, 4 ways per set
  • 64-byte cache lines (512 bits)
  • 55-bit tags (for 64-bit address space; the address tag itself is 64 - 4 index bits - 6 offset bits = 54 bits)
  • Write-back policy with dirty bit tracking
  • Pseudo-LRU replacement (3-bit tree per set)
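
The geometry listed above determines how a byte address splits into tag, set index, and line offset. The sketch below works through that arithmetic in Python. The names `split_address`, `OFFSET_BITS`, `INDEX_BITS`, and `TAG_BITS` are hypothetical, and the field order (tag in the upper bits, then index, then offset) is the conventional layout, assumed here rather than taken from cache.sv. Any extra state bits stored alongside the tag are not modeled.

```python
from typing import Tuple

# Documented geometry: 16 sets, 64-byte lines, 64-bit addresses.
NUM_SETS = 16
LINE_BYTES = 64
ADDR_BITS = 64

OFFSET_BITS = LINE_BYTES.bit_length() - 1         # log2(64) = 6
INDEX_BITS = NUM_SETS.bit_length() - 1            # log2(16) = 4
TAG_BITS = ADDR_BITS - INDEX_BITS - OFFSET_BITS   # remaining upper bits


def split_address(addr: int) -> Tuple[int, int, int]:
    """Return (tag, set_index, offset) for a byte address."""
    offset = addr & (LINE_BYTES - 1)
    set_index = (addr >> OFFSET_BITS) & (NUM_SETS - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, set_index, offset
```

Under this layout, addresses that are 1024 bytes apart (16 sets x 64 bytes) map to the same set with different tags, which is the access pattern needed to fill all four ways of one set.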

Interfaces:

  • mem_* - CPU/adapter side (512-bit data)
  • pmem_* - Physical memory side (to arbiter)
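
The 3-bit tree pseudo-LRU policy named in the specifications can be illustrated with a small Python model. This is a sketch of one common tree-PLRU encoding for 4 ways; the class name `TreePLRU4` and the bit convention are assumptions, and the encoding used in cache.sv may differ.

```python
class TreePLRU4:
    """Tree pseudo-LRU for one 4-way set, using 3 state bits.

    bits[0]: root, 0 = next victim is in ways 0/1, 1 = in ways 2/3
    bits[1]: picks the victim within ways 0/1
    bits[2]: picks the victim within ways 2/3
    (Hypothetical encoding; the RTL may use a different one.)
    """

    def __init__(self) -> None:
        self.bits = [0, 0, 0]

    def touch(self, way: int) -> None:
        """Record an access: point the tree away from the accessed way."""
        if not 0 <= way < 4:
            raise ValueError("way must be in 0..3")
        if way < 2:
            self.bits[0] = 1         # next victim comes from ways 2/3
            self.bits[1] = 1 - way   # within 0/1, point at the other way
        else:
            self.bits[0] = 0         # next victim comes from ways 0/1
            self.bits[2] = 3 - way   # within 2/3, point at the other way

    def victim(self) -> int:
        """Return the way the tree currently selects for eviction."""
        if self.bits[0] == 0:
            return self.bits[1]
        return 2 + self.bits[2]
```

Because only three bits are kept per set, the selected victim approximates true LRU rather than matching it in every access sequence. The one guarantee is that the most recently touched way is never chosen.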

mem_arbiter_stub

Simple 2:1 memory arbiter that serializes requests from the cache miss path and passthrough path onto a single output.

Arbitration Policy: Fixed priority

  • Port 0 (cache miss): Higher priority
  • Port 1 (passthrough): Lower priority
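
The fixed-priority policy above can be written as a short Python function. This is a sketch of the grant decision only: the name `grant` is hypothetical, and holding the grant until the downstream response returns is not modeled.

```python
from typing import Optional


def grant(cache_miss_req: bool, passthrough_req: bool) -> Optional[int]:
    """Return the granted port number, or None when no port is requesting.

    Port 0 (cache miss) always wins over port 1 (passthrough).
    """
    if cache_miss_req:
        return 0
    if passthrough_req:
        return 1
    return None
```

With fixed priority, a continuous stream of cache-miss requests can delay passthrough traffic indefinitely, which is worth keeping in mind when choosing an arbitration scheme for the replacement interconnect.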

This is a behavioral stub for simulation. For synthesis, replace with an Intel AXI Interconnect generated from Platform Designer.

simple_to_axi_master

Converts the internal simple request/response protocol to AXI4 master transactions.

Protocol Conversion:

  • mem_read / mem_write strobes initiate AXI transactions
  • Single-beat 512-bit transfers (AWLEN/ARLEN = 0)
  • AWSIZE/ARSIZE = 6 (64-byte transfers)
  • Sequential transaction ID generation for downstream tracking
  • mem_resp asserts when AXI transaction completes
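
The address-channel values listed above follow from the bus width: AxSIZE is log2 of the bytes per beat, and AxLEN is the number of beats minus one. The Python sketch below derives them and models sequential ID generation. The class and field names are hypothetical, and starting the ID counter at 0 and wrapping at the ID width are assumptions rather than documented behavior.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AxiAddrBeat:
    """Address-channel fields for one request (AW or AR)."""
    addr: int
    axid: int
    axlen: int   # number of beats minus one
    axsize: int  # log2(bytes per beat)


class SimpleToAxiModel:
    """Sketch of the bridge's address-channel field generation."""

    def __init__(self, data_width_bits: int = 512, id_width: int = 8) -> None:
        bytes_per_beat = data_width_bits // 8
        if bytes_per_beat <= 0 or bytes_per_beat & (bytes_per_beat - 1):
            raise ValueError("bytes per beat must be a power of two")
        self.axsize = bytes_per_beat.bit_length() - 1  # 64 bytes -> 6
        self.id_mask = (1 << id_width) - 1
        self.next_id = 0  # assumed reset value

    def issue(self, addr: int) -> AxiAddrBeat:
        """Build the fields for one single-beat transfer."""
        beat = AxiAddrBeat(addr=addr, axid=self.next_id, axlen=0,
                           axsize=self.axsize)
        # Sequential IDs; wrap-around at the ID width is an assumption.
        self.next_id = (self.next_id + 1) & self.id_mask
        return beat
```

For the default 512-bit bus, `SimpleToAxiModel().issue(0x40)` yields `axlen=0` and `axsize=6`, matching the values listed above.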

axi_ram (Verification)

AXI4 slave wrapper around the simple memory model for simulation testing.

Purpose: Provides proper AXI4 handshaking to validate the simple_to_axi_master bridge logic. Ensures the DUT speaks real AXI4 end-to-end.

Signal Interfaces

Simple Memory Protocol

Used internally between modules. It is not AXI-compliant, but it is simple to implement in RTL.

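| Signal  | Width | Direction | Description                    |
|---------|-------|-----------|--------------------------------|
| address | 64    | Output    | Byte-aligned address           |
| read    | 1     | Output    | Read request strobe            |
| write   | 1     | Output    | Write request strobe           |
| wdata   | 512   | Output    | Write data                     |
| rdata   | 512   | Input     | Read data (valid with resp)    |
| resp    | 1     | Input     | Response strobe (single cycle) |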

AXI4 Interface

Standard AXI4 signals with the following configuration:

| Property      | Value                 |
|---------------|-----------------------|
| Data Width    | 512 bits              |
| Address Width | 64 bits               |
| ID Width      | 8 bits                |
| Burst Type    | Single-beat (LEN = 0) |
| Burst Size    | 64 bytes (SIZE = 6)   |

Running Simulation

Prerequisites

  • ModelSim (tested with Intel FPGA Edition 2020.1)
  • ModelSim binary path configured in simulation scripts

Command-Line Simulation

cd /path/to/tlb
./sim_afu_cache_cli.sh

Compiles all modules and runs the testbench to completion, printing results to stdout.

GUI Simulation

cd /path/to/tlb
./sim_afu_cache.sh

Launches ModelSim with waveform viewer for interactive debugging.

Test Coverage

The testbench executes the following test sequences:

| Test               | Description                                |
|--------------------|--------------------------------------------|
| single_read        | Basic cache miss and load                  |
| single_read_hit    | Cache hit verification                     |
| multiple_reads     | Sequential reads across cache lines        |
| single_write       | Basic write operation                      |
| write_then_read    | Write followed by read verification        |
| way_associativity  | Fill all 4 ways of a set                   |
| striped_writes     | Multiple writes followed by reads          |
| plru_simple        | PLRU eviction policy verification          |
| random_short       | 20 random read/write operations            |
| flush_cache        | Force writebacks by reading new addresses  |
| memory_consistency | Compare DUT memory against golden model    |

All tests compare results against a golden memory model to verify correctness.
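
The golden-model comparison can be sketched in Python as a dictionary of 64-byte lines that is updated on every write and consulted on every read. This illustrates the checking idea only, not the testbench code: the names `GoldenMemory` and `check_read` are hypothetical, and treating unwritten lines as all zeros is an assumption about the memory model's initial contents.

```python
class GoldenMemory:
    """Line-granular reference memory for checking DUT read data."""

    LINE_BYTES = 64

    def __init__(self) -> None:
        self.lines = {}  # line-aligned address -> 64 bytes of data

    def _align(self, addr: int) -> int:
        # The lower 6 address bits are ignored, as in the cache.
        return addr & ~(self.LINE_BYTES - 1)

    def write(self, addr: int, data: bytes) -> None:
        if len(data) != self.LINE_BYTES:
            raise ValueError("expected a full 64-byte line")
        self.lines[self._align(addr)] = bytes(data)

    def read(self, addr: int) -> bytes:
        # Assumption: lines never written read back as zeros.
        return self.lines.get(self._align(addr), bytes(self.LINE_BYTES))


def check_read(golden: GoldenMemory, addr: int, dut_data: bytes) -> bool:
    """Return True when the DUT's read data matches the golden model."""
    return golden.read(addr) == dut_data
```

A testbench following this pattern mirrors each write into the model and calls the checker on each read response, so a mismatch points at the first address where the DUT and the model diverge.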

CXL Integration Notes

Target Environment

This system is designed for integration into the AFU (Accelerator Functional Unit) of an Intel CXL Type 3 design example. The AFU sits between:

  • Upstream: CXL IP (T3PH) providing AXI4 requests from the host
  • Downstream: Memory controller (mc_top) managing DDR4

Signal Naming Conventions

When integrating, rename signals to match Intel CXL IP conventions:

| This Project | Intel CXL Convention  |
|--------------|-----------------------|
| s_axi_*      | ip2hdm_* (IP to HDM)  |
| m_axi_*      | afu2mc_* (AFU to MC)  |

Synthesis Considerations

  1. Arbiter Replacement: Replace mem_arbiter_stub with an Intel AXI Interconnect generated from Platform Designer. Generate HDL with simulation files enabled to maintain consistent behavior between simulation and synthesis.

  2. Timing Closure: The current design prioritizes simplicity over performance. Add pipeline stages if timing closure requires it.

  3. Address Map: Adjust PASSTHROUGH_BASE to match the actual FPGA memory map. For a 16GB device, typical values might be:

    • Cached region: 0x0 to 0x1_FFFF_FFFF (8GB)
    • Passthrough region: 0x2_0000_0000 to 0x3_FFFF_FFFF (8GB)
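
The example address map above can be reproduced with a few lines of Python, which may help when choosing PASSTHROUGH_BASE for a different device size. The helper name `split_address_map` and its return format are hypothetical, and the requirement that the threshold fall on a 64-byte line boundary is an assumption made here to keep the cached region line-aligned.

```python
from typing import Dict, Tuple

GIB = 1 << 30
LINE_BYTES = 64


def split_address_map(device_bytes: int,
                      cached_bytes: int) -> Dict[str, Tuple[int, int]]:
    """Return inclusive (low, high) ranges for the two regions.

    The cached region starts at 0; the passthrough region starts at the
    value to use for PASSTHROUGH_BASE.
    """
    if not 0 < cached_bytes < device_bytes:
        raise ValueError("cached region must be smaller than the device")
    if cached_bytes % LINE_BYTES:
        raise ValueError("threshold should fall on a 64-byte boundary")
    passthrough_base = cached_bytes
    return {
        "cached": (0, passthrough_base - 1),
        "passthrough": (passthrough_base, device_bytes - 1),
    }
```

For a 16GB device split evenly, `split_address_map(16 * GIB, 8 * GIB)` gives a cached range ending at 0x1_FFFF_FFFF and a passthrough range from 0x2_0000_0000 to 0x3_FFFF_FFFF, matching the values listed above.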

Design Constraints

  • Cache expects 64-byte aligned addresses (lower 6 address bits are ignored)
  • Single outstanding transaction per path (no pipelining)
  • Write operations take priority over reads when contending for memory port
  • AXI transaction IDs are passed through, not reordered

File History

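| File                     | Status   | Notes                                     |
|--------------------------|----------|-------------------------------------------|
| axi_cache_adapter.sv     | Modified | Updated to 512-bit, added AXI ID support  |
| axi_cache_adapter_old.sv | Archived | Original 64-bit burst version             |
| testbench.sv             | Modified | Updated for 512-bit AXI4                  |
| testbench_old.sv         | Removed  | Original testbench                        |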

References

  • ARM AMBA AXI Protocol Specification (AXI4)
  • Intel CXL IP User Guide
  • Intel Agilex FPGA Design Guidelines
