
SamKaufman24/Pipelined-RISCV-Processor


Pipelined RISC-V Processor

A 32-bit pipelined RISC-V processor implemented in Verilog, designed and verified as part of a computer architecture course at UW-Madison. Scored 220/200 — full marks plus 20 extra credit points for successful FPGA deployment.

Architecture

[Figure: Pipeline architecture diagram]

The processor implements the RV32I base integer instruction set using a classic 5-stage pipeline:

  1. IF — Instruction Fetch
  2. ID — Instruction Decode & Register Read
  3. EX — Execute (ALU operations, branch resolution)
  4. MEM — Data Memory Access
  5. WB — Write Back
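To show how instructions overlap in the five stages above, here is a rough Python sketch of an idealized pipeline with no hazards. Instruction i occupies stage s in cycle i + s, so n instructions finish in n + 4 cycles. This is a behavioral model written for illustration, not code from the repository, and the function names are hypothetical.

```python
from typing import List, Optional

# Idealized 5-stage pipeline: one instruction enters IF per cycle, every stage
# takes exactly one cycle, and there are no stalls or flushes (an assumption).
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def stage_of(instr: int, cycle: int) -> Optional[str]:
    """Stage occupied by instruction `instr` during `cycle` (both 0-based), or None."""
    s = cycle - instr
    return STAGES[s] if 0 <= s < len(STAGES) else None


def total_cycles(n_instructions: int) -> int:
    """Cycles until the last instruction leaves WB: 4 fill cycles plus one per instruction."""
    if n_instructions == 0:
        return 0
    return n_instructions + len(STAGES) - 1


def diagram(n_instructions: int) -> List[str]:
    """One text row per instruction showing its stage in each cycle ('.' = idle)."""
    width = total_cycles(n_instructions)
    return [
        f"i{i}: " + " ".join(f"{stage_of(i, c) or '.':>3}" for c in range(width))
        for i in range(n_instructions)
    ]


if __name__ == "__main__":
    for row in diagram(3):
        print(row)
    # Hazard-free CPI approaches 1.0 as the 4-cycle fill cost is amortized.
    print(total_cycles(1000) / 1000)
```

Stalls and flushes push the measured CPI of the real design above this ideal.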

Features

  • Full RV32I instruction set support, including R-, I-, S-, B-, U-, and J-type instructions
  • Pipelined execution with pipeline registers between every stage
  • Hazard detection and data forwarding — handles RAW hazards via EX-EX and MEM-EX forwarding paths, eliminating unnecessary stalls for most instructions
  • Load-use stall detection — automatically stalls the pipeline when a load result is needed immediately by the following instruction
  • Branch resolution and pipeline flush — branch and jump targets resolved in the EX stage with delayed flush logic to squash incorrectly fetched instructions
  • Byte and halfword memory access support with data memory mask generation
  • Misaligned memory access detection with trap signaling
  • Reset delay chain to prevent false trap signals at startup
  • Complete instruction retire interface for testbench verification
  • Successfully synthesized and deployed on a Xilinx FPGA
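The forwarding, load-use, and flush rules listed above follow the standard textbook formulation for a 5-stage pipeline. The Python reference model below sketches them. It assumes that "EX-EX" forwarding means taking the operand from the EX/MEM pipeline register and that "MEM-EX" means taking it from MEM/WB. The `StageRegs` fields and function names are hypothetical and need not match the signals in hazardHalt.v or forwarding.v. The flush count of two likewise assumes one instruction is fetched per cycle; the repository's delayed flush logic may behave differently while memory is stalling.

```python
from dataclasses import dataclass
from enum import Enum


class Fwd(Enum):
    REGFILE = 0  # no hazard: use the operand read from the register file in ID
    EX_MEM = 1   # assumed "EX-EX" path: result held in the EX/MEM register
    MEM_WB = 2   # assumed "MEM-EX" path: value held in the MEM/WB register


@dataclass
class StageRegs:
    """Hypothetical subset of a pipeline register that the hazard logic inspects."""
    rd: int = 0
    reg_write: bool = False
    mem_read: bool = False  # True for loads


def forward_select(rs: int, ex_mem: StageRegs, mem_wb: StageRegs) -> Fwd:
    """Pick the source for ALU operand register `rs` of the instruction in EX."""
    if rs == 0:
        return Fwd.REGFILE  # x0 is hard-wired to zero and is never forwarded
    # Check the younger producer (EX/MEM) first so the newest value wins.
    if ex_mem.reg_write and ex_mem.rd == rs:
        return Fwd.EX_MEM
    if mem_wb.reg_write and mem_wb.rd == rs:
        return Fwd.MEM_WB
    return Fwd.REGFILE


def load_use_stall(id_ex: StageRegs, rs1: int, rs2: int) -> bool:
    """True if the load now in EX writes a register read by the instruction in ID.

    Comparing both source fields for every instruction format is conservative:
    it can insert an unnecessary bubble but never misses a real hazard.
    """
    return id_ex.mem_read and id_ex.rd != 0 and id_ex.rd in (rs1, rs2)


# Branches and jumps resolve in EX, so a redirect squashes the two younger,
# wrongly fetched instructions that are by then in IF and ID.
FLUSHED_ON_REDIRECT = 2
```

Forwarding from EX/MEM only works for ALU results. A load's data is not available until MEM completes, so `load_use_stall` inserts one bubble, after which the value can be forwarded from MEM/WB.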

Module Structure

| Module | Description |
| --- | --- |
| hart.v | Top-level processor module; instantiates the pipeline |
| alu.v | Arithmetic logic unit: add/sub, shifts, logic, comparisons |
| hazardHalt.v | Hazard detection, data forwarding, stall and flush control |
| branchControl.v | Branch condition evaluation and jump detection |
| generalControl.v | Main control signal decoder |
| imm.v | Immediate value generator for all instruction formats |
| rf.v | 32-entry register file with synchronous write and asynchronous read |
| DMMask.v | Data memory write mask generation and data alignment |
| DMresult.v | Sign/zero extension of data memory read results |
| pc.v | Program counter register |
| opcodeDecoder.v | Opcode decoder producing a one-hot instruction-format signal |
| forwarding.v | Register forwarding logic for resolving RAW data hazards |
| add.v | 32-bit adder used for PC increment and branch target calculation |
| mux2.v, mux4.v | 2:1 and 4:1 multiplexers |
| dff_*.v | Parameterized D flip-flop primitives used for pipeline registers |
| memory.v | Parameterized memory model with configurable latency and initiation interval (used by testbenches) |
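The byte-enable and extension logic attributed to DMMask.v and DMresult.v can be modeled as follows. This sketch assumes a 32-bit, little-endian, word-organized data memory with a 4-bit byte-enable mask, and it uses the RV32I `funct3` encodings for loads and stores. The function names are hypothetical, and the Verilog modules may organize the logic differently. Here, misaligned accesses raise a Python exception as a stand-in for the hardware trap signal.

```python
from typing import Tuple

# RV32I funct3 encodings for loads and stores.
LB, LH, LW, LBU, LHU = 0b000, 0b001, 0b010, 0b100, 0b101
SB, SH, SW = 0b000, 0b001, 0b010


def access_size(funct3: int) -> int:
    """Access width in bytes: the low two funct3 bits encode 1, 2, or 4 bytes."""
    return 1 << (funct3 & 0b11)


def is_misaligned(addr: int, funct3: int) -> bool:
    return addr % access_size(funct3) != 0


def store_mask_and_data(addr: int, funct3: int, rs2: int) -> Tuple[int, int]:
    """Return (4-bit byte-enable mask, lane-aligned write data) for SB/SH/SW."""
    if funct3 not in (SB, SH, SW):
        raise ValueError("not an RV32I store funct3")
    if is_misaligned(addr, funct3):
        raise ValueError("misaligned store")  # hardware would assert a trap instead
    offset = addr & 0b11
    size = access_size(funct3)
    mask = ((1 << size) - 1) << offset                      # enable only the written bytes
    data = (rs2 & ((1 << (8 * size)) - 1)) << (8 * offset)  # move data into its byte lanes
    return mask, data


def load_result(word: int, addr: int, funct3: int) -> int:
    """Select the addressed byte/halfword from a 32-bit word and extend it to 32 bits."""
    if funct3 not in (LB, LH, LW, LBU, LHU):
        raise ValueError("not an RV32I load funct3")
    if is_misaligned(addr, funct3):
        raise ValueError("misaligned load")
    bits = 8 * access_size(funct3)
    value = (word >> (8 * (addr & 0b11))) & ((1 << bits) - 1)
    if funct3 in (LB, LH) and value >> (bits - 1):
        value |= 0xFFFFFFFF ^ ((1 << bits) - 1)  # sign-extend: set all upper bits
    return value
```

For example, `store_mask_and_data(0x1003, SB, 0xAB)` enables only the top byte lane (`0b1000`) and places the data at bits 31:24.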

Verification

Two testbenches are included in testbenches/:

  • all_tests.v: runs all 22 RV32I instruction tests sequentially in a single simulation, reporting CPI and instruction count for each. A 40,000-cycle timeout catches infinite loops.
  • hart_tb.v: single-program testbench using a parameterized memory model (LATENCY=4, INTERVAL=2) to emulate slower, multi-cycle memory, exercising the hazard and stall logic under non-ideal memory conditions.
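The CPI in these reports is cycles divided by retired instructions. Below is a minimal Python sketch of that bookkeeping. It assumes a per-cycle trace of hypothetical `(retire_valid, halt)` flags taken from the retire interface, together with the 40,000-cycle limit described above. The actual testbench signal names and halt condition may differ.

```python
import itertools
from typing import Iterable, Tuple

TIMEOUT_CYCLES = 40_000  # limit described for all_tests.v


def measure_cpi(trace: Iterable[Tuple[bool, bool]]) -> Tuple[int, int, float]:
    """Consume one (retire_valid, halt) pair per clock cycle.

    Returns (cycles, retired, cpi); raises TimeoutError if no halt is seen
    within TIMEOUT_CYCLES cycles (e.g. a program stuck in an infinite loop).
    """
    cycles = retired = 0
    for retire_valid, halt in trace:
        cycles += 1
        retired += int(retire_valid)
        if halt:
            break
        if cycles >= TIMEOUT_CYCLES:
            raise TimeoutError(f"no halt after {cycles} cycles")
    cpi = cycles / retired if retired else float("inf")
    return cycles, retired, cpi


if __name__ == "__main__":
    # 4 pipeline-fill cycles with nothing retiring, then 4 retirements, the last one halting.
    trace = [(False, False)] * 4 + [(True, False)] * 3 + [(True, True)]
    print(measure_cpi(trace))  # (8, 4, 2.0)
    try:
        measure_cpi(itertools.repeat((False, False)))  # never halts
    except TimeoutError as err:
        print(err)
```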

Test programs in tests/ include individual instruction tests (e.g., 01add.asm, 06memory.asm) as well as complete programs (factorial_2.asm, factorial_5.asm, sort.asm) that exercise branching, memory access, and multi-cycle execution.

Tools

  • Simulation: QuestaSim / ModelSim
  • Synthesis & FPGA: Vivado, Quartus
  • Language: Verilog

Notes

This was the final phase of a multi-phase processor project, building up from a single-cycle implementation to a fully pipelined design. Verification was performed through an automated Gradescope testbench suite covering all RV32I instruction types, hazard scenarios, and edge cases — scoring 220/200 including 20 extra credit points for successful FPGA deployment.

A write-through, write-allocate, 2-way set-associative cache is included in the cache/ directory. It implements a 1 KB cache with 32 sets, 2 ways, and 16-byte cache lines, using a not-most-recently-used (NMRU) replacement policy. On a miss, the cache fetches the full 16-byte line from memory with 4 sequential word reads before servicing the CPU request. Byte and halfword masked writes are supported. The cache was developed separately and was not fully integrated into the final pipeline submission.
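With 32 sets of 2 ways and 16-byte lines (32 × 2 × 16 B = 1 KB), a 32-bit address splits into a 4-bit byte offset, a 5-bit set index, and a 23-bit tag. The Python sketch below models that split, the 4-word line fill, and NMRU victim selection for a single set. With only two ways, NMRU chooses the same victim as LRU. This is a hypothetical behavioral model, not the code in cache/, and it leaves out the write-through/write-allocate data path.

```python
from typing import List, Optional, Tuple

LINE_BYTES = 16
NUM_SETS = 32
WAYS = 2
OFFSET_BITS = (LINE_BYTES - 1).bit_length()  # 4
INDEX_BITS = (NUM_SETS - 1).bit_length()     # 5
assert LINE_BYTES * NUM_SETS * WAYS == 1024   # 1 KB total capacity


def split_address(addr: int) -> Tuple[int, int, int]:
    """Return (tag, set index, byte offset) for a 32-bit address."""
    offset = addr & (LINE_BYTES - 1)
    index = (addr >> OFFSET_BITS) & (NUM_SETS - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset


def line_fill_addresses(addr: int) -> List[int]:
    """Word addresses read, in order, to fill the 16-byte line containing addr."""
    base = addr & ~(LINE_BYTES - 1)
    return [base + 4 * i for i in range(LINE_BYTES // 4)]


class NmruSet:
    """Tag state for one 2-way set with not-most-recently-used replacement."""

    def __init__(self) -> None:
        self.tags: List[Optional[int]] = [None] * WAYS
        self.mru = 0  # way accessed most recently

    def access(self, tag: int) -> Tuple[bool, int]:
        """Look up tag; on a miss, allocate a way. Returns (hit, way)."""
        for way, stored in enumerate(self.tags):
            if stored == tag:
                self.mru = way
                return True, way
        # Miss: fill an empty way first, otherwise evict a way other than the MRU one.
        if None in self.tags:
            victim = self.tags.index(None)
        else:
            victim = next(w for w in range(WAYS) if w != self.mru)
        self.tags[victim] = tag
        self.mru = victim
        return False, victim
```

For example, `split_address(0x12345678)` yields tag `0x91A2B`, set 7, and offset 8, and a miss on that address reads the words at `0x12345670` through `0x1234567C`.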

About

32-bit pipelined RISC-V processor implemented in Verilog with hazard detection, data forwarding, and FPGA deployment
