A 32-bit pipelined RISC-V processor implemented in Verilog, designed and verified as part of a computer architecture course at UW-Madison. Scored 220/200 — full marks plus 20 extra credit points for successful FPGA deployment.
The processor implements the RV32I base integer instruction set using a classic 5-stage pipeline:
- IF — Instruction Fetch
- ID — Instruction Decode & Register Read
- EX — Execute (ALU operations, branch resolution)
- MEM — Data Memory Access
- WB — Write Back
- Full RV32I instruction set support including R, I, S, B, U, and J type instructions
- Pipelined execution with pipeline registers between every stage
- Hazard detection and data forwarding — handles RAW hazards via EX-EX and MEM-EX forwarding paths, eliminating unnecessary stalls for most instructions
- Load-use stall detection — automatically stalls the pipeline when a load result is needed immediately by the following instruction
- Branch resolution and pipeline flush — branch and jump targets resolved in the EX stage with delayed flush logic to squash incorrectly fetched instructions
- Byte and halfword memory access support with data memory mask generation
- Misaligned memory access detection with trap signaling
- Reset delay chain to prevent false trap signals at startup
- Complete instruction retire interface for testbench verification
- Successfully synthesized and deployed on a Xilinx FPGA
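
The forwarding and load-use items above can be sketched roughly as follows for a single source operand. This is a minimal illustration, not the code in forwarding.v or hazardHalt.v: the module name `fwd_sketch`, every port name, and the `fwd_sel` encoding are assumptions made for the example. It also assumes that "EX-EX" means forwarding from the instruction one stage ahead and "MEM-EX" from the instruction two stages ahead.

```verilog
// Hypothetical sketch: forwarding select and load-use stall for operand rs1.
// All names and encodings here are assumptions, not signals from this repo.
module fwd_sketch (
    input  wire [4:0] id_rs1,        // rs1 of the instruction in ID
    input  wire [4:0] ex_rs1,        // rs1 of the instruction in EX
    input  wire [4:0] ex_rd,         // rd of the instruction in EX
    input  wire       ex_is_load,    // instruction in EX is a load
    input  wire [4:0] mem_rd,        // rd of the instruction in MEM
    input  wire       mem_reg_write, // instruction in MEM writes a register
    input  wire [4:0] wb_rd,         // rd of the instruction in WB
    input  wire       wb_reg_write,  // instruction in WB writes a register
    output reg  [1:0] fwd_sel,       // 00: register file, 01: EX-EX, 10: MEM-EX
    output wire       load_use_stall
);
    always @(*) begin
        // The younger producer (in MEM) takes priority over the older one (in WB).
        // Writes to x0 are never forwarded because x0 always reads as zero.
        if (mem_reg_write && mem_rd != 5'd0 && mem_rd == ex_rs1)
            fwd_sel = 2'b01;
        else if (wb_reg_write && wb_rd != 5'd0 && wb_rd == ex_rs1)
            fwd_sel = 2'b10;
        else
            fwd_sel = 2'b00;
    end

    // A load in EX has no data yet, so a dependent instruction in ID
    // must wait one cycle before the value can be forwarded to it.
    assign load_use_stall = ex_is_load && ex_rd != 5'd0 && ex_rd == id_rs1;
endmodule
```

The same comparison would be repeated for rs2. After the one-cycle stall, the load has reached WB when its consumer enters EX, so the value arrives over the second forwarding path.
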

| Module | Description |
|---|---|
| hart.v | Top-level processor module, pipeline instantiation |
| alu.v | Arithmetic Logic Unit: add/sub, shifts, logic, comparisons |
| hazardHalt.v | Hazard detection, data forwarding, stall and flush control |
| branchControl.v | Branch condition evaluation and jump detection |
| generalControl.v | Main control signal decoder |
| imm.v | Immediate value generator for all instruction formats |
| rf.v | 32-entry register file with synchronous write, async read |
| DMMask.v | Data memory write mask and data alignment |
| DMresult.v | Data memory read result sign/zero extension |
| pc.v | Program counter register |
| opcodeDecoder.v | 1-hot instruction format encoder |
| forwarding.v | Register forwarding logic for resolving RAW data hazards |
| add.v | 32-bit adder used for PC increment and branch target calculation |
| mux2.v, mux4.v | 2:1 and 4:1 multiplexers |
| dff_*.v | Parameterized D flip-flop pipeline register primitives |
| memory.v | Parameterized memory model with configurable latency and initiation interval (used by testbenches) |
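
As a rough illustration of what write mask generation, data alignment, and misalignment detection involve, here is a minimal sketch. It is not the contents of DMMask.v. The module name `dm_mask_sketch` and its ports are hypothetical, and it assumes a 32-bit little-endian data bus with one mask bit per byte lane. The size encoding follows the low two bits of `funct3` in the RV32I load and store instructions.

```verilog
// Hypothetical sketch: byte-lane mask, data alignment and misalignment check.
// Assumes a 32-bit little-endian bus; names are not taken from this repo.
module dm_mask_sketch (
    input  wire [1:0]  size,         // funct3[1:0]: 00 byte, 01 halfword, 10 word
    input  wire [1:0]  addr_lo,      // low two bits of the byte address
    input  wire [31:0] store_data,   // value from the register file
    output reg  [3:0]  mask,         // one enable bit per byte lane
    output wire [31:0] aligned_data, // store data moved to the selected lanes
    output reg         misaligned    // access crosses its natural alignment
);
    always @(*) begin
        mask       = 4'b0000;
        misaligned = 1'b0;
        case (size)
            2'b00: mask = 4'b0001 << addr_lo;          // any byte address is legal
            2'b01: begin
                misaligned = addr_lo[0];               // halfword needs an even address
                if (!addr_lo[0]) mask = 4'b0011 << addr_lo;
            end
            2'b10: begin
                misaligned = (addr_lo != 2'b00);       // word needs a multiple of 4
                if (addr_lo == 2'b00) mask = 4'b1111;
            end
            default: mask = 4'b0000;                   // not a valid RV32I size
        endcase
    end

    // Shift by 0, 8, 16 or 24 bits so the data lines up with the enabled lanes.
    assign aligned_data = store_data << {addr_lo, 3'b000};
endmodule
```

For example, a halfword store to an address ending in `2'b10` enables lanes 2 and 3 and shifts the data left by 16 bits, while a word access to an address ending in `2'b01` raises `misaligned` and enables no lanes.
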
Two testbenches are included in testbenches/:
- all_tests.v: runs all 22 RV32I instruction tests sequentially in a single simulation, reporting CPI and instruction count for each. Includes a 40,000-cycle timeout to catch infinite loops.
- hart_tb.v: single-program testbench using a parameterized memory model (LATENCY=4, INTERVAL=2) to simulate realistic variable-latency memory, testing hazard and stall logic under non-ideal memory conditions.
Test programs in tests/ include individual instruction tests (01add.asm, 06memory.asm) as well as real programs (factorial_2.asm, factorial_5.asm, sort.asm) that exercise branching, memory access, and multi-cycle execution.
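
The CPI reporting and cycle timeout mentioned for all_tests.v could be built on a pair of counters like the ones below. This is only a sketch under assumptions: the module name `cpi_counter_sketch` and the `retire_valid` input are hypothetical stand-ins for whatever the retire interface actually exposes, and the real testbench may compute these values differently.

```verilog
// Hypothetical sketch: cycle and retired-instruction counters with a timeout.
// retire_valid is an assumed one-bit "instruction retired this cycle" signal.
module cpi_counter_sketch #(
    parameter TIMEOUT = 40000        // cycle limit used to catch infinite loops
) (
    input  wire        clk,
    input  wire        rst,          // synchronous, active-high reset
    input  wire        retire_valid,
    output reg  [31:0] cycles,       // cycles elapsed since reset was released
    output reg  [31:0] retired,      // instructions retired since reset
    output wire        timed_out     // high once the cycle limit is reached
);
    always @(posedge clk) begin
        if (rst) begin
            cycles  <= 32'd0;
            retired <= 32'd0;
        end else begin
            cycles <= cycles + 32'd1;
            if (retire_valid)
                retired <= retired + 32'd1;
        end
    end

    assign timed_out = (cycles >= TIMEOUT);
endmodule
```

CPI is then `cycles` divided by `retired` at the end of a test. In an ideal 5-stage pipeline it approaches 1, and each stall or flush cycle pushes it higher.
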
- Simulation: Questasim / Modelsim
- Synthesis & FPGA: Vivado, Quartus
- Language: Verilog
This was the final phase of a multi-phase processor project, building up from a single-cycle implementation to a fully pipelined design. Verification was performed through an automated Gradescope testbench suite covering all RV32I instruction types, hazard scenarios, and edge cases — scoring 220/200 including 20 extra credit points for successful FPGA deployment.
A write-through, write-allocate, 2-way set-associative cache is included in the cache/ directory. It implements a 1KB cache with 32 sets, 2 ways, and 16-byte cache lines (32 × 2 × 16 = 1024 bytes), using an NMRU (not most recently used) replacement policy. On a miss, the cache fetches the full 16-byte line from memory with 4 sequential word reads before servicing the CPU request. Byte and halfword masked writes are supported. The cache was developed separately and was not fully integrated into the final pipeline submission.
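
The cache geometry above implies a particular split of the address, sketched below. This is an illustration, not the code in cache/. It assumes 32-bit byte addresses, so 16-byte lines take 4 offset bits, 32 sets take 5 index bits, and the remaining 23 bits form the tag. The module and port names are hypothetical, as is the choice to track NMRU with a single most-recently-used bit per set.

```verilog
// Hypothetical sketch: address fields and NMRU victim choice for a
// 32-set, 2-way cache with 16-byte lines. Assumes 32-bit byte addresses.
module cache_addr_sketch (
    input  wire [31:0] addr,
    input  wire        mru_way,     // way most recently used in the indexed set
    output wire [22:0] tag,         // compared against both ways of the set
    output wire [4:0]  set_index,   // selects one of the 32 sets
    output wire [1:0]  word_sel,    // selects one of the 4 words in the line
    output wire [1:0]  byte_sel,    // selects a byte within the word
    output wire        victim_way   // way to replace on a miss
);
    assign tag       = addr[31:9];
    assign set_index = addr[8:4];
    assign word_sel  = addr[3:2];
    assign byte_sel  = addr[1:0];

    // With only two ways, "not most recently used" is simply the other way.
    assign victim_way = ~mru_way;
endmodule
```

With two ways, NMRU picks the same victim as true LRU, which is why one bit per set is enough in this sketch.
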
