Implementation of Figure 1 (parallel-prefix) and Figure 2 (subscalar) adder architectures from the paper.
Figure 1 - Parallel-Prefix Adder:
- 4-stage pipeline with 8-bit grouping
- High area cost, deeply pipelined
- Takes ~10 cycles for the dependent chain s = a + b + c
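As a rough illustration of the parallel-prefix idea, the following is a plain-Scala software model (not the Chisel code in ParallelPrefixAdder.scala). It assumes 32-bit operands and a Kogge-Stone-style prefix network, neither of which is confirmed by the description above, and it does not model the 4-stage pipelining or the 8-bit grouping. The names `PrefixModel`, `Width`, and `add` are hypothetical.

```scala
// Software sketch of a parallel-prefix addition (assumed Kogge-Stone style).
object PrefixModel {
  val Width = 32 // assumption: operand width is not stated in this README

  def add(a: Long, b: Long): Long = {
    // Per-bit generate (a AND b) and propagate (a XOR b) signals.
    var g = (0 until Width).map(i => ((a >> i) & (b >> i) & 1L) == 1L).toVector
    var p = (0 until Width).map(i => (((a ^ b) >> i) & 1L) == 1L).toVector
    val p0 = p // keep the original propagate bits for the final sum

    // Combine (g, p) pairs over spans of 1, 2, 4, ... bits.
    var d = 1
    while (d < Width) {
      val gNext = g.indices.map(i => if (i >= d) g(i) || (p(i) && g(i - d)) else g(i)).toVector
      val pNext = p.indices.map(i => if (i >= d) p(i) && p(i - d) else p(i)).toVector
      g = gNext
      p = pNext
      d *= 2
    }

    // g(i) is now the carry out of bit i, so the carry into bit i is g(i - 1).
    (0 until Width).foldLeft(0L) { (acc, i) =>
      val carryIn = i > 0 && g(i - 1)
      if (p0(i) ^ carryIn) acc | (1L << i) else acc
    }
  }
}
```

Every carry is available after a logarithmic number of combine levels, which is why hardware versions of this structure are usually split across several pipeline stages.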
Figure 2 - Subscalar Adder:
- Fragment-based with 8-bit ripple blocks
- Registers between fragments enable overlap
- Takes ~5 cycles for the dependent chain s = a + b + c
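The fragment-based scheme can be sketched as a plain-Scala software model (not the Chisel code in SubscalarAdder.scala). It assumes 32-bit operands, i.e. four 8-bit fragments, that each fragment takes one cycle, and that a dependent addition may start on a fragment one cycle after the previous addition finished it. These timing assumptions, and the names `SubscalarModel`, `addByFragments`, and `chainLatency`, are illustrative only.

```scala
// Software sketch of fragment-by-fragment addition with overlap between dependent adds.
object SubscalarModel {
  val FragBits = 8 // 8-bit ripple blocks, as listed above
  val NumFrags = 4 // assumption: 32-bit operands
  val Mask: Long = (1L << FragBits) - 1

  // Split a value into 8-bit fragments, least significant first.
  def fragments(x: Long): Seq[Long] =
    (0 until NumFrags).map(i => (x >> (i * FragBits)) & Mask)

  // Add one fragment at a time; the carry plays the role of the
  // register between fragments.
  def addByFragments(a: Long, b: Long): Long = {
    val fa = fragments(a)
    val fb = fragments(b)
    var carry = 0L
    var result = 0L
    for (i <- 0 until NumFrags) {
      val sum = fa(i) + fb(i) + carry
      result |= (sum & Mask) << (i * FragBits)
      carry = sum >> FragBits
    }
    result // the carry out of the top fragment is dropped (wraps modulo 2^32)
  }

  // Cycles until the top fragment of a chain of `adds` dependent additions
  // is done, under the one-fragment-per-cycle assumption stated above.
  def chainLatency(adds: Int): Int = NumFrags + (adds - 1)
}
```

Under these assumptions, the two dependent additions in s = a + b + c give `chainLatency(2) == 5`, which is consistent with the ~5 cycles quoted above.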
- ParallelPrefixAdder.scala: Figure 1 implementation
- SubscalarAdder.scala: Figure 2 implementation
- TestBench.scala: Top-level with both architectures
- testbench.v: Verilog testbench
- run_sim.sh: Automated simulation script
# Run complete simulation
./run_sim.sh
# Manual steps:
sbt "runMain VerilogGen"
cd generated
iverilog -o sim ../testbench.v Top.v
./sim
gtkwave waveform.vcd
- Latency: Subscalar completes dependent additions faster
- Area: Fewer pipeline registers in subscalar design
- Throughput: Fragment-level parallelism in subscalar approach