This repository contains the code for benchmarking the FPGA resource usage of Verilog solutions generated by different LLMs.

The simulation tool for the functional correctness tests and the synthesis tool for obtaining resource usage are both based on Vivado, so please install Vivado before running the framework. If you wish to use other tools, modify the relevant Python scripts accordingly.
The dependencies are listed in `requirements.txt`, which you can install using:

```bash
pip install -r requirements.txt
```
The `problems.json` file contains our benchmark dataset, formatted as follows:
```json
{
  "Combinational Logic": [
    {
      "module": "parity_8bit",
      "Problem": "Implement a Verilog module that computes the parity of an 8-bit input vector. The output should be 1 if the number of '1's in the input is odd, and 0 otherwise.",
      "Module header": "module parity_8bit (\n input [7:0] in,\n output out\n);",
      "Testbench": "`timescale 1ns / 1ps\n\nmodule parity_8bit_tb; ..."
    }
  ],
  "Finite State Machines": []
}
```
You can use this dataset to generate solutions and run functional correctness checks for any LLM you want to evaluate.
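As an illustration, the snippet below loads `problems.json` and assembles a generation prompt for each problem. The prompt wording and the `query_llm` helper are hypothetical placeholders for your own generation code, not part of this repository.

```python
import json

# Load the benchmark dataset (format shown above).
with open("problems.json") as f:
    dataset = json.load(f)

for category, problems in dataset.items():
    for problem in problems:
        # Compose a generation prompt from the problem statement and the
        # fixed module header; this prompt wording is only illustrative.
        prompt = (
            f"{problem['Problem']}\n"
            f"Implement the design using exactly this module header:\n"
            f"{problem['Module header']}"
        )
        # query_llm is a hypothetical stand-in for your model call;
        # collect its output into your own solutions file.
        # solution = query_llm(prompt)
```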
The `solutions` directory contains our experimental results, formatted as follows:
```json
{
  "gpt-3.5-turbo": {
    "Combinational Logic": [
      {
        "module": "parity_8bit",
        "solutions": [
          {
            "solution": "module parity_8bit (input [7:0] in, output out); assign out = in[0] ^ in[1] ^ in[2] ^ in[3] ^ in[4] ^ in[5] ^ in[6] ^ in[7]; endmodule",
            "pass": "true",
            "resource usage": {
              "optimized": {
                "LUT": 2,
                "FF": 0,
                "DSP": 0,
                "BRAM": 0,
                "IO": 9
              },
              "primitives": {
                "LUT": 2,
                "FF": 0,
                "DSP": 0,
                "BRAM": 0,
                "IO": 9
              }
            }
          }
        ]
      }
    ],
    "Finite State Machines": [
      {
        "module": "fsm_3state",
        "solutions": []
      }
    ]
  },
  "gpt-4o": {}
}
```
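As an example of working with this format, the sketch below aggregates pass rates and post-optimization LUT counts per model. The file name `solutions.json` follows the usage described below; the aggregation logic itself is illustrative, not a script shipped with the repository.

```python
import json
from statistics import mean

# Aggregate pass rate and post-optimization LUT usage per model.
# A minimal sketch based on the solutions format shown above.
with open("solutions.json") as f:
    results = json.load(f)

for model, categories in results.items():
    total, passed, luts = 0, 0, []
    for problems in categories.values():
        for problem in problems:
            for sol in problem["solutions"]:
                total += 1
                if sol["pass"] == "true":
                    passed += 1
                    luts.append(sol["resource usage"]["optimized"]["LUT"])
    if total:
        avg = mean(luts) if luts else float("nan")
        print(f"{model}: {passed}/{total} passed, mean LUTs (passing): {avg:.1f}")
```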
To quickly run the benchmarking process, copy `solutions.json` from the `solutions` directory to the same directory as `setup.py`, then execute:

```bash
python setup.py -model gpt-4o 5 your_openai_api_key -functional_correctness -resource_usage
```
This command will:

- Generate 5 solutions for each problem using `gpt-4o`.
- Run the functional correctness check.
- Obtain the resource usage report, including LUT usage.
The standard script currently supports OpenAI's GPT models. If you want to test other LLMs, please modify `generate_solutions.py` accordingly (see the sketch below).
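For instance, many providers expose OpenAI-compatible endpoints, so one low-effort adaptation is to redirect the OpenAI client rather than rewrite the request logic. The sketch below assumes such an endpoint; the `base_url`, API key, and model name are placeholders, and how `generate_solutions.py` actually structures its calls may differ.

```python
from openai import OpenAI

# Redirect the OpenAI client to an OpenAI-compatible endpoint.
# The base_url and model name below are placeholders, not part of ResBench.
client = OpenAI(
    base_url="https://your-provider.example/v1",  # placeholder endpoint
    api_key="your_api_key",
)

response = client.chat.completions.create(
    model="your-model-name",  # placeholder model identifier
    messages=[{"role": "user", "content": "prompt built from problems.json"}],
)
solution_text = response.choices[0].message.content
```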
You can also run the functional test and resource usage analysis on your own solutions. Ensure that your `solutions.json` follows the format above and place it in the same directory as `setup.py`, then execute:

```bash
python setup.py -functional_correctness -resource_usage
```
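Before launching the full flow, you may want to sanity-check a hand-written `solutions.json` against the expected layout. The validator below only covers the keys required before the framework fills in results; `check_solutions_file` is a hypothetical helper, not part of the repository.

```python
import json

# Lightweight structural check for a hand-written solutions.json.
# This helper is illustrative only; it is not shipped with ResBench.
def check_solutions_file(path="solutions.json"):
    with open(path) as f:
        data = json.load(f)
    for model, categories in data.items():
        for category, problems in categories.items():
            for problem in problems:
                assert "module" in problem and "solutions" in problem, (
                    f"{model}/{category}: each problem needs 'module' and 'solutions'"
                )
                for sol in problem["solutions"]:
                    assert "solution" in sol, (
                        f"{model}/{category}/{problem['module']}: missing 'solution'"
                    )
    print("solutions.json structure looks OK")

check_solutions_file()
```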
To run the functional correctness check alone:

```bash
python setup.py -functional_correctness
```

To run the resource usage analysis alone:

```bash
python setup.py -resource_usage
```
If ResBench is useful for your research work, please cite our paper:

- Ce Guo and Tong Zhao. ResBench: A Resource-Aware Benchmark for LLM-Generated FPGA Designs. In *Proceedings of the 15th International Symposium on Highly Efficient Accelerators and Reconfigurable Technologies*, May 2025, pp. 25–34.
BibTeX code:

```bibtex
@inproceedings{guo2025resbench,
  title={ResBench: A Resource-Aware Benchmark for LLM-Generated FPGA Designs},
  author={Guo, Ce and Zhao, Tong},
  booktitle={Proceedings of the 15th International Symposium on Highly Efficient Accelerators and Reconfigurable Technologies},
  pages={25--34},
  year={2025}
}
```