This repo contains the framework introduced in the BufferProspector paper [Wei25]_. The Python tool allocates unused buffer space to the ofmaps (output feature maps) that need to be buffered because of the timing mismatch in layer-pipeline mapping. See our DAC'25 paper [Wei25]_ for details.
This framework is built upon the Tangram framework [Gao19]_, [Gao17]_.
If you use this tool in your work, we kindly request that you reference our paper below, and send us a citation of your work.
- Wei et al., "BufferProspector: Discovering and Exploiting Untapped Buffer Resources in Many-Core DNN Accelerators", in DAC, June 2025.
Installation follows the same process as for the Tangram framework.
However, to make installation easier, the main Python scripts add the
BufferProspector path to PYTHONPATH manually, so one only needs
to take care of the dependencies during installation.
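
As an illustration of what adding the path manually might look like, here is a minimal sketch. It assumes the script sits two directories below the repository root, as ``nn_dataflow/tools/exp_overall.py`` would. The helper name ``add_repo_root_to_path`` is hypothetical and not taken from the repository; the actual scripts may do this differently.

```python
import os
import sys


def add_repo_root_to_path(script_path, levels_up=2):
    """Prepend the directory `levels_up` above the script's folder to sys.path.

    Hypothetical helper for illustration only; not part of BufferProspector.
    """
    root = os.path.dirname(os.path.abspath(script_path))
    for _ in range(levels_up):
        root = os.path.dirname(root)
    # Avoid adding the same entry twice if the helper is called repeatedly.
    if root not in sys.path:
        sys.path.insert(0, root)
    return root
```

Calling ``add_repo_root_to_path(__file__)`` at the top of a script in ``nn_dataflow/tools`` would then make ``import nn_dataflow`` resolve without a separate package installation step.
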
One can install the dependencies by using pip::

    > pip install -r requirements.txt

To reproduce the experiments in BufferProspector [Wei25]_, simply run::

    > cd nn_dataflow/tools
    > python exp_overall.py
    > python exp_dse.py

The results will be output to the folders ``01_overall`` and ``02_DSE`` under ``nn_dataflow/tools``.
Each result contains 3 files:

- The ``*.json`` file contains the searched scheme and its performance, latency, buffer usage, etc.
- The ``*.dat`` file contains the dumped ``NNDataflowScheme`` object and segment information, so that the object can be read directly back into Python during debugging and inspection.
- The ``*.txt`` file contains the log of the experiments, including the DP searched segments and their costs.

The ``*.dat`` file is produced only when the experiment finishes, and its existence is used
as a flag in the ``exp_*.py`` scripts: if the ``.dat`` file already exists, the related experiment is skipped
to avoid running it a second time.
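
The skip-if-done check described above can be sketched as follows. This is a minimal illustration, not the code in the ``exp_*.py`` scripts: the function name ``pending_experiments`` and the assumption that each experiment writes ``<name>.dat`` directly into one result folder are hypothetical.

```python
import os


def pending_experiments(names, result_dir):
    """Return the experiment names whose ``<name>.dat`` flag file is missing.

    Hypothetical helper: assumes one ``.dat`` file per experiment name,
    stored directly inside ``result_dir``.
    """
    return [
        name for name in names
        if not os.path.exists(os.path.join(result_dir, name + ".dat"))
    ]
```

Under these assumptions, deleting a ``.dat`` file is enough to make the corresponding experiment run again on the next invocation.
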
After the scripts finish, run ``data.py`` under the ``nn_dataflow/tools`` folder to gather the related statistics::

    > # cd nn_dataflow/tools
    > python ./data.py

The final results are written to the ``res.csv`` file.
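
For a quick look at the gathered statistics, the CSV file can be loaded with the Python standard library. The sketch below only assumes that the file has a header row; it makes no assumption about which columns ``data.py`` writes, and ``load_results`` is a hypothetical helper name.

```python
import csv


def load_results(csv_path):
    """Read a CSV file with a header row into a list of dicts, one per row."""
    with open(csv_path, newline="") as f:
        return list(csv.DictReader(f))
```

For example, ``rows = load_results("res.csv")`` followed by ``print(list(rows[0].keys()))`` lists the available column names.
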
Notes:

- The ``exp_*.py`` scripts can run the experiments in parallel. One can change the ``run_sequential`` function into ``run_threads`` to enable parallel execution. However, the memory consumption of the program is large, and if there is not enough memory (at least 64GB for LLMs), running the experiments sequentially is recommended. The hyperparameters that control parallel execution are in ``run_multithread.py``; please refer to the script for further information.
- We have also provided the expected results of the experiments in the ``nn_dataflow/DAC_exp`` folder. One can directly move it into the ``nn_dataflow/tools`` folder and run ``data.py`` to obtain the results.

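
The trade-off between sequential and parallel execution mentioned in the notes can be illustrated with a generic sketch. The functions ``run_all_sequential`` and ``run_all_threaded`` below are hypothetical stand-ins; they are not the repository's ``run_sequential`` and ``run_threads``, whose signatures and tuning parameters are defined in the scripts themselves.

```python
from concurrent.futures import ThreadPoolExecutor


def run_all_sequential(jobs):
    """Run zero-argument callables one after another (one job in memory at a time)."""
    return [job() for job in jobs]


def run_all_threaded(jobs, max_workers=2):
    """Run zero-argument callables on a thread pool.

    Up to ``max_workers`` jobs are in flight at once, so peak memory can grow
    with the number of workers. Results are returned in submission order.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(job) for job in jobs]
        return [future.result() for future in futures]
```

In a sketch like this, lowering ``max_workers`` is the simplest way to trade run time for a smaller memory footprint.
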
BufferProspector is free software; you can redistribute it and/or modify it
under the terms of the `BSD License <LICENSE>`__ as published by the Open
Source Initiative, revised version.
.. [Wei25] Wei, Cai, Gao, Peng, Wu, Shi, and Ma, "Buffer Prospector: Discovering and Exploiting Untapped Buffer Resources in Many-Core DNN Accelerators", in DAC. June, 2025.
.. [Gao19] Gao, Yang, Pu, Horowitz, and Kozyrakis, `TANGRAM: Optimized Coarse-Grained Dataflow for Scalable NN Accelerators <//dl.acm.org/citation.cfm?id=3297858.3304014>`__, in ASPLOS. April, 2019.
.. [Gao17] Gao, Pu, Yang, Horowitz, and Kozyrakis, `TETRIS: Scalable and Efficient Neural Network Acceleration with 3D Memory <//dl.acm.org/citation.cfm?id=3037697.3037702>`__, in ASPLOS. April, 2017.