Vitis accelerator #991

Open · wants to merge 44 commits into main from vitis_accelerator_dev

Conversation


@axiotisk axiotisk commented Apr 5, 2024

Description

The Vitis Accelerator Backend builds upon the foundation laid by the Vitis backend and streamlines the generation process for PCIe accelerators using the Vitis Accelerator Flow.
Features:

  • This backend inherits from the Vitis backend, ensuring compatibility with existing workflows and projects.
  • Converts the input of the top-level design from AXI Stream to memory-mapped and the output from memory-mapped to AXI Stream.
  • Automates the generation of host code and the necessary makefile for kernel compilation.
  • Please note that the software and hardware emulation features are still a work in progress and will be added in subsequent commits.

Type of change


  • Bug fix (non-breaking change that fixes an issue)
  • Documentation update
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • A new research paper code implementation
  • Other (Specify)

Tests

The backend has been tested with the hls4ml getting started tutorial example.

Test Configuration:
The Vitis version used for the validation is 2022.2.
The functionality of the project was tested on a VCK5000 accelerator board.

Checklist

  • I have read the guidelines for contributing.
  • I have commented my code, particularly in hard-to-understand areas.
  • I have made corresponding changes to the documentation.
  • My changes generate no new warnings.
  • I have installed and run pre-commit on the files I edited or added.
  • I have added tests that prove my fix is effective or that my feature works.

@jmitrevs jmitrevs added this to the v1.1.0 milestone Apr 5, 2024
@axiotisk axiotisk force-pushed the vitis_accelerator_dev branch 2 times, most recently from e92a6be to 86f75b5 Compare June 14, 2024 09:15
@qberthet qberthet force-pushed the vitis_accelerator_dev branch from c875785 to 64c8baa Compare July 2, 2024 13:13
@axiotisk axiotisk marked this pull request as ready for review July 10, 2024 17:14
@@ -865,6 +865,14 @@ class TraceData(ctypes.Structure):
else:
return output, trace_output

def hardware_predict(self, x, **kwargs):
Contributor
This method has been added to enable performing predictions directly on the FPGA from the Python code. It feels a bit intrusive to add this backend-specific code to the hls4ml core. Another approach could be to modify predict() to allow backend-specific overloading. So, model.hardware_predict(x) could become model.predict(x, target='hw'), but this also requires some modification of the existing core code. Could an hls4ml dev provide advice on the best approach here? (@vloncar, @jmitrevs?). Thanks!

Contributor
I would be in favor of naming this somewhat differently (predict_hw, for example) and moving the exception to the backends (putting it in FPGABackend should be enough to cover those that don't/cannot support it).

A longer-term idea would be to have three ways of doing prediction: predict_emu (emulation, the current one), predict_sim (simulation via pyverilator) and predict_hw (the real deal), with predict being predict_emu by default, and perhaps a switch for the user to control which one is called if it's just predict(x).
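As a rough illustration of that longer-term idea, a thin dispatcher could route predict() to the chosen path. Everything below (the method names, stub bodies, and the target keyword) is a sketch of the proposal, not hls4ml's actual API:

```python
class ModelGraph:
    """Minimal sketch of the proposed three-way dispatch; names are hypothetical."""

    def predict_emu(self, x):
        # Current behavior: emulation of the compiled model.
        # A stand-in computation replaces the real emulation path here.
        return [v * 2 for v in x]

    def predict_sim(self, x):
        # RTL simulation, e.g. via pyverilator (not implemented in this sketch).
        raise NotImplementedError("simulation backend not available")

    def predict_hw(self, x):
        # Real inference on the FPGA; backends that cannot support it would
        # raise from a shared base class such as FPGABackend.
        raise NotImplementedError("no accelerator backend configured")

    def predict(self, x, target="emu"):
        # Default stays the emulation path; the user can switch targets.
        dispatch = {
            "emu": self.predict_emu,
            "sim": self.predict_sim,
            "hw": self.predict_hw,
        }
        return dispatch[target](x)
```

With this shape, `model.predict(x)` keeps its current meaning, while `model.predict(x, target="hw")` opts into hardware execution.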

@qberthet qberthet force-pushed the vitis_accelerator_dev branch from abd46ca to 18f7fc7 Compare January 12, 2025 22:01
@qberthet
Contributor

Since testing of this PR was mentioned in the minutes of the last dev meeting, the most recent work on the host code provided with the VitisAccelerator backend has been pushed to ensure testing of the latest version (also rebased on the current main). The changes include:

- Multiple devices support
- Selection of device by BDF
- OpenCL error checking
- Automatic memory bank association
- Inferences validation
- Improved command line parameters
- Improved debug output
- Dummy buffer copy to avoid benchmarking buffer allocation time
- Removal of mutexes preventing buffer copies overlap with kernel executions on the same CU with multiple workers
- Documentation
@qberthet qberthet force-pushed the vitis_accelerator_dev branch from 18f7fc7 to 22401ba Compare February 8, 2025 08:52
@bo3z bo3z modified the milestones: v1.1.0, v1.2.0 Apr 8, 2025
@bo3z
Contributor

bo3z commented Apr 8, 2025

I've done a first pass, going through the hls4ml-core changes. Most of the comments are minor, just to make sure the code is consistent with the rest of the hls4ml codebase. In the following days, I'll also try out the VitisAccelerator on a local set-up with a U55C / U250 and try to review the accelerator-specific (templates, build files etc.) changes.

Overall, a very nice addition to the hls4ml codebase and seems very orthogonal to all the other functionality, so shouldn't be many issues with merging it soon.

@qberthet
Contributor

qberthet commented Apr 8, 2025

Thanks for the review. There is probably some room for improvement, so please comment on your testing experience. We intend to do a polishing pass, mostly to provide a more seamless integration from the Python code, but maybe this can be done in a subsequent PR if the current PR is deemed usable enough.

@bo3z
Contributor

bo3z commented Apr 10, 2025

I just tried testing the VitisAccelerator backend on Alveo u55c and Alveo u250, but there were some issues:

  • The biggest issue is timing violations: on both the u55c and the u250 there is a very large WNS, around -3 ns to -5 ns. I tried synthesising with clock periods of 4 ns and 5 ns, both with 27% uncertainty. I also tried lowering the batch size to 1 (hoping it would simplify the logic and reduce congestion). Finally, I tried both with and without hw_quant. All of these cases, on both boards, had significant timing violations, which is a bit unexpected. To me this looks like a missing constraint in the build process or something similar. I've commonly seen timing violations on the u55c around the HBM, but they are usually much smaller (-0.5 ns) and can be fixed with some floor-planning and more advanced Vivado directives.

  • I had to change the platform for the u250 to xilinx_u250_gen3x16_xdma_4_1_202210_1, because Vitis reported "Platform not found" with the one in this PR (xilinx_u250_xdma_201830_2), even though a quick Google search does find it. I am wondering whether there are several versions of the u250?

  • On the u250, after I changed the platform, the placer failed during implementation. There was a constraint (I guess generated by hls4ml) that forces the model kernel onto SLR0, but this specific model couldn't fit into SLR0. The model was the jet tagging model, so not too large; still, I think we should avoid such explicit placement of kernels onto SLRs, as it can be quite hard to estimate the resource usage of a model before actual synthesis. Per-SLR placement should probably be left to more advanced users who have trouble meeting timing, in my opinion.
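For reference, this kind of per-SLR pinning is usually expressed as a v++ linker connectivity option, which advanced users could opt into themselves rather than having it generated by default. A minimal sketch (the compute-unit name is hypothetical):

```ini
# v++ linker config file, passed with: v++ --link --config ./link.cfg ...
# Pins compute unit myproject_kernel_1 (illustrative name) to SLR0.
# Omitting this line leaves SLR assignment to the placer.
[connectivity]
slr=myproject_kernel_1:SLR0
```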

@bo3z
Contributor

bo3z commented Apr 10, 2025

So in response to the above comment: the significant timing issues are only for io_parallel; io_stream has no such issues.

@qberthet
Contributor

Thanks again for taking the time to test this!

Yes, timing closure is very design-dependent and is generally expected to be handled by the model creator. That said, you raise a good point: we mostly tested with io_stream, so we didn’t encounter this kind of issue. Since io_stream is typically the preferred option for large models in acceleration contexts, this choice made sense for our use case. However, it does make quick evaluations using io_parallel less effective (and this might be a use case for this backend). Perhaps an io_parallel-optimized version in the HLS wrapper could help address this.

Regarding the platform: yes, there are multiple platform versions (think FPGA shell versions) for each board. Rather than trying to cover all cases, our goal is to offer an easy way for users to switch between them while providing sensible defaults, though these may change over time. We should make this clearer in the documentation, or at least point to the AMD documentation on XRT platforms.
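That "sensible default with an easy override" could be sketched as below. The u250 platform name is the one mentioned earlier in this thread; the helper itself and its shape are purely illustrative, not the backend's actual API:

```python
# Hypothetical helper: map a board to a default XRT platform (shell), while
# still letting the user override it. Only the u250 entry is taken from the
# discussion above; the structure is illustrative.
DEFAULT_PLATFORMS = {
    "u250": "xilinx_u250_gen3x16_xdma_4_1_202210_1",
    # further boards would get their validated shell names here
}

def resolve_platform(board, override=None):
    """Return the platform to target: an explicit override wins over the default."""
    if override is not None:
        return override
    try:
        return DEFAULT_PLATFORMS[board]
    except KeyError:
        raise ValueError(f"no default platform known for board {board!r}")
```

A user with a different shell installed would then pass the platform explicitly instead of editing the defaults.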

You’re also right about constraints: we shouldn’t provide any by default. It’s better to let users add them as needed for their specific designs. In the same spirit, we’ve removed the explicit buffer-object memory associations as well.

We’ll be fixing the constraint handling and updating the platform documentation and defaults soon. Updating the wrapper to better support io_parallel might take a bit longer, so that could come in a future PR.

6 participants