Vitis accelerator #991
Conversation
```diff
@@ -865,6 +865,14 @@ class TraceData(ctypes.Structure):
         else:
             return output, trace_output

+    def hardware_predict(self, x, **kwargs):
```
This method has been added to enable performing predictions directly on the FPGA from the Python code. It feels a bit intrusive to add this backend-specific code to the hls4ml core. Another approach could be to modify predict() to allow backend-specific overloading, so model.hardware_predict(x) could become model.predict(x, target='hw'), but this also requires some modification of the existing core code. Could an hls4ml dev provide advice on the best approach here? (@vloncar, @jmitrevs?). Thanks!
I would be in favor of naming this somewhat differently (predict_hw, for example) and moving the exception to the backends (putting it in FPGABackend should be enough to cover those that don't/cannot support it).

A longer-term idea would be to have three ways of doing prediction: predict_emu (emulation, the current one), predict_sim (simulation via pyverilator), and predict_hw (the real deal), with predict defaulting to predict_emu, and maybe a switch for the user to control which one is called by a plain predict(x).
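A rough sketch of how that dispatch could look, purely for illustration (the predict_emu / predict_sim / predict_hw names come from the suggestion above; everything else is hypothetical, not actual hls4ml code):

```python
# Hypothetical sketch only -- not actual hls4ml API.
class ModelGraph:
    def predict(self, x, target='emu'):
        """Dispatch to one prediction flavour; 'emu' stays the default
        so existing predict(x) calls keep working unchanged."""
        dispatch = {
            'emu': self.predict_emu,  # emulation (the current behaviour)
            'sim': self.predict_sim,  # simulation, e.g. via pyverilator
            'hw': self.predict_hw,    # inference on the real device
        }
        if target not in dispatch:
            raise ValueError(f"Unknown predict target: {target!r}")
        return dispatch[target](x)

    def predict_emu(self, x):
        raise NotImplementedError  # stand-in for the existing predict()

    def predict_sim(self, x):
        raise NotImplementedError  # stand-in for pyverilator-based simulation

    def predict_hw(self, x):
        # Backends without hardware support would override this (e.g. in
        # FPGABackend) to raise a clear, backend-specific exception.
        raise NotImplementedError('This backend does not support hardware prediction')
```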
Squashed commit of all the work done by Konstantinos Axiotis prior to the merge with Alex Yang's work.
- Use backend-specific target parameter
- Remove environment checks that are also performed in the Makefile
As testing of this PR was mentioned in the minutes of the last dev meeting, the most recent work on the host code provided with the VitisAccelerator backend has been pushed to ensure testing of the latest version (also rebased on current main).
- Multiple device support
- Selection of device by BDF
- OpenCL error checking
- Automatic memory bank association
- Inference validation
- Improved command-line parameters
- Improved debug output
- Dummy buffer copy to avoid benchmarking the buffer allocation time
- Removal of mutexes preventing buffer copies from overlapping with kernel executions on the same CU with multiple workers
- Documentation
I've done a first pass, going through the hls4ml-core changes. Most of the comments are minor, just to make sure the code is consistent with the rest of the hls4ml codebase. In the coming days, I'll also try out the VitisAccelerator backend on a local setup with a U55C / U250 and review the accelerator-specific changes (templates, build files, etc.). Overall, a very nice addition to the hls4ml codebase, and it seems very orthogonal to all the other functionality, so there shouldn't be many issues with merging it soon.
Thanks for the review. There is probably some room for improvement, so please comment on your testing experience. We intend to do a polishing pass, mostly to provide more seamless integration from the Python code, but maybe this can be done in a subsequent PR if the current PR is deemed usable enough.
I just tried testing the VitisAccelerator backend on an Alveo U55C and an Alveo U250, but there were some issues.
In response to the above comment: the significant timing issues occur only with io_parallel; io_stream has no such issues.
Thanks again for taking the time to test this! Yes, timing closure is very design-dependent and is generally expected to be handled by the model creator. That said, you raise a good point: we mostly tested with io_stream, so we didn't encounter this kind of issue. Since io_stream is typically the preferred option for large models in acceleration contexts, this choice made sense for our use case. However, it does make quick evaluations using io_parallel less effective (and this might be a use case for this backend). Perhaps an io_parallel-optimized version of the HLS wrapper could help address this.

Regarding the platform: yes, there are multiple platform versions (think FPGA shell versions) for each board. Rather than trying to cover all cases, our goal is to offer an easy way for users to switch between them, while providing sensible defaults, though these may change over time. We should make this clearer in the documentation, or at least refer to the AMD documentation about XRT platforms.

You're also right about constraints: we shouldn't provide any by default. It's better to let users add them as needed for their specific designs. In the same spirit, we've removed the explicit buffer-object memory associations as well.

We'll fix the constraint handling and update the platform documentation and defaults soon. Updating the wrapper to better support io_parallel might take longer, so that could come in a future PR.
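For reference, the io_parallel / io_stream choice is made at conversion time through hls4ml's standard io_type argument. A minimal sketch, assuming the 'VitisAccelerator' backend name from this PR and a hypothetical model file:

```python
import hls4ml
from tensorflow import keras

model = keras.models.load_model('my_model.h5')  # hypothetical model file
config = hls4ml.utils.config_from_keras_model(model, granularity='model')

hls_model = hls4ml.converters.convert_from_keras_model(
    model,
    hls_config=config,
    backend='VitisAccelerator',  # backend name introduced by this PR
    io_type='io_stream',         # preferred for large models in acceleration contexts
    output_dir='my-vitis-accel-prj',
)
```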
Description
The Vitis Accelerator Backend builds upon the foundation laid by the Vitis backend and streamlines the generation process for PCIe accelerators using the Vitis Accelerator Flow.
Features:
Type of change
Tests
The backend has been tested with the hls4ml getting started tutorial example.
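For context, that kind of test loosely follows a flow like the sketch below; hardware_predict is the method added in this PR, while the input shape, tolerance, and hls_model setup are illustrative assumptions:

```python
import numpy as np

# hls_model: result of hls4ml.converters.convert_from_keras_model(...)
# with backend='VitisAccelerator' (see the conversion sketch above).

# 1. Software emulation as a reference.
hls_model.compile()
x = np.random.rand(100, 16).astype(np.float32)  # illustrative input shape
y_emu = hls_model.predict(x)

# 2. Build the accelerator binary (HLS synthesis + Vitis linking),
#    then run the same inputs on the board.
hls_model.build()
y_hw = hls_model.hardware_predict(x)  # method added by this PR

# 3. Validate hardware results against emulation, allowing a small
#    tolerance for any numerical differences between the two flows.
np.testing.assert_allclose(y_hw, y_emu, rtol=1e-2)
```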
Test Configuration:
The Vitis version used for the validation is 2022.2.
The functionality of the project was tested on a VCK5000 accelerator board.
Checklist
I have installed and run pre-commit on the files I edited or added.