Skip to content

Latest commit

 

History

History
302 lines (229 loc) · 33.4 KB

File metadata and controls

302 lines (229 loc) · 33.4 KB

oneAPI Samples for Field Programmable Gate Arrays (FPGAs)

The folders in this area of the oneAPI-sample GitHub repository include tutorials, reference designs, and libraries specific to field programmable gate array (FPGA) features.

You will need the following toolkits and add-ons:

Note: The latest versions of code samples on the master branch are not guaranteed to be stable. Use a stable release version of the repository that corresponds to the version of the compiler you are using.

Understand FPGA Programming

The Introduction To FPGA Design Concepts section of the FPGA Optimization Guide for Intel® oneAPI Toolkits contains information on the basic concepts that are foundational to FPGA programming. Read that section to get the most from these FPGA samples.

FPGA Repository Structure

This area of the oneAPI-sample repository has a general structure intended to help you find the resources.

  • Tutorials
    • GettingStarted: Contains basic samples to get you through your first compiles.
    • Features: Contains samples that demonstrate useful compiler features, like loop unrolling.
    • DesignPatterns: Contains samples that show coding patterns to generate efficient hardware usage.
    • Tools: Contains sample to demonstrate how to use external debugging tools, like profiling.
  • ReferenceDesigns: Contains samples that showcase high-performance designs that take advantage of multiple features and design patterns shown in the Tutorials section.
  • include: Contains commonly used functions wrapped as libraries.

Sample Categories

To help you understand and use the code samples in a coherent manner, the samples are categorized by the tiers.

  • Tier 1: Get Started
  • Tier 2: Explore the Fundamentals
  • Tier 3: Explore the Advanced Techniques
  • Tier 4: Explore the Reference Designs

Tier 1: Get Started

flowchart LR
   tier1("Tier 1: Get Started")
   tier2("Tier 2: Explore the Fundamentals")
   tier3("Tier 3: Explore the Advanced Techniques")
   tier4("Tier 4: Explore the Reference Designs")
   
   tier1 --> tier2 --> tier3 --> tier4
   
   style tier1 fill:#f96,stroke:#333,stroke-width:1px,color:#fff
   style tier2 fill:#0071c1,stroke:#0071c1,stroke-width:1px,color:#fff
   style tier3 fill:#0071c1,stroke:#0071c1,stroke-width:1px,color:#fff
   style tier4 fill:#0071c1,stroke:#0071c1,stroke-width:1px,color:#fff
Loading
Sample Category Description
fpga_compile Tutorials/GettingStarted How and why compiling SYCL* code for FPGA differs from CPU or GPU
FPGA device image types and when to use them.
The compile options used to target FPGA
fast_recompile Tutorials/GettingStarted Why to separate host and device code compilation in your FPGA project
How to use the -reuse-exe and device link.
Which method to choose for your project
fpga_template Tutorials/GettingStarted An Intel® FPGA tutorial that explains the CMake build system that is used in other code samples, and serves as a template that you can re-use in your own designs
component_interfaces_comparison Tutorials/Features/hls_flow_interfaces This sample introduces different invocation/data interfaces that can be used for IP components

Tier 2: Explore the Fundamentals

flowchart LR
   tier1("Tier 1: Get Started")
   tier2("Tier 2: Explore the Fundamentals")
   tier3("Tier 3: Explore the Advanced Techniques")
   tier4("Tier 4: Explore the Reference Designs")
   
   tier1 --> tier2 --> tier3 --> tier4
   
   style tier1 fill:#0071c1,stroke:#0071c1,stroke-width:1px,color:#fff
   style tier2 fill:#f96,stroke:#333,stroke-width:1px,color:#fff
   style tier3 fill:#0071c1,stroke:#0071c1,stroke-width:1px,color:#fff
   style tier4 fill:#0071c1,stroke:#0071c1,stroke-width:1px,color:#fff
Loading
Sample Category Description
ac_fixed Tutorials/Features How different methods of ac_fixed number construction affect hardware resource utilization
Recommended method for constructing ac_fixed numbers in your kernel
Accessing and using the ac_fixed math library functions
Trading off accuracy of results for reduced resource usage on the FPGA
ac_int Tutorials/Features Using the ac_int data type for basic operations
Efficiently using the left shift operation
Setting and reading certain bits of an ac_int number
device_global (experimental) Tutorials/Features The basic usage of the device_global class
How to initialize a device_global to non-zero values
double_buffering Tutorials/DesignPatterns How and when to implement the double buffering optimization technique
explicit_data_movement Tutorials/DesignPatterns How to explicitly manage the movement of data for the FPGA
hardware_reuse Tutorials/Features How to reuse hardware in your FPGA designs by using loops and task sequences
hostpipes (experimental) Tutorials/Features How to use host pipes to send and receive data between a host and the FPGA
invocation_interfaces Tutorials/Features/hls_flow_interfaces How to specify the kernel invocation interface and kernel argument interfaces
kernel_args_restrict Tutorials/Features The problem of pointer aliasing and its impact on compiler optimizations.
The behavior of the kernel_args_restrict attribute and when to use it on your kernel
The effect this attribute can have on kernel performance on FPGA
loop_coalesce Tutorials/Features What the loop_coalesce attribute does
How loop_coalesce attribute affects resource usage and loop throughput
How to apply the loop_coalesce attribute to loops in your program
Which loops make good candidates for coalescing
loop_fusion Tutorials/Features Basics of loop fusion
The reasons for loop fusion
How to use loop fusion to increase performance
Understanding safe application of loop fusion
loop_initiation_interval Tutorials/Features The fMAX-II tradeoff
Default behavior of the compiler when scheduling loops
How to use intel::initiation_interval to attempt to set the II for a loop
Scenarios in which intel::initiation_interval can be helpful in optimizing kernel performance
loop_ivdep Tutorials/Features Basics of loop-carried dependencies
The notion of a loop-carried dependence distance
What constitutes a safe dependence distance
How to aid the compiler's dependence analysis to maximize performance
loop_unroll Tutorials/Features Basics of loop unrolling.
How to unroll loops in your program
Determining the optimal unroll factor for your program
max_interleaving Tutorials/Features The basic usage of the max_interleaving attribute
How the max_interleaving attribute affects loop resource use
How to apply the max_interleaving attribute to loops in your program
memory_attributes Tutorials/Features The basic concepts of on-chip memory attributes
How to apply memory attributes in your program
How to confirm that the memory attributes were respected by the compiler
A case study of the type of performance/area trade-offs enabled by memory attributes
mmhost Tutorials/Features/hls_flow_interfaces Basics of declaring Avalon memory-mapped host data interfaces for IP components
parallel_loops Tutorials/Features How to use task sequences to describe multiple parallel loops in a single kernel
pipes Tutorials/Features The basics of using SYCL*-compliant pipes extension for FPGA
How to declare and use pipes
printf Tutorials/DesignPatterns How to declare and use printf in program
streaming_data_interfaces Tutorials/Features/hls_flow_interfaces How to use pipes to implement streaming data interfaces on an IP component

Tier 3: Explore the Advanced Techniques

flowchart LR
   tier1("Tier 1: Get Started")
   tier2("Tier 2: Explore the Fundamentals")
   tier3("Tier 3: Explore the Advanced Techniques")
   tier4("Tier 4: Explore the Reference Designs")
   
   tier1 --> tier2 --> tier3 --> tier4
   
   style tier1 fill:#0071c1,stroke:#0071c1,stroke-width:1px,color:#fff
   style tier2 fill:#0071c1,stroke:#0071c1,stroke-width:1px,color:#fff
   style tier3 fill:#f96,stroke:#333,stroke-width:1px,color:#fff
   style tier4 fill:#0071c1,stroke:#0071c1,stroke-width:1px,color:#fff
Loading
Sample Category Description
annotated_class_clean_coding (experimental) Tutorials/Features How to use annotated_class_util.hpp to simplify your oneAPI code that annotates properties to pipes and annotated_arg
annotated_ptr (experimental) Tutorials/Features How to use annotated_ptr to constrain a specific memory access
autorun Tutorials/DesignPatterns How and when to use autorun kernels
buffered_host_streaming Tutorials/DesignPatterns How to optimally stream data between the host and device to maximize throughput
compute_units Tutorials/DesignPatterns A design pattern to generate multiple compute units using template metaprogramming
dsp_control Tutorials/Features How to apply global DSP control in command-line interface
How to apply local DSP control in source code
Scope of datatypes and math operations that support DSP control
dynamic_profiler Tutorials/Tools About the Intel® FPGA dynamic profiler for DPC++
How to set up and use this tool
A case study of using this tool to identify performance bottlenecks in pipes
fpga_reg Tutorials/Features How to use the ext::intel::fpga_reg extension
How ext::intel::fpga_reg can be used to re-structure the compiler-generated hardware
Situations in which applying ext::intel::fpga_reg might be beneficial
io_streaming Tutorials/DesignPatterns How to stream data through the FPGA's IO using IO pipes
latency_control (experimental) Tutorials/Features How to set latency constraints to pipes and LSUs accesses
How to confirm that the compiler respected the latency control directive
loop_carried_dependency Tutorials/DesignPatterns A technique to remove loop carried dependencies from your FPGA device code, and when to apply it
lsu_control Tutorials/Features The basic concepts of LSU styles and LSU modifiers
How to use the LSU controls extension to request specific configurations
How to confirm what LSU configurations are implemented
A case study of the type of area trade-offs enabled by LSU
max_reinvocation_delay Tutorials/Features How and when to apply the max_reinvocation_delay attribute when optimizing loop throughput
mem_channel Tutorials/Features How and when to use the mem_channel buffer property and the -Xsno-interleaving flag
n_way_buffering Tutorials/DesignPatterns How and when to apply the N-way buffering optimization technique
onchip_memory_cache Tutorials/DesignPatterns How and when to implement the on-chip memory cache optimization
optimization_targets Tutorials/Features How to set optimization targets for your compile
How to use the minimum latency optimization target to compile low-latency designs
How to manually override underlying controls set by the minimum latency optimization target
optimize_inner_loop Tutorials/DesignPatterns How to optimize the throughput of an inner loop with a low trip
pipe_array Tutorials/DesignPatterns A design pattern to generate an array of pipes using SYCL*
Static loop unrolling through template metaprogramming
platform_designer Tutorials/Tools How to use FPGA IP produced with the Intel® oneAPI DPC++/C++ Compiler with Intel® Quartus® Prime Pro Edition software suite and Platform Designer
private_copies Tutorials/Features The basic usage of the private_copies attribute
How the private_copies attribute affects the throughput and resource use of your FPGA program
How to apply the private_copies attribute to variables or arrays in your program
How to identify the correct private_copies factor for your program
read_only_cache Tutorials/Features How and when to use the read-only cache feature
scheduler_target_fmax Tutorials/Features The behavior of the scheduler_target_fmax_mhz attribute and when to use it
The effect this attribute can have on kernel performance on FPGA
shannonization Tutorials/DesignPatterns How to make FPGA-specific optimizations to remove computation from the critical path and improve fMAX/II
simple_host_streaming Tutorials/DesignPatterns How to achieve low-latency host-device streaming while maintaining throughput
speculated_iterations Tutorials/Features What the speculated_iterations attribute does
How to apply the speculated_iterations attribute to loops in your program
How to determine the optimal number of speculated iterations
stall_enable Tutorials/Features What the use_stall_enable_clusters attribute does
How use_stall_enable_clusters attribute affects resource usage and latency
How to apply the use_stall_enable_clusters attribute to kernels in your program
system_profiling Tutorials/Tools Summary of profiling tools available for performance optimization
About the Intercept Layer for OpenCL™ Applications
How to set up and use this tool
A case study of using this tool to identify when the double buffering system-level optimization is beneficial
triangular_loop Tutorials/DesignPatterns How and when to apply the triangular loop optimization technique
use_library Tutorials/Tools How to integrate Verilog RTL into your oneAPI design directly
zero_copy_data_transfer Tutorials/DesignPatterns How to use SYCL USM host allocations for the FPGA

Tier 4: Explore the Reference Designs

flowchart LR
   tier1("Tier 1: Get Started")
   tier2("Tier 2: Explore the Fundamentals")
   tier3("Tier 3: Explore the Advanced Techniques")
   tier4("Tier 4: Explore the Reference Designs")
   
   tier1 --> tier2 --> tier3 --> tier4
   
   style tier1 fill:#0071c1,stroke:#0071c1,stroke-width:1px,color:#fff
   style tier2 fill:#0071c1,stroke:#0071c1,stroke-width:1px,color:#fff
   style tier3 fill:#0071c1,stroke:#0071c1,stroke-width:1px,color:#fff
   style tier4 fill:#f96,stroke:#333,stroke-width:1px,color:#fff
Loading

All the Tier 4 samples are in the ReferenceDesigns category.

Sample Description
anr How to create a parameterizable image processing pipeline to implement an Adaptive Noise Reduction (ANR) algorithm on a FPGA
board_test How to test board interfaces to ensure the designed platform provides expected performance
cholesky How to implement high performance matrix Cholesky decomposition on a FPGA
cholesky_inversion How to implement high performance Cholesky matrix decomposition on a FPGA
convolution_2d How to implement a 2D convolution IP component that can be exported to Intel® Quartus® Prime
crr How to implement the Cox-Ross-Rubinstein (CRR) binomial tree model on a FPGA
db How to accelerate database queries using an FPGA
decompress How to implement an efficient GZIP and Snappy decompression engine on a FPGA
gzip How to implement a high-performance multi-engine compression algorithm on FPGA
matmul How to implement a systolic-array-based high-performance matrix multiplication algorithm on FPGA
merge_sort How to use the spatial compute of the FPGA to create a merge sort design that takes advantage of thread- and SIMD-level parallelism
mvdr_beamforming How to create a full, complex system that performs IO streaming using SYCL*-compliant code
niosv Simulate a system with an FPGA IP produced with the Intel® oneAPI DPC++/C++ Compiler, and a Nios® V softcore processor.
pca How to implement high performance principal component analysis on a FPGA
qrd Implementing a high performance FPGA version of the Gram-Schmidt QR decomposition algorithm
qri Implementing a high performance FPGA version of the Gram-Schmidt QR decomposition to compute a matrix inversion

Start exploring the FPGA code samples with this selection

The following FPGA samples represent a selection of useful tutorials suitable to get you started on your first oneAPI application on the FPGA

Subject Sample
FPGA Compile Flow fpga_compile
Save Development Time fast_recompile
Avoid Aliasing of Kernel Arguments kernel_args_restrict
Optimize by Improving Loop Throughput loop_unroll
Transfer Data with Pipes pipes
Improve Performance with Double Buffering double_buffering

Build and Run the Samples on Local Development System

Each sample contains a README.md file with instructions to build and run the sample. The following sections contain information about configuring your development environment to build and run the samples; in most cases, the sample README.md file contains specific instructions.

Set Environment Variables

When working with the command-line interface (CLI), you should configure the oneAPI toolkits using environment variables. Set up your CLI environment by sourcing the setvars script every time you open a new terminal window. This practice ensures that your compiler, libraries, and tools are ready for development.

Note: For more information on configuring environment variables, see Use the setvars Script with Linux* or macOS* or Use the setvars Script with Windows*.

Include Files

The FPGA samples use many of the headers in the DirectProgramming/C++SYCL_FPGA/include folder.

Use Visual Studio Code* (VS Code) (Optional)

You can use Visual Studio Code* (VS Code) extensions to set your environment, create launch configurations, and browse and download samples.

The basic steps to build and run a sample using VS Code include:

  1. Configure the oneAPI environment with the extension Environment Configurator for Intel® oneAPI Toolkits.
  2. Download a sample using the extension Code Sample Browser for Intel® oneAPI Toolkits.
  3. Open a terminal in VS Code (Terminal > New Terminal).
  4. Run the sample in the VS Code terminal using instructions for Linux.
  5. (Linux only) Debug your GPU application with GDB for Intel® oneAPI toolkits using the Generate Launch Configurations extension.

To learn more about the extensions and how to configure the oneAPI environment, see the Using Visual Studio Code with Intel® oneAPI Toolkits User Guide.

Use Integrated Development Environments (IDEs)

You can compile and run the sample using the Eclipse* IDE (Linux*) and Microsoft Visual Studio* (Windows*). For on using the IDE integration, see FPGA Workflows on Third-Party IDEs for Intel® oneAPI Toolkits.

Troubleshooting

If an error occurs when compiling a sample, you can get more details by running make with the VERBOSE=1 argument: make VERBOSE=1

If you receive an error message, troubleshoot the problem using the Diagnostics Utility for Intel® oneAPI Toolkits. The diagnostic utility provides configuration and system checks to help find missing dependencies, permissions errors, and other issues. See the Diagnostics Utility for Intel® oneAPI Toolkits User Guide for more information on using the utility.

Performance Disclaimers

Tests document performance of components on a particular test, in specific systems and may not reflect all publicly available security updates. Differences in hardware, software, or configuration will affect actual performance. Consult other sources of information to evaluate performance as you consider your purchase. For complete information about performance and benchmark results, visit this page. See configuration disclosure for details. No product or component can be absolutely secure.

Intel technologies’ features and benefits depend on system configuration and may require enabled hardware, software, or service activation. Performance varies depending on system configuration. Check with your system manufacturer or retailer or learn more at intel.com.

Intel and the Intel logo are trademarks of Intel Corporation or its subsidiaries in the U.S. and/or other countries.

© Intel Corporation.

Build and Run the Samples on Intel® DevCloud (Optional)

When running a sample in the Intel® DevCloud, you must specify the compute node (CPU, GPU, FPGA) and whether to run in batch or interactive mode.

You can specify a FPGA runtime node using a single line script similar to the following example.

qsub -I -l nodes=1:fpga_runtime:ppn=2 -d .
  • -I (upper case I) requests an interactive session.

  • -l nodes=1:fpga_runtime:ppn=2 (lower case L) assigns one full node.

  • -d . makes the current folder as the working directory for the task.

    Available Nodes Command Options
    FPGA Compile Time qsub -I -l nodes=1:fpga_compile:ppn=2 -d .
    FPGA Runtime qsub -I -l nodes=1:fpga_runtime:ppn=2 -d .
    GPU qsub -I -l nodes=1:gpu:ppn=2 -d .
    CPU qsub -I -l nodes=1:xeon:ppn=2 -d .

Note: For more information on how to specify compute nodes read, Launch and manage jobs in the Intel® DevCloud for oneAPI Documentation.

Only fpga_compile nodes support compiling to FPGA. When compiling for FPGA hardware, increase the job timeout to 24 hours.

Neither compiling nor executing programs on FPGA hardware are supported on the login nodes. For more information, see the Intel® oneAPI Base Toolkit Get Started Guide.

Note: Since Intel® DevCloud for oneAPI includes the appropriate development environment already configured for you, you do not need to set environment variables.

Documentation