Arm Custom Instructions (ACI) extend Arm processors with application-specific instructions to optimize the performance of algorithms. ACI is currently implemented on Cortex-M33, Cortex-M52, Cortex-M55, and Cortex-M85 processors using the Custom Datapath Extension (CDE). It extends the processor with a custom compute pipeline for accelerators that avoids the overhead of the co-processor interface.
Note
The instruction set of the Cortex-M processor series is already comprehensive and delivers very good out-of-the-box performance with features like Helium for highly efficient DSP and ML processing. However, custom-defined instructions are beneficial in some cases, for example when data inputs require bit manipulations that take several clock cycles. If such an operation executes frequently, a single-cycle custom instruction improves performance and energy efficiency.
Imagine that you plan to accelerate firmware with a set of custom instructions. Before proceeding to hardware design, you would like to answer questions such as "How can we accelerate our algorithm?" There is usually more than one solution, and each solution corresponds to a different set of custom instructions, which raises the follow-up question "Which one is the best?"
This repository helps you to answer these questions by evaluating your code using software simulation before the time-consuming hardware design. It contains examples that you can adapt to your application requirements. These examples explain how ACI accelerated algorithms are developed by:
- Defining a set of custom instructions utilizing ACI.
- Adapting C/C++ source code to use ACI with intrinsic functions.
- Extending Arm simulation models with custom instructions and estimating the performance gains.
- Verifying the ACI set before starting hardware design.
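As a rough illustration of what estimating the performance gains means, the sketch below computes a back-of-the-envelope estimate from cycle counts. It is not part of this repository: the function names and all numbers are hypothetical, and real figures would come from running the workload on a simulation model.

```c
#include <stdint.h>

/* Estimated total cycles once a software sequence is replaced by a custom
 * instruction. All parameters are assumptions supplied by the caller:
 *   total_cycles - measured cycles of the unmodified workload
 *   calls        - how often the software sequence executes
 *   sw_cycles    - cycles per execution of the software sequence
 *   aci_cycles   - cycles per execution of the custom instruction */
static inline uint64_t cycles_with_aci(uint64_t total_cycles, uint64_t calls,
                                       uint32_t sw_cycles, uint32_t aci_cycles)
{
    if (aci_cycles >= sw_cycles) {
        return total_cycles;            /* no gain from this instruction */
    }
    uint64_t saved = calls * (uint64_t)(sw_cycles - aci_cycles);
    if (saved > total_cycles) {
        saved = total_cycles;           /* inconsistent inputs: clamp */
    }
    return total_cycles - saved;
}

/* Speedup factor: old cycle count divided by the estimated new cycle count. */
static inline double estimated_speedup(uint64_t total_cycles, uint64_t calls,
                                       uint32_t sw_cycles, uint32_t aci_cycles)
{
    uint64_t after = cycles_with_aci(total_cycles, calls, sw_cycles, aci_cycles);
    return after ? (double)total_cycles / (double)after : 0.0;
}
```

For example, a workload of 1,000,000 cycles that runs a 12-cycle bit manipulation 50,000 times would drop to an estimated 450,000 cycles with a single-cycle custom instruction. Comparing such estimates across candidate instruction sets is one way to decide which set deserves a closer look in simulation.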
This repository does not include content related to hardware design. The details of the hardware interface for ACI are available in the Integration and Implementation Manual of the Cortex-M processor products. This document is available for licensees of the related Arm IP or under NDA (Non-disclosure agreement). If you wish to access this document, please contact the Arm technical support team.
Register here for the introduction webinar on April 8, 2025.
Arm Custom Instructions (ACI, also known as Custom Datapath Extension in the architecture specification) is an optional feature that allows chip designers to add custom data processing operations to their silicon products. This can potentially provide higher performance and energy efficiency in certain specialized data processing tasks. Technical details are covered in the Introduction to the Arm Custom Instructions / Custom Datapath Extension.
In addition the following resource pages are helpful:
- White-Paper Arm Custom Instructions: Enabling Innovation and Greater Flexibility on Arm
- This paper was released shortly after Arm Custom Instructions was announced and provides a quick overview of what ACI is.
- White-Paper Innovate by Customized Instructions, but Without Fragmenting the Ecosystem
- This paper was released for Embedded World 2021 and describes the capability of Arm Custom Instructions and how compilation toolchains support this feature.
- Arm Custom Instructions on developer.arm.com
- Arm Custom Instructions webpage on the Arm website.
All C/C++ compilers that implement the Arm C Language Extensions (ACLE) support CDE intrinsic functions to execute ACI.
ACI access the general-purpose registers (R0-R15) or the vector register file, which contains 32-bit float registers (S0-S31), 64-bit double registers (D0-D15), or 128-bit vector registers (Q0-Q7), as summarized below.
ACI Categories | Register Access | Notes |
---|---|---|
32-bit and 64-bit integer | R0-R15 | float8/16/32 values can be passed using a C union. |
32-bit single-precision float | S0-S31 | Available if FPU extension is implemented. |
64-bit double-precision float | D0-D15 | Available if FPU extension with double precision float is implemented. |
128-bit vector | Q0-Q7 | Available if MVE (Helium) is implemented. |
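The note about passing float values through a C union can be sketched as follows. This is a minimal illustration, not code from this repository: the operation ("clear the sign bit"), the coprocessor number 0, and the immediate 1 are hypothetical, and the sketch assumes IEEE-754 single-precision `float`. When CDE is not enabled in the compiler, a plain C model of the same hypothetical operation is used so the code also builds on a host.

```c
#include <stdint.h>
#if defined(__ARM_FEATURE_CDE)
#include <arm_cde.h>                  /* ACLE header for CDE intrinsics */
#endif

/* Union used to move a float bit pattern through a general-purpose register. */
typedef union {
    float    f;
    uint32_t u;
} f32_bits_t;

static inline uint32_t f32_to_gpr(float v)
{
    f32_bits_t b;
    b.f = v;
    return b.u;                       /* raw IEEE-754 bit pattern */
}

static inline float gpr_to_f32(uint32_t r)
{
    f32_bits_t b;
    b.u = r;
    return b.f;
}

/* Hypothetical custom instruction: absolute value of a float held in a GPR. */
static inline float custom_abs_f32(float v)
{
    uint32_t in = f32_to_gpr(v);
#if defined(__ARM_FEATURE_CDE)
    /* Assumed encoding: coprocessor 0, immediate 1. */
    uint32_t out = __arm_cx2(0, in, 1);
#else
    /* Software model of the hypothetical instruction: clear the sign bit. */
    uint32_t out = in & 0x7FFFFFFFu;
#endif
    return gpr_to_f32(out);
}
```

Reading a union member other than the one last written is well defined in C, which is why the union is used here instead of a pointer cast.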
Introducing custom instructions is frequently an iterative process, as algorithms might need adaptations to the underlying compute architecture. Exploring such algorithms on simulation models is an effective method to evaluate custom instructions on realistic compute workloads. This repository contains example projects that use this method of exploring ACI, including the validation of the custom instruction extension. As these examples have a permissive open-source license, they can be used as a starting point for optimizing your own algorithms with ACI technology.
- GPR implements a 32-bit integer population count custom instruction. The population count instruction is useful for many algorithms, for example to calculate the Hamming weight.
- MVE implements 128-bit vector instructions to accelerate algorithms for image and pixel manipulation. The custom instructions may be used in the Arm-2D image processing library, and the example demonstrates the performance gain.
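To make the population count case concrete, here is a minimal sketch of how application code might wrap such an instruction. It is not taken from the GPR example: the coprocessor number and immediate (`POPCOUNT_COPROC`, `POPCOUNT_IMM`) are hypothetical placeholders, and the actual encoding used by the example project may differ. `__arm_cx2` is the ACLE intrinsic for a CDE instruction with one 32-bit register input; a portable fallback is used when CDE is not enabled.

```c
#include <stdint.h>
#if defined(__ARM_FEATURE_CDE)
#include <arm_cde.h>                  /* ACLE header for CDE intrinsics */
#endif

/* Hypothetical encoding of the custom instruction (assumption). */
#define POPCOUNT_COPROC 0
#define POPCOUNT_IMM    0

/* Portable software population count (parallel bit summation). */
static inline uint32_t popcount_sw(uint32_t x)
{
    x = x - ((x >> 1) & 0x55555555u);                 /* 2-bit sums  */
    x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u); /* 4-bit sums  */
    x = (x + (x >> 4)) & 0x0F0F0F0Fu;                 /* 8-bit sums  */
    return (x * 0x01010101u) >> 24;                   /* add 4 bytes */
}

/* Uses the custom instruction when CDE is enabled, else the software path. */
static inline uint32_t popcount32(uint32_t x)
{
#if defined(__ARM_FEATURE_CDE)
    return __arm_cx2(POPCOUNT_COPROC, x, POPCOUNT_IMM);
#else
    return popcount_sw(x);
#endif
}

/* Hamming distance: number of bit positions in which a and b differ. */
static inline uint32_t hamming_distance(uint32_t a, uint32_t b)
{
    return popcount32(a ^ b);
}
```

Keeping the software version next to the intrinsic also gives a reference to compare against when validating the instruction on a simulation model.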
All popular C/C++ compilers for Arm Cortex-M processors implement the Arm C Language Extensions (ACLE), which provide CDE intrinsic functions to execute ACI. Code that uses ACI is therefore portable between C/C++ compilers. Debuggers do not require extensions, as ACI uses processor registers that are already visible in debug views.
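One way to keep such code portable is to test the ACLE feature macros at compile time. The sketch below assumes the `__ARM_FEATURE_CDE` and `__ARM_FEATURE_CDE_COPROC` macros defined by ACLE; the helper names are hypothetical. With GCC or Clang, a coprocessor is typically enabled for CDE with an architecture option such as `-march=armv8-m.main+cdecp0`; check your toolchain documentation for the exact spelling.

```c
#include <stdint.h>

/* Bitmap of coprocessors (bits 0..7) that the compiler was told are used
 * for CDE; 0 when CDE is not enabled for this build. */
static inline uint32_t aci_coproc_mask(void)
{
#if defined(__ARM_FEATURE_CDE) && defined(__ARM_FEATURE_CDE_COPROC)
    return (uint32_t)__ARM_FEATURE_CDE_COPROC;
#else
    return 0u;
#endif
}

/* Returns 1 when coprocessor n (0..7) is enabled for CDE, else 0. */
static inline int aci_coproc_enabled(unsigned n)
{
    return (n < 8u) && (((aci_coproc_mask() >> n) & 1u) != 0u);
}
```

The same macros can guard intrinsic calls with `#if`, so a single source file builds both for an ACI-enabled device and for a plain Cortex-M or host build that falls back to C code.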
Custom instructions do not require changes to existing software or middleware. For example, any RTOS kernel with Cortex-M processor support will also work with devices that extend the processor with a set of ACI.
The example projects in this repository use the following tools:
- Keil MDK: µVision or Keil Studio IDE for creating application software.
- CMSIS-Toolbox for command-line build.
- AVH-FVP simulation models for Cortex-M processors (uses Arm Fast Models).
- GCC Compiler and Make to build plugins for AVH-FVP simulation models on Linux or Windows hosts.
The repository uses GitHub Actions to generate the plugins and verify examples and tests.
Action | Description |
---|---|
build-plugins-linux.yml | Generate the AVH-FVP plugin extensions for the ACI examples. Download plugin artifact for Linux. |
build-plugins-windows.yml | Generate the AVH-FVP plugin extensions for the ACI examples. Download plugin artifact for Windows. |
GPR-test.yml | Validation of AVH-FVP plugin for GPR ACI extension. |
GPR-example.yml | Build and execution test for GPR example project. |
MVE-test.yml | Validation of AVH-FVP plugin for MVE ACI extension. |
MVE-example.yml | Build and execution test for MVE example project. |
- White-Paper Arm Custom Instructions: Enabling Innovation and Greater Flexibility on Arm
- White-Paper Innovate by Customized Instructions, but Without Fragmenting the Ecosystem
- Arm Custom Instructions on developer.arm.com
- Arm C Language Extension (ACLE) support CDE intrinsic functions to execute ACI
- Arm Fast Models - Plugin for CDE
The example projects in this repository are licensed under .
Please feel free to raise an issue on GitHub to report misbehavior (i.e. bugs) or start discussions about enhancements. This is your best way to interact directly with the maintenance team and the community.