AI_template_RVV_backend is based on AITemplate.
- Implemented the method from the paper *Efficient Column-Wise N:M Pruning on RISC-V CPU*
- Added a CPU backend, which was not previously supported
- Developed custom operations, including:
  - Sparse 2D convolution operators in CNHW layout
    - Generate tiled matrix multiplication functions, as proposed in our paper, using different tile sizes and LMUL values
    - Integrate with fused im2col and packing operations, also proposed in our paper; all implementations of the fusion operations are defined in `static/cpu/include/rvv_utils.h`
  - Dense 2D convolution operators in CNHW layout (generated code calls custom XNNPACK neural network operators that I implemented)
- Enhanced profiling mechanisms to select optimal tile size and RISC-V Vector Length Multiplier (LMUL)
- Extended AITemplate to support remote compilation and code execution on RISC-V devices
The diagram below illustrates how the new RVV backend and code generation flow have been integrated into the AITemplate framework:
Legend:
🟦 Blue blocks — Custom components implemented in this project (including new XNNPACK operators)
⚪️ Gray blocks — Upstream components from third-party libraries (e.g., XNNPACK)
◽️ White blocks — Existing AITemplate components reused or extended
For details on the custom XNNPACK operators developed for RVV, please see the related repository: XNNPACK_RVV
- Open `python/aitemplate/utils/remote_send_receive_files.py` and set the following variables:
  - `TARGET_USER`
  - `TARGET_IP`
  - `REMOTE_PROFILE_DIR`
  - `RUN_DIR`

  ⚠️ Note: You must manually create the `REMOTE_PROFILE_DIR` and `RUN_DIR` directories on your remote device before proceeding; a sketch of example values is shown below.
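  For illustration, a minimal sketch of what these settings might look like inside `remote_send_receive_files.py`; the user name, IP address, and directory paths are hypothetical placeholders, not defaults shipped with this repository:

  ```python
  # Hypothetical example values -- replace them with your own device's settings.
  TARGET_USER = "riscv"                           # SSH login user on the RISC-V device
  TARGET_IP = "192.168.1.42"                      # IP address of the RISC-V device
  REMOTE_PROFILE_DIR = "/home/riscv/ait_profile"  # create this directory manually on the device
  RUN_DIR = "/home/riscv/ait_run"                 # create this directory manually on the device
  ```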
- Send the folder `python/aitemplate/utils/static/` to the `RUN_DIR` directory on your remote device.
- Build the third-party library `3rdparty/XNNPACK_RVV` first.
- After the build completes, edit `python/aitemplate/backend/rvv/target_def.py` so that `xnnpack_path` points to your freshly built XNNPACK library.
- Warning: the bare-metal cross-compiler `riscv64-unknown-elf-gcc` ships without `libpthread`, so multi-threading is unavailable; actual thread counts therefore depend on the device at run time. Consequently, AI_template_RVV_backend compiles and runs the program directly on the device.
- Build AITemplate: note that Python 3.11.10 runs without any problems; newer Python versions may have compatibility issues with dependent packages.
- When cloning the code, please use the following command to also clone the submodules:
  ```bash
  git clone --recursive https://github.com/wewe5215/AI_template_RVV_backend.git
  ```
- Build AITemplate:
  ```bash
  cd python
  python setup.py bdist_wheel
  pip install dist/*.whl --force-reinstall
  ```
- Compiler requirement: compile the generated C++ with Clang ≥ 17.0.2; older versions lack several RVV v0.12 intrinsics used by the backend.
- There will be four instances of remote access. For security, please check the content sent to the remote device before entering your password. The text in parentheses indicates the file and location of the code that issues the remote access request:
  - Set up `ssh_client` (`python/aitemplate/utils/remote_send_receive_files.py`)
  - Send profiling code to the remote device via scp (`python/aitemplate/backend/builder.py`, line 1038)
  - Send generated function code to the remote device via scp (`python/aitemplate/backend/builder.py`, line 1086)
  - Send metadata for code execution via scp (in each example folder's `test_correctness.py` and `benchmark_ait_rvv.py`)
- If you have any questions, feel free to open an issue. I will respond as soon as possible.
- Currently, the CPU backend only supports f32. Support for f16 will be added in the future.
- Complete the Setup
  - Make sure you have followed all the steps in the Setup section.
- Navigate to the example folder and choose the one corresponding to the model you want to evaluate:
  - `example/01_resnet-50_pruned_RVV` -> ResNet-18, 34, 50, 101, 152
  - `example/11_DenseNet_pruned` -> DenseNet-121
  - `example/12_MobileNet_pruned` -> MobileNet-V2
- Run the Benchmark Script:
  - Execute the following script with your desired batch size: `benchmark_ait_rvv.py --batch-size {batch_size}`
  - This will generate a profile summary and the benchmark result.
- Retrain the Pruned Model:
  - Use the profile summary to guide retraining of the pruned model.
  - For ResNet models, retraining code is provided in `example/01_resnet-50_pruned_RVV/retrain_code_resnet`.
  - For DenseNet-121, retraining code is provided in `example/11_DenseNet_pruned/densenet121_re_train_column_wise_pruning.py`.
  - Detailed training recipes and hyperparameters are described in the Performance Evaluation section of our paper.
- Other Notices:
  - If you want to use the CPU backend, set the `IS_CPU_BACKEND` flag before compiling or running your model:
    ```python
    import importlib
    dt = importlib.import_module("aitemplate.testing.detect_target")
    dt.IS_CPU_BACKEND = True
    ```
  - NHWC Layout Support: The dense CPU backend also supports the NHWC data layout. For the models discussed in the paper, this may result in generated code that calls low-level XNNPACK operators. These operators are compatible with various hardware backends.
  - Remote Compilation: To enable remote compilation and execution, pass `remote_compile=True` to the `compile_model` function; otherwise, it defaults to `False`. A combined end-to-end sketch is shown after this list.
    ```python
    module = compile_model(y, target, "./tmp", model_name, remote_compile=True)
    ```
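As referenced above, here is a minimal, hypothetical end-to-end sketch that combines the notes in this README: enabling the CPU-backend flag, building a small f32 graph, and compiling with `remote_compile=True`. The GEMM op, tensor shapes, and model name are illustrative assumptions and not taken from this repository's examples:

```python
# Hypothetical sketch: enable the CPU backend, build a tiny f32 graph, and
# compile it with remote compilation enabled. Whether this exact op is
# supported by the RVV backend is an assumption, not stated in this README.
import importlib

from aitemplate.compiler import compile_model, ops
from aitemplate.frontend import Tensor

# Enable the CPU backend before target detection (see the note above).
dt = importlib.import_module("aitemplate.testing.detect_target")
dt.IS_CPU_BACKEND = True

# A minimal f32 graph (the CPU backend currently supports only f32).
X = Tensor(shape=[1, 64], dtype="float32", name="X", is_input=True)
W = Tensor(shape=[64, 64], dtype="float32", name="W", is_input=True)
Y = ops.gemm_rrr()(X, W)
Y._attrs["name"] = "Y"
Y._attrs["is_output"] = True

target = dt.detect_target()
# remote_compile=True sends the generated code to the RISC-V device for
# compilation and execution; it defaults to False.
module = compile_model(Y, target, "./tmp", "rvv_gemm_example", remote_compile=True)
```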
AITemplate is licensed under the Apache 2.0 License.