AI_template_RVV_backend is based on AITemplate.
- Implemented the method from the paper *Efficient Column-Wise N:M Pruning on RISC-V CPU*
- Added a CPU backend, which was not previously supported
- Developed custom operations, including:
  - Sparse 2D convolution operators in CNHW layout
    - Generate tiled matrix multiplication functions, as proposed in our paper, using different tile sizes and LMUL values
    - Integrate with fused im2col and packing operations, also proposed in our paper; all implementations of the fusion operations are defined in `static/cpu/include/rvv_utils.h`
  - Dense 2D convolution operators in CNHW layout (generated code calls custom XNNPACK neural network operators that I implemented)
- Enhanced profiling mechanisms to select optimal tile size and RISC-V Vector Length Multiplier (LMUL)
- Extended AITemplate to support remote compilation and code execution on RISC-V devices
The diagram below illustrates how the new RVV backend and code generation flow have been integrated into the AITemplate framework:
Legend:
🟦 Blue blocks — Custom components implemented in this project (including new XNNPACK operators)
⚪️ Gray blocks — Upstream components from third-party libraries (e.g., XNNPACK)
◽️ White blocks — Existing AITemplate components reused or extended
For details on the custom XNNPACK operators developed for RVV, please see the related repository: XNNPACK_RVV
- Open `python/aitemplate/utils/remote_send_receive_files.py` and set the following variables:
  - `TARGET_USER`
  - `TARGET_IP`
  - `REMOTE_PROFILE_DIR`
  - `RUN_DIR`

  ⚠️ Note: You must manually create the `REMOTE_PROFILE_DIR` and `RUN_DIR` directories on your remote device before proceeding; a sketch of example values is shown below.
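  For illustration, a minimal sketch of what these settings might look like inside `remote_send_receive_files.py`; the user name, IP address, and directory paths are hypothetical placeholders, not defaults shipped with this repository:

  ```python
  # Hypothetical example values -- replace them with your own device's settings.
  TARGET_USER = "riscv"                           # SSH login user on the RISC-V device
  TARGET_IP = "192.168.1.42"                      # IP address of the RISC-V device
  REMOTE_PROFILE_DIR = "/home/riscv/ait_profile"  # create this directory manually on the device
  RUN_DIR = "/home/riscv/ait_run"                 # create this directory manually on the device
  ```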
- Send the folder `python/aitemplate/utils/static/` to the `RUN_DIR` directory on your remote device.
- Build the third-party library `3rdparty/XNNPACK_RVV` first.
- After the build completes, edit `python/aitemplate/backend/rvv/target_def.py` so that `xnnpack_path` points to your freshly built XNNPACK library.
- Warning: the bare-metal cross-compiler `riscv64-unknown-elf-gcc` ships without `libpthread`, so multi-threading is unavailable; actual thread counts therefore depend on the device at run time. Consequently, AI_template_RVV_backend compiles and runs the program directly on the device.
- Build AITemplate: note that Python 3.11.10 runs without any problems; newer Python versions may have compatibility issues with dependent packages.
- When cloning the code, please use the following command to also clone the submodules:
  ```bash
  git clone --recursive https://github.com/wewe5215/AI_template_RVV_backend.git
  ```
- Build AITemplate:
  ```bash
  cd python
  python setup.py bdist_wheel
  pip install dist/*.whl --force-reinstall
  ```
- Compiler requirement: compile the generated C++ with Clang ≥ 17.0.2; older versions lack several RVV v0.12 intrinsics used by the backend.
- There will be four instances of remote access. For security, please check the content sent to the remote device before entering your password. The text in parentheses indicates the file and location of the code that issues the remote access request:
  - Set up `ssh_client` (`python/aitemplate/utils/remote_send_receive_files.py`)
  - Send profiling code to the remote device via scp (`python/aitemplate/backend/builder.py`, line 1038)
  - Send generated function code to the remote device via scp (`python/aitemplate/backend/builder.py`, line 1086)
  - Send metadata for code execution via scp (in each example folder's `test_correctness.py` and `benchmark_ait_rvv.py`)
- If you have any questions, feel free to open an issue. I will respond as soon as possible.
- Currently, the CPU backend only supports f32. Support for f16 will be added in the future.
- Complete the Setup
  - Make sure you have followed all the steps in the Setup section.
- Navigate to the example folder and choose the one corresponding to the model you want to evaluate:
  - `example/01_resnet-50_pruned_RVV` -> ResNet-18, 34, 50, 101, 152
  - `example/11_DenseNet_pruned` -> DenseNet-121
  - `example/12_MobileNet_pruned` -> MobileNet-V2
- Run the Benchmark Script:
  - Execute the following script with your desired batch size: `benchmark_ait_rvv.py --batch-size {batch_size}`
  - This will generate a profile summary and the benchmark result.
- Retrain the Pruned Model:
  - Use the profile summary to guide retraining of the pruned model.
  - For ResNet models, retraining code is provided in `example/01_resnet-50_pruned_RVV/retrain_code_resnet`.
  - For DenseNet-121, retraining code is provided in `example/11_DenseNet_pruned/densenet121_re_train_column_wise_pruning.py`.
  - Detailed training recipes and hyperparameters are described in the Performance Evaluation section of our paper.
- Other Notices:
  - If you want to use the CPU backend, set the `IS_CPU_BACKEND` flag before compiling or running your model:
    ```python
    import importlib
    dt = importlib.import_module("aitemplate.testing.detect_target")
    dt.IS_CPU_BACKEND = True
    ```
  - NHWC Layout Support: The dense CPU backend also supports the NHWC data layout. For the models discussed in the paper, this may result in generated code that calls low-level XNNPACK operators. These operators are compatible with various hardware backends.
  - Remote Compilation: To enable remote compilation and execution, pass `remote_compile=True` to the `compile_model` function; otherwise, it defaults to `False`. A combined end-to-end sketch is shown after this list.
    ```python
    module = compile_model(y, target, "./tmp", model_name, remote_compile=True)
    ```
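As referenced above, here is a minimal, hypothetical end-to-end sketch that combines the notes in this README: enabling the CPU-backend flag, building a small f32 graph, and compiling with `remote_compile=True`. The GEMM op, tensor shapes, and model name are illustrative assumptions and not taken from this repository's examples:

```python
# Hypothetical sketch: enable the CPU backend, build a tiny f32 graph, and
# compile it with remote compilation enabled. Whether this exact op is
# supported by the RVV backend is an assumption, not stated in this README.
import importlib

from aitemplate.compiler import compile_model, ops
from aitemplate.frontend import Tensor

# Enable the CPU backend before target detection (see the note above).
dt = importlib.import_module("aitemplate.testing.detect_target")
dt.IS_CPU_BACKEND = True

# A minimal f32 graph (the CPU backend currently supports only f32).
X = Tensor(shape=[1, 64], dtype="float32", name="X", is_input=True)
W = Tensor(shape=[64, 64], dtype="float32", name="W", is_input=True)
Y = ops.gemm_rrr()(X, W)
Y._attrs["name"] = "Y"
Y._attrs["is_output"] = True

target = dt.detect_target()
# remote_compile=True sends the generated code to the RISC-V device for
# compilation and execution; it defaults to False.
module = compile_model(Y, target, "./tmp", "rvv_gemm_example", remote_compile=True)
```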
AITemplate is licensed under the Apache 2.0 License.