AI_template_RVV_backend

AI_template_RVV_backend is based on AITemplate.

My Contributions

  • Implemented the paper: Efficient Column-Wise N:M Pruning on RISC-V CPU
  • Added a CPU backend, which was not previously supported
  • Developed custom operations including:
    1. Sparse 2D convolution operators in CNHW layout
    • Generates tiled matrix multiplication functions, as proposed in our paper, using different tile sizes and LMUL values
    • Integrates a fused im2col-and-packing operation, also proposed in our paper; all implementations of the fused operations are defined in static/cpu/include/rvv_utils.h
    2. Dense 2D convolution operators in CNHW layout (generates code that calls custom XNNPACK neural-network operators that I implemented)
  • Enhanced profiling mechanisms to select optimal tile size and RISC-V Vector Length Multiplier (LMUL)
  • Extended AITemplate to support remote compilation and code execution on RISC-V devices
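The column-wise N:M pruning scheme referenced above keeps only N non-zero entries in every group of M consecutive weights along a column. The toy sketch below is not the repository's implementation; the group direction and the keep-largest-magnitude criterion are my assumptions about the general N:M pattern, shown here for 2:4 pruning:

```python
# Toy illustration of column-wise N:M pruning (2:4 here).
# NOT the repository's implementation -- just the general idea:
# in every group of M consecutive entries along a column, keep the
# N largest-magnitude weights and zero out the rest.

def prune_column_nm(column, n=2, m=4):
    """Prune one weight column with the N:M sparsity pattern."""
    pruned = []
    for start in range(0, len(column), m):
        group = column[start:start + m]
        # Indices of the n largest-magnitude entries in this group.
        keep = sorted(range(len(group)), key=lambda i: -abs(group[i]))[:n]
        pruned.extend(v if i in keep else 0.0 for i, v in enumerate(group))
    return pruned

def prune_matrix_nm(matrix, n=2, m=4):
    """Apply N:M pruning column by column to a row-major matrix."""
    cols = list(zip(*matrix))                        # transpose to columns
    pruned_cols = [prune_column_nm(list(c), n, m) for c in cols]
    return [list(row) for row in zip(*pruned_cols)]  # transpose back

weights = [
    [0.9, -0.1],
    [-0.2, 0.8],
    [0.7, 0.05],
    [0.1, -0.6],
]
print(prune_matrix_nm(weights))  # each column keeps 2 of its 4 entries
```

The resulting regular sparsity pattern is what makes the tiled sparse matmul kernels efficient: each column group has a fixed non-zero budget, so the kernel can skip pruned entries without per-element branching.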

RVV Backend Integration Flow

The diagram below (RVV Backend Flow) illustrates how the new RVV backend and code-generation flow are integrated into the AITemplate framework.

Legend:
🟦 Blue blocks — Custom components implemented in this project (including new XNNPACK operators)
⚪️ Gray blocks — Upstream components from third-party libraries (e.g., XNNPACK)
◽️ White blocks — Existing AITemplate components reused or extended

For details on the custom XNNPACK operators developed for RVV, please see the related repository: XNNPACK_RVV

Setup

  1. Open python/aitemplate/utils/remote_send_receive_files.py, and set the following variables:
  • TARGET_USER
  • TARGET_IP
  • REMOTE_PROFILE_DIR
  • RUN_DIR

⚠️ Note: You must manually create the REMOTE_PROFILE_DIR and RUN_DIR directories on your remote device before proceeding.

  2. Send the folder python/aitemplate/utils/static/ to the RUN_DIR directory on your remote device.
  3. Build the third-party library 3rdparty/XNNPACK_RVV first.
  4. After the build completes, edit python/aitemplate/backend/rvv/target_def.py so that xnnpack_path points to your freshly built XNNPACK library.
  5. Warning: the bare-metal cross-compiler riscv64-unknown-elf-gcc ships without libpthread, so multi-threading is unavailable under cross-compilation; AI_template_RVV_backend therefore compiles and runs the program directly on the device, where actual thread counts depend on the hardware at run time.
  6. Build AITemplate. Note that Python 3.11.10 runs without any problems; newer Python versions may have compatibility issues with dependent packages.
  • When cloning the code, use the following command to also clone the submodules:
    git clone --recursive https://github.com/wewe5215/AI_template_RVV_backend.git
    
  • Build AITemplate:
    cd python
    python setup.py bdist_wheel
    pip install dist/*.whl --force-reinstall
    
  7. Compiler requirement: compile the generated C++ with Clang ≥ 17.0.2; older versions lack several RVV v0.12 intrinsics used by the backend.
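Steps 1–2 above can be sketched as a short shell session. The user, IP, and directory values below are placeholders (not values from the repository), and the `echo` prefix makes this a dry run; remove it once the variables match your device:

```shell
# Placeholder values -- substitute the settings you put in
# python/aitemplate/utils/remote_send_receive_files.py
TARGET_USER=riscv
TARGET_IP=192.168.1.42
REMOTE_PROFILE_DIR=/home/riscv/ait_profile
RUN_DIR=/home/riscv/ait_run

# Dry run: print the commands first; delete "echo" to execute them.
# Create the required directories on the remote device.
echo ssh "${TARGET_USER}@${TARGET_IP}" "mkdir -p '${REMOTE_PROFILE_DIR}' '${RUN_DIR}'"
# Copy the static sources into RUN_DIR.
echo scp -r python/aitemplate/utils/static/ "${TARGET_USER}@${TARGET_IP}:${RUN_DIR}/"
```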

Important Notices

  1. Remote access occurs in four places. For security, review the content being sent to the remote device before entering your password. The text in parentheses indicates the file and location of the code that issues each remote-access request:
  • Set up ssh_client (python/aitemplate/utils/remote_send_receive_files.py)
  • Send profile code to the remote device via scp (python/aitemplate/backend/builder.py, line 1038)
  • Send generated function code to the remote device via scp (python/aitemplate/backend/builder.py, line 1086)
  • Send metadata for code execution via scp (in each example folder’s test_correctness.py and benchmark_ait_rvv.py)
  2. If you have any questions, feel free to open an issue. I will respond as soon as possible.
  3. Currently, the CPU backend only supports f32; f16 support will be added in the future.

Steps for Replicating the End-to-End Experiment from Our Paper

  1. Complete the Setup:
  • Make sure you have followed all the steps in the Setup section.
  2. Navigate to the example folder corresponding to the model you want to evaluate:
  • example/01_resnet-50_pruned_RVV -> ResNet 18, 34, 50, 101, 152
  • example/11_DenseNet_pruned -> DenseNet121
  • example/12_MobileNet_pruned -> MobileNet-V2
  3. Run the benchmark script:
  • Execute the following script with your desired batch size:
  • benchmark_ait_rvv.py --batch-size {batch_size}
  • This generates a profile summary and the benchmark result.
  4. Retrain the pruned model:
  • Use the profile summary to guide retraining of the pruned model.
  • For ResNet models, retraining code is provided in example/01_resnet-50_pruned_RVV/retrain_code_resnet
  • For DenseNet121, retraining code is provided in example/11_DenseNet_pruned/densenet121_re_train_column_wise_pruning.py
  • Detailed training recipes and hyperparameters are described in the Performance Evaluation section of our paper.
  5. Other notices:
  • If you want to use the CPU backend, set the IS_CPU_BACKEND flag before compiling or running your model:
      import importlib
      dt = importlib.import_module("aitemplate.testing.detect_target")
      dt.IS_CPU_BACKEND = True
  • NHWC layout support: the dense CPU backend also supports the NHWC data layout. For the models discussed in the paper, this may result in generated code that calls low-level XNNPACK operators, which are compatible with various hardware backends.
  • Remote compilation: to enable remote compilation and execution, pass remote_compile=True to the compile_model function; otherwise it defaults to False:
    module = compile_model(y, target, "./tmp", model_name, remote_compile=True)

License

AITemplate is licensed under the Apache 2.0 License.
