Triton VeriSilicon Device Backend

A Triton backend plugin for VeriSilicon (vsi) GPGPU/NPU devices.

Build

Requirements

Install the LLVM and Clang toolchain; version 17 is recommended. See the LLVM Debian/Ubuntu packages.
Install the VPEX PyTorch backend with pip3 install vpex.
Download the prebuilt vsi driver SDK from this repository.

Build wheel

# Replace the placeholder paths with the real locations on your machine.
export VSI_ZEN_COMPILER_PATH=/path/to/zen-compiler
export VSI_ZEN_TC_BRIDGE_PATH=/path/to/libZenTCBridge.so
cd triton_vsi_backend
pip3 wheel --no-build-isolation .

Develop

pip3 install --no-build-isolation --editable .

Usage

Setup

Set these environment variables:

VSI_SDK_DIR: The path to the VeriSilicon GPGPU/NPU SDK directory.
LD_LIBRARY_PATH: The path where the dynamic linker looks for the vsi driver libraries, usually ${VSI_SDK_DIR}/drivers.
CC: The C compiler; clang is recommended.
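
As a quick sanity check before importing the backend, the variables above can be verified from Python. This is a minimal sketch only; the helper name `missing_vsi_env` is hypothetical and not part of the plugin.

```python
import os

# Variables listed in the Setup section above.
REQUIRED_VARS = ("VSI_SDK_DIR", "LD_LIBRARY_PATH", "CC")


def missing_vsi_env(env=None):
    """Return the names of required variables that are unset or empty.

    `missing_vsi_env` is a hypothetical helper, not an API of the plugin.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_vsi_env()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("vsi environment looks complete")
```

The check only confirms that the variables are set; it does not validate that the paths point to a working SDK.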

To use the vsi backend:

import vpex
import triton
import triton_vsi_backend

# ... your triton kernel and other code

Use with TorchInductor

PyTorch currently has no out-of-tree registration mechanism for adding a new Triton backend to Inductor. To use this plugin with Inductor, you need to modify the source code in torch/utils/_triton.py:

# Locate the `has_triton` function:

@functools.lru_cache(None)
def has_triton() -> bool:
    # ...
    # Around line 82:
    triton_supported_devices = {
        "cuda": cuda_extra_check,
        "xpu": _return_true,
        "cpu": cpu_extra_check,
        "vsi": _return_true, # Add this entry to the dict.
    }
    # ...

If your platform has CUDA devices, set the environment variable CUDA_VISIBLE_DEVICES="" to hide them; otherwise some Inductor passes will run into conflicts.
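
The variable can also be set from Python, provided this happens before torch initializes CUDA. The following is a minimal sketch, assuming it sits at the very top of the entry script; `hide_cuda_devices` is a hypothetical helper name, not part of PyTorch or this plugin.

```python
import os


def hide_cuda_devices(env=None):
    """Set CUDA_VISIBLE_DEVICES to an empty string so no CUDA device is visible.

    Hypothetical helper for illustration; it only mutates the given mapping.
    """
    env = os.environ if env is None else env
    env["CUDA_VISIBLE_DEVICES"] = ""
    return env


# Call this before `import torch` so the CUDA runtime never enumerates devices.
hide_cuda_devices()
```

Exporting the variable in the shell before launching Python has the same effect and avoids any import-order concerns.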

Dump standalone launcher for debug

For a single Triton kernel, you can dump a standalone C++ launcher that can run the compiled kernel without the Triton Python runtime.

Set the kernel metadata parameter dump_standalone=True and the environment variable VSI_DUMP_DIR to dump the standalone launcher source code and compilation artifacts (IRs, binaries, etc.) to the directory ${VSI_DUMP_DIR}/${kernel_name}.

Compile the dumped standalone launcher:

clang++ -Og -glldb -std=c++17 -stdlib=libc++ -fuse-ld=lld ${kernel_name}_standalone.cpp -o ${kernel_name}_standalone -I ${VSI_SDK_INCLUDE_DIR} -L ${VSI_SDK_LIB_DIR} -lOpenCL
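
When compiling many dumped kernels, it can help to assemble the same command programmatically. The sketch below mirrors the flags of the command above; the function name `standalone_compile_cmd`, the kernel name, and the SDK paths are hypothetical placeholders.

```python
import shlex


def standalone_compile_cmd(kernel_name, sdk_include_dir, sdk_lib_dir):
    """Build the clang++ argument list for a dumped standalone launcher.

    Hypothetical helper; the flags are copied from the command in this README.
    """
    return [
        "clang++", "-Og", "-glldb", "-std=c++17", "-stdlib=libc++", "-fuse-ld=lld",
        f"{kernel_name}_standalone.cpp",
        "-o", f"{kernel_name}_standalone",
        "-I", sdk_include_dir,
        "-L", sdk_lib_dir,
        "-lOpenCL",
    ]


if __name__ == "__main__":
    # "add_kernel" and the SDK paths are placeholders for illustration only.
    cmd = standalone_compile_cmd("add_kernel", "/opt/vsi/include", "/opt/vsi/lib")
    print(shlex.join(cmd))
    # To actually compile, run the command from the directory that holds the
    # dumped source, e.g. subprocess.run(cmd, check=True).
```

Building the command as a list rather than a single string avoids quoting problems when the SDK paths contain spaces.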

Run standalone launcher:

# Prepare your input kernel args (both tensors and scalars) to ./inputs/arg_{i}.npy
export LD_LIBRARY_PATH=${VSI_SDK_LIB_DIR}
./${kernel_name}_standalone -g grid_x,grid_y,grid_z
# If the run succeeds, the output args (tensors only) are dumped to ./outputs/arg_{i}.npy
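
The run step can be scripted in the same way. This sketch assumes the launcher takes all three grid dimensions in the `-g` argument, as in the command above; `standalone_run_cmd` and `standalone_run_env` are hypothetical helper names.

```python
import os


def standalone_run_cmd(kernel_name, grid):
    """Build the launcher argv; `grid` is (grid_x, grid_y, grid_z)."""
    if len(grid) != 3:
        raise ValueError("grid must have exactly three dimensions")
    grid_arg = ",".join(str(int(g)) for g in grid)
    return [f"./{kernel_name}_standalone", "-g", grid_arg]


def standalone_run_env(sdk_lib_dir, base_env=None):
    """Copy the environment and point LD_LIBRARY_PATH at the SDK libraries."""
    env = dict(os.environ if base_env is None else base_env)
    env["LD_LIBRARY_PATH"] = sdk_lib_dir
    return env
```

With the input files already in ./inputs, the launcher can then be started with `subprocess.run(standalone_run_cmd(name, grid), env=standalone_run_env(lib_dir), check=True)`.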

