SHARK Triton Backend

The Triton backend for SHARK.

Build

Install SHARK

git clone https://github.com/nod-ai/SHARK.git
# skip the above step if SHARK is already installed
cd SHARK/inference

Install dependencies

apt-get install patchelf rapidjson-dev python3-dev
git submodule update --init

Update the IREE submodules

cd thirdparty/shark-runtime
git submodule update --init

Next, make the backend and install it

cd ../..
mkdir build && cd build
cmake -DTRITON_ENABLE_GPU=ON \
-DIREE_HAL_DRIVER_CUDA=ON \
-DIREE_TARGET_BACKEND_CUDA=ON \
-DMLIR_ENABLE_CUDA_RUNNER=ON \
-DCMAKE_INSTALL_PREFIX:PATH=`pwd`/install \
-DTRITON_BACKEND_REPO_TAG=r22.02 \
-DTRITON_CORE_REPO_TAG=r22.02 \
-DTRITON_COMMON_REPO_TAG=r22.02 ..
make install
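If the build succeeds, the backend shared library should end up under the install prefix set above; a quick check from the build directory:

ls install/backends/dshark/libtriton_dshark.so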

Incorporating into Triton

There are more in-depth explanations of the following steps in Triton's documentation: https://github.com/triton-inference-server/server/blob/main/docs/compose.md#triton-with-unsupported-and-custom-backends

The build produces a file at /build/install/backends/dshark/libtriton_dshark.so, which you will need to copy into your Triton server image.
More documentation is in the link above, but to create the Docker image you need to run the compose.py script from the triton-inference-server/server repo.

To build your image, first clone the Triton server repo.

git clone https://github.com/triton-inference-server/server.git

Then run compose.py to generate a Dockerfile.compose

cd server
python3 compose.py --repoagent checksum --dry-run

Because dshark is a third-party backend, you will need to manually modify the generated Dockerfile.compose to include it. To do this, add the following line to Dockerfile.compose; the dshark backend is located in the build folder from earlier under /build/install/backends.

COPY /path/to/build/install/backends/dshark /opt/tritonserver/backends/dshark

Next, run

docker build -t tritonserver_custom -f Dockerfile.compose .
docker run -it --gpus=1 --net=host -v/path/to/model_repos:/models  tritonserver_custom:latest tritonserver --model-repository=/models

where /path/to/model_repos is the directory storing the models you want to run.

If you're not using GPUs, omit --gpus=1

docker run -it  --net=host -v/path/to/model_repos:/models  tritonserver_custom:latest tritonserver --model-repository=/models

Setting up a model

To include a model in your backend, add a directory named after your model to your model repository directory. Example model repositories can be seen here: https://github.com/triton-inference-server/backend/tree/main/examples/model_repos/minimal_models

Make sure to adjust the inputs correctly in the config.pbtxt file, and save a VMFB file under 1/model.vmfb; a sketch of the expected layout follows below.
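As a rough sketch (the model name my_model, the tensor names, data types, and shapes below are placeholders, not values from the SHARK repo), a model directory can be set up like this:

mkdir -p /path/to/model_repos/my_model/1
# copy your compiled IREE module as model.vmfb (source path is a placeholder)
cp /path/to/your/module.vmfb /path/to/model_repos/my_model/1/model.vmfb
# minimal config.pbtxt; adjust names, data types, and dims to match your model
cat > /path/to/model_repos/my_model/config.pbtxt <<'EOF'
name: "my_model"
backend: "dshark"
max_batch_size: 0
input [
  {
    name: "input0"
    data_type: TYPE_FP32
    dims: [ 1, 224, 224, 3 ]
  }
]
output [
  {
    name: "output0"
    data_type: TYPE_FP32
    dims: [ 1, 1000 ]
  }
]
EOF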

CUDA

If you're having issues with CUDA, make sure the correct drivers are installed, that nvidia-smi works, and that the nvcc compiler is on your PATH.
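A few quick checks (assuming the common CUDA install location /usr/local/cuda):

nvidia-smi        # confirms the driver can see the GPU
nvcc --version    # confirms the CUDA compiler is installed
export PATH=/usr/local/cuda/bin:$PATH   # adds nvcc to PATH if it is missing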