
# SHARK

High Performance Machine Learning and Data Analytics for CPUs, GPUs, Accelerators and Heterogeneous Clusters



## Installation

### Installation (Linux and macOS)

#### Setup a new pip Virtual Environment

This step sets up a new Python virtual environment:

```shell
python --version  # Check you have 3.7-3.10 on Linux, or 3.10 on macOS
python -m venv shark_venv
source shark_venv/bin/activate

# If you are using conda, create and activate a new conda env instead.

# Some older pip installs may not be able to handle the recent PyTorch deps.
python -m pip install --upgrade pip
```
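If you want to verify the interpreter programmatically, here is a minimal sketch based on the version range stated above:

```python
import sys

# SHARK wheels support Python 3.7-3.10 on Linux and Python 3.10 on macOS
# (per the version note above).
major_minor = sys.version_info[:2]
if sys.platform == "darwin":
    assert major_minor == (3, 10), f"Expected Python 3.10 on macOS, got {major_minor}"
else:
    assert (3, 7) <= major_minor <= (3, 10), f"Unsupported Python version: {major_minor}"
print("Python", sys.version.split()[0], "OK")
```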

macOS Metal users: please install the Vulkan SDK from https://sdk.lunarg.com/sdk/download/latest/mac/vulkan-sdk.dmg and enable "System wide install".

#### Install SHARK

This step pip installs SHARK and related packages (Linux: Python 3.7, 3.8, 3.9, or 3.10; macOS: Python 3.10):

```shell
pip install nodai-shark -f https://github.com/nod-ai/SHARK/releases -f https://github.com/llvm/torch-mlir/releases -f https://github.com/nod-ai/shark-runtime/releases --extra-index-url https://download.pytorch.org/whl/nightly/cpu
```
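To confirm the wheels installed correctly, you can try importing the package (`SharkInference` is the class used throughout the examples below):

```python
# Quick smoke test: this import should succeed after `pip install nodai-shark`.
from shark.shark_inference import SharkInference

print("SHARK installed:", SharkInference.__name__)
```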

If you are on an Intel macOS machine, you need this workaround for an upstream issue.

#### Download and run Resnet50 sample

```shell
curl -O https://raw.githubusercontent.com/nod-ai/SHARK/main/shark/examples/shark_inference/resnet50_script.py
# Install deps for the test script.
pip install --pre torch torchvision torchaudio tqdm pillow --extra-index-url https://download.pytorch.org/whl/nightly/cpu
python ./resnet50_script.py --device="cpu"  # Use "cuda", "vulkan", or "metal" for other devices.
```
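For a sense of what such a script does, here is a minimal sketch built from the Shark Inference API described later in this README; the real resnet50_script.py may differ in model setup and preprocessing:

```python
# A minimal sketch, assuming the SharkImporter/SharkInference API shown in the
# "Shark Inference API" section below; not the literal contents of resnet50_script.py.
import torch
import torchvision.models as models
from shark.shark_importer import SharkImporter
from shark.shark_inference import SharkInference

model = models.resnet50(pretrained=True).eval()
example_input = torch.randn(1, 3, 224, 224)

# Import the PyTorch module into MLIR via tracing.
mlir_importer = SharkImporter(model, (example_input,), frontend="torch")
mlir_module, func_name = mlir_importer.import_mlir(tracing_required=True)

# Compile and run on the chosen device ("cpu", "cuda", "vulkan", or "metal").
shark_module = SharkInference(mlir_module, func_name, device="cpu", mlir_dialect="linalg")
shark_module.compile()
logits = shark_module.forward((example_input,))
```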

#### Download and run BERT (MiniLM) sample

```shell
curl -O https://raw.githubusercontent.com/nod-ai/SHARK/main/shark/examples/shark_inference/minilm_jit.py
# Install deps for the test script.
pip install transformers torch --extra-index-url https://download.pytorch.org/whl/nightly/cpu
python ./minilm_jit.py --device="cpu"  # Use "cuda", "vulkan", or "metal" for other devices.
```
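The transformer flow follows the same pattern, with a tokenizer producing the example inputs. Below is a hedged sketch; the model id "microsoft/MiniLM-L12-H384-uncased" is an assumption and may not be what minilm_jit.py actually uses:

```python
# A minimal sketch of a masked-LM flow through SHARK. The model id below is an
# assumption for illustration, not necessarily the one used by minilm_jit.py.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer
from shark.shark_importer import SharkImporter
from shark.shark_inference import SharkInference

model_id = "microsoft/MiniLM-L12-H384-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# torchscript=True makes the model return tuples, which tracing requires.
model = AutoModelForMaskedLM.from_pretrained(model_id, torchscript=True).eval()

inputs = tokenizer("Paris is the [MASK] of France.", return_tensors="pt")
example = (inputs["input_ids"], inputs["attention_mask"])

mlir_importer = SharkImporter(model, example, frontend="torch")
mlir_module, func_name = mlir_importer.import_mlir(tracing_required=True)

shark_module = SharkInference(mlir_module, func_name, device="cpu", mlir_dialect="linalg")
shark_module.compile()
logits = shark_module.forward(example)
```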
### Source Installation

#### Check out the code

```shell
git clone https://github.com/nod-ai/SHARK.git
```

#### Setup your Python VirtualEnvironment and Dependencies

```shell
# Setup venv and install necessary packages (torch-mlir, nodLabs/Shark, ...).
./setup_venv.sh
source shark.venv/bin/activate
```

For example, if you want to use Python 3.10 and upstream IREE with the TF import tools, you can set environment variables like:

```shell
PYTHON=python3.10 VENV_DIR=0617_venv IMPORTER=1 USE_IREE=1 ./setup_venv.sh
```

If you are a Torch-MLIR or IREE developer and want to test local changes, you can uninstall the provided packages with `pip uninstall torch-mlir` and/or `pip uninstall iree-compiler iree-runtime`, build locally with Python bindings, and set your `PYTHONPATH` as described in the IREE and Torch-MLIR documentation.
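After pointing `PYTHONPATH` at your local builds, a quick way to confirm the right copies are picked up is to print where each package is imported from (an illustrative check, not part of the project):

```python
# Show where torch-mlir and IREE resolve from, so you can confirm PYTHONPATH
# points at your local builds rather than the pip-installed wheels.
import iree.compiler
import iree.runtime
import torch_mlir

print("torch-mlir:   ", torch_mlir.__file__)
print("iree-compiler:", iree.compiler.__file__)
print("iree-runtime: ", iree.runtime.__file__)
```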

#### Run a demo script

```shell
python -m shark.examples.shark_inference.resnet50_script --device="cpu"  # Use gpu | vulkan
# Or run a pytest:
pytest tank/tf/hf_masked_lm/albert-base-v2_test.py::AlbertBaseModuleTest::test_module_static_cpu
```
### Testing

Run all model tests on CPU / GPU / Vulkan / Metal:

```shell
pytest tank

# On Linux, enable multithreading on CPU for faster results:
pytest tank -n auto
```
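(The `-n auto` option requires the pytest-xdist plugin.)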

Running specific tests

```shell
# Run tests for a specific model:
pytest tank/<MODEL_NAME>  # e.g., pytest tank/bert-base-uncased

# Run tests for a specific case:
pytest tank/<MODEL_NAME> -k "keyword"
# e.g., pytest tank/bert-base-uncased/bert-base-uncased_test.py -k "static_gpu"
```

Run benchmarks on SHARK tank pytests and generate `bench_results.csv` with the results (requires a source installation with `IMPORTER=1 ./setup_venv.sh`):

```shell
pytest --benchmark tank

# Run only static GPU benchmarks for PyTorch tests:
pytest --benchmark tank --ignore-glob="_tf*" -k "static_gpu"
```
## API Reference

### Shark Inference API


```python
from shark.shark_importer import SharkImporter

# SharkImporter produces an MLIR module from a Torch, TensorFlow, or TFLite module.
mlir_importer = SharkImporter(
    torch_module,
    (input,),          # inputs are passed as a tuple
    frontend="torch",  # or "tf", "tf-lite"
)
torch_mlir, func_name = mlir_importer.import_mlir(tracing_required=True)

# SharkInference accepts MLIR in the linalg, mhlo, or tosa dialect.
from shark.shark_inference import SharkInference

shark_module = SharkInference(torch_mlir, func_name, device="cpu", mlir_dialect="linalg")
shark_module.compile()
result = shark_module.forward((input,))
```

An example demonstrating how to run MHLO IR directly:

```python
from shark.shark_inference import SharkInference
import numpy as np

mhlo_ir = r"""builtin.module  {
      func.func @forward(%arg0: tensor<1x4xf32>, %arg1: tensor<4x1xf32>) -> tensor<4x4xf32> {
        %0 = chlo.broadcast_add %arg0, %arg1 : (tensor<1x4xf32>, tensor<4x1xf32>) -> tensor<4x4xf32>
        %1 = "mhlo.abs"(%0) : (tensor<4x4xf32>) -> tensor<4x4xf32>
        return %1 : tensor<4x4xf32>
      }
}"""

arg0 = np.ones((1, 4)).astype(np.float32)
arg1 = np.ones((4, 1)).astype(np.float32)
shark_module = SharkInference(mhlo_ir, func_name="forward", device="cpu", mlir_dialect="mhlo")
shark_module.compile()
result = shark_module.forward((arg0, arg1))
```
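With the all-ones inputs above, the broadcast add yields a 4x4 tensor of 2.0s and `mhlo.abs` leaves it unchanged, so `result` should be a 4x4 array filled with 2.0.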

## Supported and Validated Models

### PyTorch Models

#### Huggingface PyTorch Models

| Hugging Face Models | Torch-MLIR lowerable | SHARK-CPU | SHARK-CUDA | SHARK-METAL |
|---------------------|----------------------|-----------|------------|-------------|
| BERT                | 💚 (JIT)             | 💚        | 💚         | 💚          |
| Albert              | 💚 (JIT)             | 💚        | 💚         | 💚          |
| BigBird             | 💚 (AOT)             |           |            |             |
| DistilBERT          | 💚 (JIT)             | 💚        | 💚         | 💚          |
| GPT2                | 💔 (AOT)             |           |            |             |
| MobileBert          | 💚 (JIT)             | 💚        | 💚         | 💚          |

#### Torchvision Models

| Torchvision Models | Torch-MLIR lowerable | SHARK-CPU | SHARK-CUDA | SHARK-METAL |
|--------------------|----------------------|-----------|------------|-------------|
| AlexNet            | 💚 (Script)          | 💚        | 💚         | 💚          |
| DenseNet121        | 💚 (Script)          |           |            |             |
| MNasNet1_0         | 💚 (Script)          | 💚        | 💚         | 💚          |
| MobileNetV2        | 💚 (Script)          | 💚        | 💚         | 💚          |
| MobileNetV3        | 💚 (Script)          | 💚        | 💚         | 💚          |
| Unet               | 💔 (Script)          |           |            |             |
| Resnet18           | 💚 (Script)          | 💚        | 💚         | 💚          |
| Resnet50           | 💚 (Script)          | 💚        | 💚         | 💚          |
| Resnet101          | 💚 (Script)          | 💚        | 💚         | 💚          |
| Resnext50_32x4d    | 💚 (Script)          | 💚        | 💚         | 💚          |
| ShuffleNet_v2      | 💔 (Script)          |           |            |             |
| SqueezeNet         | 💚 (Script)          | 💚        | 💚         | 💚          |
| EfficientNet       | 💚 (Script)          |           |            |             |
| Regnet             | 💚 (Script)          | 💚        | 💚         | 💚          |
| Resnest            | 💔 (Script)          |           |            |             |
| Vision Transformer | 💚 (Script)          |           |            |             |
| VGG 16             | 💚 (Script)          | 💚        | 💚         |             |
| Wide Resnet        | 💚 (Script)          | 💚        | 💚         | 💚          |
| RAFT               | 💔 (JIT)             |           |            |             |

For more information, refer to the MODEL TRACKING SHEET.

#### PyTorch Training Models

| Models         | Torch-MLIR lowerable | SHARK-CPU | SHARK-CUDA | SHARK-METAL |
|----------------|----------------------|-----------|------------|-------------|
| BERT           | 💔                   | 💔        |            |             |
| FullyConnected | 💚                   | 💚        |            |             |
### JAX Models

| Models         | JAX-MHLO lowerable | SHARK-CPU | SHARK-CUDA | SHARK-METAL |
|----------------|--------------------|-----------|------------|-------------|
| DALL-E         | 💔                 | 💔        |            |             |
| FullyConnected | 💚                 | 💚        |            |             |
### TFLite Models

| Models | TOSA/LinAlg | SHARK-CPU | SHARK-CUDA | SHARK-METAL |
|--------|-------------|-----------|------------|-------------|
| BERT | 💔 | 💔 | | |
| FullyConnected | 💚 | 💚 | | |
| albert | 💚 | 💚 | | |
| asr_conformer | 💚 | 💚 | | |
| bird_classifier | 💚 | 💚 | | |
| cartoon_gan | 💚 | 💚 | | |
| craft_text | 💚 | 💚 | | |
| deeplab_v3 | 💚 | 💚 | | |
| densenet | 💚 | 💚 | | |
| east_text_detector | 💚 | 💚 | | |
| efficientnet_lite0_int8 | 💚 | 💚 | | |
| efficientnet | 💚 | 💚 | | |
| gpt2 | 💚 | 💚 | | |
| image_stylization | 💚 | 💚 | | |
| inception_v4 | 💚 | 💚 | | |
| inception_v4_uint8 | 💚 | 💚 | | |
| lightning_fp16 | 💚 | 💚 | | |
| lightning_i8 | 💚 | 💚 | | |
| lightning | 💚 | 💚 | | |
| magenta | 💚 | 💚 | | |
| midas | 💚 | 💚 | | |
| mirnet | 💚 | 💚 | | |
| mnasnet | 💚 | 💚 | | |
| mobilebert_edgetpu_s_float | 💚 | 💚 | | |
| mobilebert_edgetpu_s_quant | 💚 | 💚 | | |
| mobilebert | 💚 | 💚 | | |
| mobilebert_tf2_float | 💚 | 💚 | | |
| mobilebert_tf2_quant | 💚 | 💚 | | |
| mobilenet_ssd_quant | 💚 | 💚 | | |
| mobilenet_v1 | 💚 | 💚 | | |
| mobilenet_v1_uint8 | 💚 | 💚 | | |
| mobilenet_v2_int8 | 💚 | 💚 | | |
| mobilenet_v2 | 💚 | 💚 | | |
| mobilenet_v2_uint8 | 💚 | 💚 | | |
| mobilenet_v3-large | 💚 | 💚 | | |
| mobilenet_v3-large_uint8 | 💚 | 💚 | | |
| mobilenet_v35-int8 | 💚 | 💚 | | |
| nasnet | 💚 | 💚 | | |
| person_detect | 💚 | 💚 | | |
| posenet | 💚 | 💚 | | |
| resnet_50_int8 | 💚 | 💚 | | |
| rosetta | 💚 | 💚 | | |
| spice | 💚 | 💚 | | |
| squeezenet | 💚 | 💚 | | |
| ssd_mobilenet_v1 | 💚 | 💚 | | |
| ssd_mobilenet_v1_uint8 | 💚 | 💚 | | |
| ssd_mobilenet_v2_fpnlite | 💚 | 💚 | | |
| ssd_mobilenet_v2_fpnlite_uint8 | 💚 | 💚 | | |
| ssd_mobilenet_v2_int8 | 💚 | 💚 | | |
| ssd_mobilenet_v2 | 💚 | 💚 | | |
| ssd_spaghettinet_large | 💚 | 💚 | | |
| ssd_spaghettinet_large_uint8 | 💚 | 💚 | | |
| visual_wake_words_i8 | 💚 | 💚 | | |
### Tensorflow Models (Inference)

| Hugging Face Models | tf-mhlo lowerable | SHARK-CPU | SHARK-CUDA | SHARK-METAL |
|---------------------|-------------------|-----------|------------|-------------|
| BERT | 💚 | 💚 | 💚 | 💚 |
| albert-base-v2 | 💚 | 💚 | 💚 | 💚 |
| DistilBERT | 💚 | 💚 | 💚 | 💚 |
| CamemBert | 💚 | 💚 | 💚 | 💚 |
| ConvBert | 💚 | 💚 | 💚 | 💚 |
| Deberta | | | | |
| electra | 💚 | 💚 | 💚 | 💚 |
| funnel | | | | |
| layoutlm | 💚 | 💚 | 💚 | 💚 |
| longformer | | | | |
| mobile-bert | 💚 | 💚 | 💚 | 💚 |
| rembert | | | | |
| tapas | | | | |
| flaubert | 💚 | 💚 | 💚 | 💚 |
| roberta | 💚 | 💚 | 💚 | 💚 |
| xlm-roberta | 💚 | 💚 | 💚 | 💚 |
| mpnet | 💚 | 💚 | 💚 | 💚 |

## Related Projects

- IREE Project Channels
- MLIR and Torch-MLIR Project Channels

## License

nod.ai SHARK is licensed under the terms of the Apache 2.0 License with LLVM Exceptions. See LICENSE for more information.