This directory contains self-documented examples illustrating how to use the DeepSparse Engine.
For instructions on how to run each example, either check its README or run the script with `-h`.
Open a Pull Request to contribute your own examples.
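
As a quick orientation before browsing the table below, here is a minimal sketch of the low-level engine API that many of these examples build on: compiling an ONNX model and running a batch of inputs. The model path and the 3x224x224 input shape are placeholder assumptions; substitute any ONNX model of your own.

```python
import numpy as np
from deepsparse import compile_model

# Placeholder path -- substitute your own ONNX model (many examples instead
# download sparsified models from SparseZoo).
onnx_filepath = "model.onnx"
batch_size = 1

# Compile the model for the local CPU and run a single batch of random data.
# The 3x224x224 input shape assumes an image model; adjust to your model.
engine = compile_model(onnx_filepath, batch_size=batch_size)
inputs = [np.random.rand(batch_size, 3, 224, 224).astype(np.float32)]
outputs = engine(inputs)
print(outputs[0].shape)
```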
Example | Description |
---|---|
Benchmark and ONNX Model Correctness | Comparing predictions and benchmark performance between the DeepSparse Engine and ONNX Runtime. |
Hugging Face Transformers | Serving, benchmarking, and running NLP models from Hugging Face. |
YOLOv3 and YOLOv5 | Serving, benchmarking, and running annotation inferences with YOLOv3 and YOLOv5 models. |
Image Classification | How to use image classification models from SparseZoo to perform inference and benchmarking with the DeepSparse Engine. |
Object Detection | How to use object detection models from SparseZoo to perform inference and benchmarking with the DeepSparse Engine. |
Instance Segmentation | How to use an optimized YOLACT model and the DeepSparse Engine to perform real-time instance segmentation. |
AWS Serverless Integration | How to deploy a DeepSparse pipeline for batch or real-time inference on select serverless services. |
AWS SageMaker Integration | How to deploy a DeepSparse inference server on SageMaker. |
Google Cloud Run | How to deploy a DeepSparse inference server on Cloud Run. |
Google Kubernetes Engine | How to deploy a DeepSparse inference server on GKE. |
SparseStream | Deploying two sparse transformers to classify finance tweets from a real-time Twitter stream. |
SparseServer.UI | A Streamlit app for deploying the DeepSparse Server to compare the latency and accuracy of sparse BERT models. |
Twitter Sentiment Analysis | Example of scraping, processing, and classifying Twitter data using the DeepSparse Engine for 10x faster performance on CPUs. |
Flask Model Server | Simple model server and client example, showing how to use the DeepSparse Engine as an inference backend for a real-time inference server. |
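
Most of the task-oriented examples above (Transformers, YOLO, image classification) build on the higher-level `Pipeline` API, which wraps pre- and post-processing around the engine. Below is a minimal sketch assuming a text-classification task; the SparseZoo stub shown is illustrative, and any valid stub or local model path can be substituted.

```python
from deepsparse import Pipeline

# The SparseZoo stub below is illustrative -- swap in any valid
# text-classification stub or a path to a local model directory.
pipeline = Pipeline.create(
    task="text-classification",
    model_path="zoo:nlp/sentiment_analysis/obert-base/pytorch/huggingface/sst2/pruned90_quant-none",
)

# Run inference on a batch of raw text; tokenization and post-processing
# are handled inside the pipeline.
prediction = pipeline(sequences=["The DeepSparse Engine is fast on CPUs"])
print(prediction)
```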