A simple tool that applies structure-level optimizations (e.g. Quantization) to a TensorFlow model
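The quantization such tools apply can be illustrated with a minimal post-training affine (uniform) quantization sketch in NumPy; the function names here are hypothetical for illustration, not this repo's actual API:

```python
import numpy as np

def quantize_affine(x, num_bits=8):
    """Map float array x to uint8 codes via an affine (scale, zero-point) transform.
    Assumes x has a nonzero value range."""
    qmin, qmax = 0, 2 ** num_bits - 1
    scale = (x.max() - x.min()) / (qmax - qmin)
    zero_point = int(round(qmin - x.min() / scale))
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.uint8)
    return q, scale, zero_point

def dequantize_affine(q, scale, zero_point):
    """Recover an approximation of the original floats from the uint8 codes."""
    return (q.astype(np.float32) - zero_point) * scale
```

Round-tripping through the 8-bit codes reconstructs each value to within one quantization step (`scale`), which is why weight quantization typically costs little accuracy.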
ncnn is a high-performance neural network inference framework optimized for the mobile platform
PyTorch Mobile: Android examples of usage in applications
PyTorch Mobile: iOS examples
A set of tools to make your life easier with TensorRT and ONNX Runtime. This repo is designed for YOLOv3
Batch normalization fusion for PyTorch
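Batch normalization fusion folds the BN affine transform into the preceding convolution's weights and bias, eliminating the BN op at inference time. A minimal NumPy sketch of the per-output-channel folding (function name and parameter layout are illustrative, not the linked repo's API):

```python
import numpy as np

def fuse_conv_bn(W, b, gamma, beta, mean, var, eps=1e-5):
    """Fold BatchNorm parameters (gamma, beta, running mean/var) into a conv's
    weight W of shape (out_ch, in_ch, kh, kw) and bias b of shape (out_ch,)."""
    scale = gamma / np.sqrt(var + eps)          # per-output-channel scale
    W_fused = W * scale[:, None, None, None]    # scale each output filter
    b_fused = (b - mean) * scale + beta         # shift the bias accordingly
    return W_fused, b_fused
```

Because conv(x; W_fused, b_fused) equals bn(conv(x; W, b)) exactly (up to floating point), the fused network needs no retraining.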
Optimize layers structure of Keras model to reduce computation time
MIVisionX Python Inference Analyzer uses pre-trained ONNX/NNEF/Caffe models to analyze inference results and summarize individual image results
A constrained expectation-maximization algorithm for feasible graph inference.
Batch estimation on Lie groups
🤖️ Optimized CUDA Kernels for Fast MobileNetV2 Inference
[MLSys 2021] IOS: Inter-Operator Scheduler for CNN Acceleration
Interface for TensorRT engines inference along with an example of YOLOv4 engine being used.
Modified inference engine for quantized convolution using product quantization
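Product quantization splits a vector into sub-vectors and encodes each with a small per-subspace codebook, replacing float arithmetic with table lookups. A minimal NumPy encode/decode sketch (names are illustrative, not this repo's API; in practice the codebooks would be learned, e.g. by k-means):

```python
import numpy as np

def pq_encode(x, codebooks):
    """Encode vector x with product quantization.
    codebooks: (m, k, d/m) array — m sub-codebooks of k centroids each."""
    m, k, sub = codebooks.shape
    parts = x.reshape(m, sub)
    # nearest-centroid index for each sub-vector
    return np.array([np.argmin(((cb - p) ** 2).sum(axis=1))
                     for cb, p in zip(codebooks, parts)])

def pq_decode(codes, codebooks):
    """Reconstruct an approximation of x from its m centroid indices."""
    return np.concatenate([codebooks[i, c] for i, c in enumerate(codes)])
```

Each d-dimensional float vector compresses to m small integers, and inner products can then be approximated via precomputed centroid lookup tables.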
Batch Partitioning for Multi-PE Inference with TVM (2020)
The Tensor Algebra SuperOptimizer for Deep Learning
cross-platform modular neural network inference library, small and efficient
Improving Natural Language Processing tasks using BERT-based models
A compilation of various ML and DL models and ways to optimize their inference.
YOLOv8 object detection