SOTA low-bit LLM quantization (INT8/FP8/MXFP8/INT4/MXFP4/NVFP4) & sparsity; leading model compression techniques on PyTorch, TensorFlow, and ONNX Runtime
A simple pipeline for INT8 quantization based on TensorRT.
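For reference, a minimal sketch of what such a pipeline can look like with the classic TensorRT Python bindings and PyCUDA. The file name model.onnx, the input shape, and the random calibration batches are placeholders, not details taken from the repository above.

```python
import numpy as np
import pycuda.autoinit  # noqa: F401  # creates a CUDA context on import
import pycuda.driver as cuda
import tensorrt as trt

class Calibrator(trt.IInt8EntropyCalibrator2):
    """Feeds calibration batches to TensorRT's INT8 entropy calibrator."""
    def __init__(self, batches):
        super().__init__()
        self.device_mem = cuda.mem_alloc(batches[0].nbytes)
        self.batches = iter(batches)

    def get_batch_size(self):
        return 1

    def get_batch(self, names):
        batch = next(self.batches, None)
        if batch is None:
            return None  # no more data: calibration is finished
        cuda.memcpy_htod(self.device_mem, np.ascontiguousarray(batch))
        return [int(self.device_mem)]

    def read_calibration_cache(self):
        return None  # no cache on the first run

    def write_calibration_cache(self, cache):
        pass  # a real pipeline would persist the cache here

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)
with open("model.onnx", "rb") as f:         # placeholder model
    if not parser.parse(f.read()):
        raise RuntimeError(parser.get_error(0))

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.INT8)
batches = [np.random.rand(1, 3, 224, 224).astype(np.float32)
           for _ in range(8)]               # stand-in calibration data
config.int8_calibrator = Calibrator(batches)

with open("model_int8.engine", "wb") as f:
    f.write(builder.build_serialized_network(network, config))
```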
Trustworthy onboard satellite AI via a PyTorch→ONNX→INT8 pipeline, with calibration, telemetry, and a PhiSat-2 EO tile-filter demo.
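The calibration step in a PyTorch→ONNX→INT8 flow like this is typically post-training static quantization. A minimal sketch with ONNX Runtime's quantization API follows; the file names, the input tensor name "input", and the random tiles are assumptions, not details of the project.

```python
import numpy as np
from onnxruntime.quantization import (CalibrationDataReader, QuantType,
                                      quantize_static)

class TileReader(CalibrationDataReader):
    """Feeds a handful of representative tiles to the calibrator."""
    def __init__(self, num_batches=16):
        self.data = iter(
            {"input": np.random.rand(1, 3, 256, 256).astype(np.float32)}
            for _ in range(num_batches))

    def get_next(self):
        return next(self.data, None)  # None ends calibration

quantize_static(
    "tile_filter.onnx",       # FP32 model exported from PyTorch (placeholder)
    "tile_filter_int8.onnx",  # quantized output
    TileReader(),
    activation_type=QuantType.QInt8,
    weight_type=QuantType.QInt8,
)
```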
👀 Apply YOLOv8, exported with ONNX or TensorRT (FP16, INT8), to a real-time camera feed.
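A minimal sketch of that workflow with the ultralytics package and OpenCV; the nano weights yolov8n.pt and webcam index 0 are assumptions.

```python
import cv2
from ultralytics import YOLO

onnx_path = YOLO("yolov8n.pt").export(format="onnx")  # one-time ONNX export
model = YOLO(onnx_path)  # ultralytics runs the ONNX model transparently

cap = cv2.VideoCapture(0)  # default webcam
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    results = model(frame, verbose=False)
    cv2.imshow("YOLOv8", results[0].plot())  # draw boxes on the frame
    if cv2.waitKey(1) & 0xFF == ord("q"):    # quit on 'q'
        break
cap.release()
cv2.destroyAllWindows()
```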
A LLaMA2-7B chatbot with memory, running on CPU and optimized using SmoothQuant, 4-bit quantization, or Intel® Extension for PyTorch with bfloat16.
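Of those three options, the bfloat16 path is the simplest to sketch. Below is an assumed, minimal example with Intel® Extension for PyTorch; the checkpoint name and prompt are placeholders, and the SmoothQuant and 4-bit paths typically go through a separate quantization toolchain such as Intel Neural Compressor instead.

```python
import torch
import intel_extension_for_pytorch as ipex
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "meta-llama/Llama-2-7b-chat-hf"  # assumed checkpoint name
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name, torch_dtype=torch.bfloat16)
model = ipex.optimize(model.eval(), dtype=torch.bfloat16)  # CPU-side fusions

inputs = tok("What does INT8 quantization trade away?", return_tensors="pt")
with torch.no_grad(), torch.autocast("cpu", dtype=torch.bfloat16):
    out = model.generate(**inputs, max_new_tokens=64)
print(tok.decode(out[0], skip_special_tokens=True))
```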
A TensorRT INT8 Python sample.
Up to 40x faster AI inference: ONNX-to-TensorRT optimization with FP16/INT8 quantization, multi-GPU support, and deployment.
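The deployment half of such a pipeline is deserializing the built engine and binding input/output buffers. A minimal sketch follows; the engine file name, tensor shapes, and binding order are assumptions, and the FP16 variant would simply be built with config.set_flag(trt.BuilderFlag.FP16).

```python
import numpy as np
import pycuda.autoinit  # noqa: F401  # creates a CUDA context on import
import pycuda.driver as cuda
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
with open("model.engine", "rb") as f:  # placeholder engine file
    engine = trt.Runtime(logger).deserialize_cuda_engine(f.read())
context = engine.create_execution_context()

inp = np.random.rand(1, 3, 224, 224).astype(np.float32)  # assumed input shape
out = np.empty((1, 1000), dtype=np.float32)              # assumed output shape
d_inp, d_out = cuda.mem_alloc(inp.nbytes), cuda.mem_alloc(out.nbytes)

cuda.memcpy_htod(d_inp, np.ascontiguousarray(inp))
context.execute_v2([int(d_inp), int(d_out)])  # binding order: input, output
cuda.memcpy_dtoh(out, d_out)
print(out.argmax())
```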
Quantize TinyLlama-1.1B-Chat from PyTorch to CoreML (float16, int8, int4) for efficient on-device inference on iOS 18+.
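A minimal sketch of the float16 conversion step with coremltools (coremltools 8+ for the iOS 18 target); the toy module below stands in for the traced TinyLlama graph, since tracing a full chat model requires fixed sequence shapes.

```python
import torch
import coremltools as ct

# Toy module standing in for the traced TinyLlama graph.
model = torch.nn.Sequential(torch.nn.Linear(64, 64), torch.nn.ReLU()).eval()
example = torch.rand(1, 64)
traced = torch.jit.trace(model, example)

mlmodel = ct.convert(
    traced,
    inputs=[ct.TensorType(name="x", shape=example.shape)],
    compute_precision=ct.precision.FLOAT16,    # the float16 variant
    minimum_deployment_target=ct.target.iOS18,
)
mlmodel.save("model_fp16.mlpackage")
```

The int8 and int4 variants would go through coremltools' weight-compression utilities (e.g. the coremltools.optimize.coreml module) rather than the compute_precision flag.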
A quantization framework under development.