
🚀 ONNX Runtime × TensorRT

40x Faster AI Inference with FP16/INT8 Quantization & Multi-GPU Support



╔══════════════════════════════════════════════════════════════╗
║  🎯 ONNX Runtime + TensorRT = Maximum Performance           ║
║  ⚡ GPU Acceleration | 🔥 Optimized Inference | 🚀 Production  ║
╚══════════════════════════════════════════════════════════════╝

📊 Performance Metrics

Model     | Framework | Speed      | Memory
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
YOLOv10   | ONNX+TRT  | 5ms        | 2GB
SAM 2     | ONNX+TRT  | 12ms       | 4GB
Llama 3.2 | ONNX+TRT  | 45ms/token | 8GB
FLUX.1    | ONNX+TRT  | 1.2s/img   | 12GB

🎨 2024-2025 Trending AI Projects

🤖 Large Language Models (LLMs)

🔥 Hot Projects

  • 🦙 Llama 3.2 & 3.3 - Meta's latest open-source LLM

    • 🎯 1B, 3B, 8B, 70B, 405B parameters
    • ⚡ ONNX Runtime support
    • 📱 Edge deployment ready
  • 🌟 Qwen 2.5 - Alibaba's SOTA model

    • 🚀 0.5B to 72B parameters
    • 🔧 Fine-tuning friendly
    • 🌍 Multilingual support
  • 🎯 Mistral AI - Mixtral & Mistral models

    • ⚡ MoE architecture
    • 🔥 Apache 2.0 license
    • 💪 Production-ready
  • 🧠 DeepSeek V3 - 671B MoE model

    • 🎯 Beats GPT-4 on many benchmarks
    • ⚡ 37B activated parameters
    • 🚀 Cost-efficient inference
👁️ Computer Vision

🎯 Object Detection & Segmentation

  • 🎯 YOLOv10 - Real-Time End-to-End Object Detection

    • ⚡ No NMS required
    • 🚀 2x faster than YOLOv9
    • 📊 SOTA accuracy
  • 🔥 YOLOv9 - Programmable Gradient Information

    • 🎯 Better than YOLOv8
    • ⚡ GELAN architecture
    • 🔧 Easy to deploy
  • 🎭 SAM 2 - Segment Anything in Images and Videos

    • 🎬 Video segmentation
    • 🖼️ Zero-shot learning
    • 🚀 Real-time capable
  • 🌟 Florence-2 - Microsoft's Vision Foundation Model

    • 🎯 Unified vision tasks
    • 📝 Vision-language model
    • 🔥 Open source
  • 🎨 DepthAnything V2 - Monocular Depth Estimation

    • 📐 High-quality depth maps
    • ⚡ Real-time inference
    • 🎯 Zero-shot capable
🎨 Generative AI & Diffusion Models

🖼️ Image Generation

  • FLUX.1 - Next-Gen Text-to-Image

    • 🎯 Better than SDXL
    • 🚀 12B parameters
    • 🔥 Apache 2.0 (dev/schnell)
  • 🎨 Stable Diffusion 3.5 - Latest from Stability AI

    • 📊 8B parameters
    • ⚡ Fast inference
    • 🎯 High quality
  • 🌟 Kolors - Kuaishou's text-to-image model

    • 🇨🇳 Better Chinese support
    • 🎯 SOTA quality
    • ⚡ Efficient
  • 🎬 CogVideoX - Open-source text-to-video

    • 🎥 5B parameters
    • ⏱️ Up to 6 seconds
    • 🚀 Commercial friendly
🎵 Audio AI

🎙️ Speech & Audio

  • 🗣️ Whisper v3 - OpenAI's Speech Recognition

    • 🌍 99 languages
    • 🎯 SOTA accuracy
    • ⚡ Real-time capable
  • 🎤 Fish Speech - Few-Shot Voice Cloning

    • 🔥 Zero-shot TTS
    • 🎯 Emotional control
    • 🚀 Open source
  • 🎵 Suno Bark - Generative Audio Model

    • 🎶 Music & effects
    • 🗣️ Multilingual
    • 🔥 MIT license
💻 Code & Development AI

👨‍💻 AI Coding Assistants

  • 🤖 DeepSeek Coder V2 - 236B MoE coding model

    • 💪 Beats GPT-4 Turbo on coding
    • 🎯 338 languages
    • 🚀 Fill-in-the-middle
  • 🧠 Qwen2.5-Coder - Alibaba's coding model

    • ⚡ 1.5B to 32B
    • 🎯 Instruct & Base variants
    • 🔥 Long context (128K)
  • 🌟 StarCoder 2 - Open-source code LLM

    • 📊 3B to 15B parameters
    • 🔧 600+ languages
    • 🚀 Commercial friendly
🔧 MLOps & Optimization Tools

⚙️ Production Tools

  • 🚀 vLLM - Fast LLM Inference

    • ⚡ PagedAttention
    • 📈 24x throughput boost
    • 🎯 Production-ready
  • TensorRT-LLM - NVIDIA's LLM optimizer

    • 🔥 8x faster inference
    • 🎯 INT4/INT8 quantization
    • 💪 Multi-GPU support
  • 🎯 LM Studio - Run LLMs locally

    • 💻 Desktop app
    • 🔧 Easy to use
    • 🚀 GGUF support
  • 🌟 Ollama - Get up and running with LLMs

    • 📦 One-command setup
    • 🎯 Model library
    • ⚡ REST API
  • 🔥 llama.cpp - LLM inference in C++

    • 💪 CPU & Metal support
    • 🎯 GGUF quantization
    • 🚀 Ultra-fast
🧪 Multimodal Models

🎭 Vision-Language Models

  • 🦙 LLaVA 1.6 - Large Language and Vision Assistant

    • 👁️ Image understanding
    • 💬 Visual chat
    • 🎯 Open source
  • 🌟 CogVLM2 - GPT4V-level open model

    • 🎯 Better than GPT-4V on some tasks
    • ⚡ Efficient inference
    • 🔥 Commercial friendly
  • 🎨 Qwen-VL - Multimodal LLM

    • 📊 72B parameters
    • 🎯 Multiple images support
    • ⚡ Long context
🎮 Edge AI & Mobile

📱 On-Device AI

  • 🔥 MLC LLM - Universal deployment solution

    • 📱 iOS, Android, WebGPU
    • ⚡ Compilation optimization
    • 🎯 Any hardware
  • 🚀 MediaPipe - Google's on-device ML

    • 👁️ Pose, face, hands detection
    • 📱 Cross-platform
    • ⚡ Real-time
  • 🎯 NCNN - Tencent's mobile inference

    • 📱 ARM optimization
    • 🔥 Vulkan support
    • ⚡ Super fast

🚀 Quick Start

📦 Installation

# Clone repository
git clone https://github.com/umitkacar/Onnxruntime-TensorRT.git
cd Onnxruntime-TensorRT

# Install ONNX Runtime with TensorRT
pip install onnxruntime-gpu
pip install tensorrt

# Or build from source for optimal performance
pip install cmake
git clone --recursive https://github.com/microsoft/onnxruntime.git
cd onnxruntime
./build.sh --config Release --use_tensorrt --cuda_home /usr/local/cuda
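
After installation, it is worth confirming that ONNX Runtime can actually see the TensorRT provider. A quick check using the standard API:

import onnxruntime as ort

# 'TensorrtExecutionProvider' should appear in this list if TensorRT is usable
print(ort.get_available_providers())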

⚡ Quick Example

import onnxruntime as ort
import numpy as np

# Configure TensorRT Execution Provider
providers = [
    ('TensorrtExecutionProvider', {
        'device_id': 0,
        'trt_max_workspace_size': 2147483648,
        'trt_fp16_enable': True,
        'trt_engine_cache_enable': True,
        'trt_engine_cache_path': './trt_cache'
    }),
    'CUDAExecutionProvider',
    'CPUExecutionProvider'
]

# Load model
session = ort.InferenceSession('model.onnx', providers=providers)

# Prepare input/output names and a dummy input
# (adjust the shape and dtype to match your model)
input_name = session.get_inputs()[0].name
output_name = session.get_outputs()[0].name
input_data = np.random.rand(1, 3, 224, 224).astype(np.float32)

# Run inference
result = session.run([output_name], {input_name: input_data})

🔧 Advanced Configuration

🎯 TensorRT Optimization

# INT8 Quantization
providers = [
    ('TensorrtExecutionProvider', {
        'trt_int8_enable': True,
        'trt_int8_calibration_table_name': 'calibration.flatbuffers',
        'trt_int8_use_native_calibration_table': False
    })
]

# Dynamic Shapes
providers = [
    ('TensorrtExecutionProvider', {
        'trt_max_partition_iterations': 1000,
        'trt_min_subgraph_size': 1,
        'trt_profile_min_shapes': 'input:1x3x224x224',
        'trt_profile_max_shapes': 'input:32x3x224x224',
        'trt_profile_opt_shapes': 'input:16x3x224x224'
    })
]

📚 Resources & Documentation

🎓 Official Documentation

💡 Tutorials & Examples

🌟 Community Projects

🔍 Useful Links


🎯 Supported Models

Category         | Models
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Object Detection | YOLOv5, v6, v7, v8, v9, v10, DETR, DINO
Segmentation     | SAM, SAM 2, Mask R-CNN, DeepLab
Classification   | ResNet, EfficientNet, ViT, ConvNeXt
LLMs             | Llama 3, Qwen, Mistral, Phi-3
Diffusion        | SD 1.5/2.1/XL, FLUX.1, ControlNet
Audio            | Whisper, Wav2Vec2, HuBERT

🔥 Performance Tips

⚡ Optimization Checklist

  • ✅ Enable TensorRT FP16 for 2-3x speedup
  • ✅ Use INT8 quantization for 4x+ speedup
  • ✅ Enable engine caching to avoid rebuild
  • ✅ Set optimal workspace size (2GB+)
  • ✅ Use dynamic shapes for variable inputs
  • ✅ Profile and optimize subgraph partitioning
  • ✅ Use CUDA graphs for reduced overhead
  • ✅ Batch processing when possible
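
A single TensorRT provider configuration that applies several items from this checklist might look like the sketch below. Option names follow the ONNX Runtime TensorRT Execution Provider; trt_cuda_graph_enable is an assumption that depends on your ONNX Runtime version.

import onnxruntime as ort

providers = [
    ('TensorrtExecutionProvider', {
        'trt_fp16_enable': True,               # FP16 for 2-3x speedup
        'trt_engine_cache_enable': True,       # avoid rebuilding engines
        'trt_engine_cache_path': './trt_cache',
        'trt_max_workspace_size': 2147483648,  # 2GB workspace
        # Dynamic shape profiles for variable batch sizes
        'trt_profile_min_shapes': 'input:1x3x224x224',
        'trt_profile_max_shapes': 'input:32x3x224x224',
        'trt_profile_opt_shapes': 'input:16x3x224x224',
        # Assumption: available in recent ONNX Runtime builds
        'trt_cuda_graph_enable': True,
    }),
    'CUDAExecutionProvider',
    'CPUExecutionProvider'
]

session = ort.InferenceSession('model.onnx', providers=providers)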

📊 Benchmark Results

Model: YOLOv8n (640x640)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Backend         | Latency  | FPS    | Memory
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
PyTorch         | 45ms     | 22     | 4.2GB
ONNX CPU        | 156ms    | 6      | 2.1GB
ONNX CUDA       | 8.2ms    | 122    | 2.5GB
ONNX+TRT FP16   | 4.1ms    | 244    | 2.3GB
ONNX+TRT INT8   | 2.8ms    | 357    | 1.8GB
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

🛠️ Development

🚀 Quick Start for Developers

# Clone repository
git clone https://github.com/umitkacar/Onnxruntime-TensorRT.git
cd Onnxruntime-TensorRT

# Install with development dependencies
pip install -e ".[dev]"

# Or use Hatch (recommended)
pip install hatch
hatch env create

🧪 Running Tests

# Using Hatch
hatch run test              # Run all tests
hatch run test-cov          # Run with coverage
hatch run cov-html          # Generate HTML coverage report

# Using Make
make test                   # Run tests
make test-cov              # Run with coverage
make test-html             # Open coverage in browser

# Using pytest directly
pytest                      # Run all tests
pytest -m "not slow"       # Skip slow tests
pytest -v                  # Verbose output

🎨 Code Quality

# Format code
hatch run format
# or
make format

# Lint code
hatch run lint
# or
make lint

# Type check
hatch run type-check
# or
make type-check

# Run all checks
hatch run check-all
# or
make check-all

🔧 Pre-commit Hooks

# Install pre-commit hooks
pip install pre-commit
pre-commit install

# Run manually
pre-commit run --all-files

📦 Build Package

# Using Hatch
hatch build

# Using Make
make build

# Check package
twine check dist/*

🛡️ Quality Tools

This project uses modern Python tooling:

Tool       | Purpose                                | Configuration
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Hatch      | Build backend & environment management | pyproject.toml
Ruff       | Linting & formatting (ultra-fast)      | pyproject.toml
Black      | Code formatting                        | pyproject.toml
MyPy       | Static type checking (strict)          | pyproject.toml
Pytest     | Testing framework                      | pyproject.toml
Coverage   | Code coverage (60% minimum)            | pyproject.toml
Pre-commit | Git hooks for quality checks           | .pre-commit-config.yaml
Bandit     | Security vulnerability scanner         | .bandit

📚 Development Documentation

🎯 Project Structure

Onnxruntime-TensorRT/
├── src/onnxruntime_tensorrt/  # Source code
│   ├── core/                   # Core functionality
│   └── utils/                  # Utilities
├── tests/                      # Test suite
│   ├── conftest.py            # Pytest fixtures
│   └── test_*.py              # Test modules
├── examples/                   # Example scripts
│   ├── yolov10_inference.py   # YOLOv10 detection
│   ├── llm_inference.py       # LLM generation
│   └── sam2_segmentation.py   # SAM 2 segmentation
├── benchmark/                  # Benchmarking tools
├── config/                     # Configuration files
├── docs/                       # Documentation
│   ├── CONTRIBUTING.md        # Contribution guide
│   ├── DEVELOPMENT.md         # Developer guide
│   ├── LESSONS_LEARNED.md     # Insights & solutions
│   └── CHANGELOG.md           # Version history
├── pyproject.toml             # Project configuration
└── .pre-commit-config.yaml    # Pre-commit hooks

❓ FAQ (Frequently Asked Questions)

Q: Why is TensorRT so much faster than regular ONNX Runtime?

A: TensorRT applies several optimizations:

  • Layer fusion - Combines multiple operations into single kernels
  • Precision calibration - FP16/INT8 reduces memory bandwidth
  • Kernel auto-tuning - Selects optimal CUDA kernels for your GPU
  • Dynamic tensor memory - Minimizes memory allocation overhead
  • Multi-stream execution - Parallel execution of independent operations

Result: 2-10x speedup depending on model architecture.

Q: Do I need a specific GPU for TensorRT?

A: TensorRT works on NVIDIA GPUs with:

  • Minimum: Compute Capability 6.0+ (Pascal, GTX 10 series)
  • Recommended: Compute Capability 7.0+ (Volta/Turing, RTX 20 series)
  • Best: Compute Capability 8.0+ (Ampere or newer, RTX 30 series, A100)

INT8 precision requires Compute Capability 6.1+.
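
To check your GPU's compute capability, one option (assuming PyTorch is installed) is:

import torch

# Returns (major, minor), e.g. (8, 6) for an RTX 3090
print(torch.cuda.get_device_capability(0))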

Q: Can I use TensorRT on CPU?

A: No, TensorRT is GPU-only. For CPU inference, use:

  • ONNX Runtime CPU provider
  • OpenVINO (Intel CPUs)
  • ONNX Runtime with DirectML (Windows)
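
For a pure-CPU setup, session creation is the same call with only the CPU provider listed:

import onnxruntime as ort

session = ort.InferenceSession('model.onnx', providers=['CPUExecutionProvider'])
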
Q: Why are my first few inferences slow?

A: TensorRT builds optimized engines on first run. Solutions:

# Enable engine caching
providers = [
    ('TensorrtExecutionProvider', {
        'trt_engine_cache_enable': True,
        'trt_engine_cache_path': './trt_cache'
    })
]

First run: 30-60 seconds (builds and caches the engine). Subsequent runs: the cached engine is loaded instead of being rebuilt, so startup takes well under a second.

Q: How much memory does TensorRT need?

A: Memory requirements:

  • Workspace: 2-4GB (configurable via trt_max_workspace_size)
  • Model: Depends on model size
  • Activations: Depends on batch size and input resolution

Example for YOLOv8n (640x640):

  • FP32: ~3GB VRAM
  • FP16: ~2GB VRAM
  • INT8: ~1.5GB VRAM

Q: What's the difference between FP32, FP16, and INT8?

A: Precision modes trade accuracy for speed:

Precision | Speed         | Accuracy | Use Case
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
FP32      | 1x (baseline) | Best     | Development, debugging
FP16      | 2-3x faster   | ~99.9%   | Production (recommended)
INT8      | 4-8x faster   | 95-99%   | Edge devices, high throughput

Recommendation: Start with FP16, only use INT8 if you need maximum speed.

Q: Can I run multiple models simultaneously?

A: Yes! Use separate sessions:

session1 = ort.InferenceSession('yolo.onnx', providers=providers)
session2 = ort.InferenceSession('sam.onnx', providers=providers)

# Run both models (sequential here; use threads or separate processes for true overlap)
result1 = session1.run(...)
result2 = session2.run(...)

Each session maintains its own TensorRT engine and GPU memory.
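
With more than one GPU, each session can also be pinned to its own device through the provider's device_id option. A sketch (model paths are placeholders):

import onnxruntime as ort

def trt_providers(gpu_id):
    # Same provider options as above, but targeting a specific GPU
    return [
        ('TensorrtExecutionProvider', {'device_id': gpu_id, 'trt_fp16_enable': True}),
        ('CUDAExecutionProvider', {'device_id': gpu_id}),
        'CPUExecutionProvider'
    ]

session_gpu0 = ort.InferenceSession('yolo.onnx', providers=trt_providers(0))
session_gpu1 = ort.InferenceSession('sam.onnx', providers=trt_providers(1))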

Q: Why is my model not using TensorRT?

A: Common reasons:

  1. TensorRT not installed: pip install tensorrt
  2. Unsupported operations: Check ONNX Runtime logs
  3. Provider not specified: Ensure TensorrtExecutionProvider is in provider list
  4. CUDA/cuDNN missing: Install CUDA Toolkit and cuDNN

Check which provider is actually used:

print(session.get_providers())  # Should include 'TensorrtExecutionProvider'

Q: How do I debug TensorRT issues?

A: Enable verbose logging:

import onnxruntime as ort
ort.set_default_logger_severity(0)  # 0=Verbose, 1=Info, 2=Warning, 3=Error

providers = [
    ('TensorrtExecutionProvider', {
        'trt_dump_subgraphs': True,  # Save TensorRT subgraphs
        'trt_engine_cache_enable': False  # Rebuild for debugging
    })
]

Check logs for:

  • Which layers are using TensorRT
  • Fallback to CUDA/CPU
  • Build errors or warnings

🔧 Troubleshooting

Common Issues and Solutions

Issue: "TensorrtExecutionProvider is not available"

Cause: TensorRT not properly installed or incompatible version.

Solution:

# Check ONNX Runtime version
python -c "import onnxruntime as ort; print(ort.__version__)"

# Check available providers
python -c "import onnxruntime as ort; print(ort.get_available_providers())"

# Install TensorRT
pip install tensorrt
# or download from NVIDIA: https://developer.nvidia.com/tensorrt

# Verify CUDA installation
nvidia-smi
nvcc --version

Issue: "CUDA out of memory"

Solutions:

# 1. Reduce batch size
batch_size = 1  # Instead of 32

# 2. Reduce workspace size
providers = [
    ('TensorrtExecutionProvider', {
        'trt_max_workspace_size': 1073741824  # 1GB instead of 2GB
    })
]

# 3. Use FP16 instead of FP32
providers = [
    ('TensorrtExecutionProvider', {
        'trt_fp16_enable': True
    })
]

# 4. Free cached GPU memory held by PyTorch (only helps if PyTorch is also
#    allocating memory in the same process; it does not release ONNX Runtime's pool)
import torch
torch.cuda.empty_cache()

Issue: "Engine build takes too long"

Solution:

# Enable caching to avoid rebuilds
providers = [
    ('TensorrtExecutionProvider', {
        'trt_engine_cache_enable': True,
        'trt_engine_cache_path': './trt_cache',
        'trt_timing_cache_enable': True  # Cache kernel timing info
    })
]

Note: First build can take 30-60 seconds. Cached loads take <1 second.

Issue: "Model accuracy decreased with TensorRT"

Checklist:

  1. Use FP16 instead of INT8 - INT8 requires calibration
  2. Check input preprocessing - Ensure same normalization
  3. Verify output postprocessing - TensorRT may reorder outputs
  4. Compare layer by layer - Use trt_dump_subgraphs=True
# Compare outputs between the CUDA baseline and TensorRT
import numpy as np
import onnxruntime as ort

# input_name / input_data: prepared exactly as in your normal pipeline

# CUDA baseline
cuda_session = ort.InferenceSession('model.onnx', providers=['CUDAExecutionProvider'])
cuda_output = cuda_session.run(None, {input_name: input_data})[0]

# TensorRT
trt_session = ort.InferenceSession('model.onnx', providers=['TensorrtExecutionProvider'])
trt_output = trt_session.run(None, {input_name: input_data})[0]

# Calculate difference
diff = np.abs(cuda_output - trt_output).mean()
print(f"Mean difference: {diff}")  # Should be < 0.001 for FP16

Issue: "Unsupported ONNX operator"

Solution:

# TensorRT may not support all ONNX ops
# Fallback strategy: Mixed execution

providers = [
    ('TensorrtExecutionProvider', {
        'trt_min_subgraph_size': 5  # Only use TRT for subgraphs >5 nodes
    }),
    'CUDAExecutionProvider',  # Fallback for unsupported ops
    'CPUExecutionProvider'
]

Check which ops are unsupported:
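
One way is to enable verbose logging (as in the debugging FAQ above) and watch which nodes are assigned to TensorRT versus the CUDA/CPU fallback:

import onnxruntime as ort

ort.set_default_logger_severity(0)  # verbose: logs provider assignment per node

session = ort.InferenceSession('model.onnx', providers=[
    'TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider'
])
# Look in the logs for nodes placed on CUDAExecutionProvider / CPUExecutionProvider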

Issue: "Dynamic shapes not working"

Solution:

# Specify shape profiles for dynamic inputs
providers = [
    ('TensorrtExecutionProvider', {
        'trt_profile_min_shapes': 'input:1x3x224x224',
        'trt_profile_max_shapes': 'input:32x3x224x224',
        'trt_profile_opt_shapes': 'input:8x3x224x224'  # Most common shape
    })
]

🚀 Installation Troubleshooting

CUDA Installation Issues

Issue: nvidia-smi: command not found

Solution:

# Ubuntu/Debian
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
sudo apt-get update
sudo apt-get install cuda-toolkit-12-0

# Add to ~/.bashrc
export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH

TensorRT Installation Issues

Issue: TensorRT version mismatch with CUDA

Solution:

# Check CUDA version
nvcc --version

# Install matching TensorRT
# CUDA 11.8 -> TensorRT 8.6
# CUDA 12.0 -> TensorRT 8.6 or 9.0

# Install via pip (easiest)
pip install tensorrt

# Or download from NVIDIA
# https://developer.nvidia.com/tensorrt

ONNX Runtime GPU Installation

Issue: onnxruntime-gpu conflicts with onnxruntime

Solution:

# Remove CPU version first
pip uninstall onnxruntime onnxruntime-gpu

# Install GPU version only
pip install onnxruntime-gpu

# Verify
python -c "import onnxruntime as ort; print(ort.get_device())"

🎓 Best Practices

1. Model Optimization Workflow

graph LR
    A[PyTorch Model] --> B[Export to ONNX]
    B --> C[Simplify ONNX]
    C --> D[Test with CUDA]
    D --> E[Enable TensorRT FP16]
    E --> F[Benchmark]
    F --> G{Fast Enough?}
    G -->|No| H[Try INT8]
    G -->|Yes| I[Enable Caching]
    H --> I
    I --> J[Production]
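
For the "Simplify ONNX" step, one common option is the onnx-simplifier package (an assumption; the project itself does not bundle it):

import onnx
from onnxsim import simplify  # pip install onnxsim

model = onnx.load('model.onnx')
model_simplified, check = simplify(model)
assert check, "Simplified model failed validation"
onnx.save(model_simplified, 'model_simplified.onnx')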

2. Development to Production Checklist

  • Step 1: Export model to ONNX with proper opset
  • Step 2: Test with ONNX Runtime CPU (baseline)
  • Step 3: Test with CUDA provider (GPU baseline)
  • Step 4: Enable TensorRT with FP32 (verify accuracy)
  • Step 5: Enable FP16 (benchmark speed vs accuracy)
  • Step 6: Enable engine caching
  • Step 7: Profile and optimize bottlenecks
  • Step 8: Load test with production data
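
For Step 1, a minimal PyTorch export might look like the sketch below; the model and input shape are stand-ins, and the opset should be one supported by your ONNX Runtime and TensorRT versions.

import torch
import torchvision

# Stand-in model for illustration; replace with your own
model = torchvision.models.resnet18(weights=None).eval()
dummy_input = torch.randn(1, 3, 224, 224)

torch.onnx.export(
    model,
    dummy_input,
    'model.onnx',
    opset_version=17,
    input_names=['input'],
    output_names=['output'],
    dynamic_axes={'input': {0: 'batch'}, 'output': {0: 'batch'}},  # optional dynamic batch
)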

3. Performance Monitoring

import time
import numpy as np

def benchmark_model(session, input_data, warmup=10, iterations=100):
    """Benchmark inference latency"""
    input_name = session.get_inputs()[0].name

    # Warmup
    for _ in range(warmup):
        session.run(None, {input_name: input_data})

    # Benchmark
    latencies = []
    for _ in range(iterations):
        start = time.perf_counter()
        session.run(None, {input_name: input_data})
        latencies.append((time.perf_counter() - start) * 1000)

    return {
        'mean': np.mean(latencies),
        'std': np.std(latencies),
        'p50': np.percentile(latencies, 50),
        'p95': np.percentile(latencies, 95),
        'p99': np.percentile(latencies, 99)
    }
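
A possible usage, reusing the session and dummy input from the Quick Example above:

stats = benchmark_model(session, input_data)
print(f"mean: {stats['mean']:.2f} ms | p95: {stats['p95']:.2f} ms | p99: {stats['p99']:.2f} ms")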

🤝 Contributing

We welcome contributions! Please see CONTRIBUTING.md for details.

Quick checklist:

  • ✅ Code follows style guidelines (Ruff + Black)
  • ✅ Type hints added (MyPy strict mode)
  • ✅ Tests added/updated (Pytest)
  • ✅ Documentation updated
  • ✅ Pre-commit hooks pass
  • ✅ All tests passing

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.


🌟 Star History

Star History Chart


🔗 Connect & Support

GitHub Issues GitHub Stars GitHub Forks


💫 Made with ❤️ for the AI Community

⭐ Star this repo if you find it useful! ⭐


📈 Trending Topics 2024-2025

#ONNX #TensorRT #LLM #YOLOv10 #SAM2 #FLUX #StableDiffusion #Llama3 #Qwen #Mistral #EdgeAI #MLOps #Quantization #Optimization #DeepLearning #ComputerVision #NLP #GenerativeAI #ProductionML #HighPerformance