Minimal demo: optimize a PyTorch MLP for NVIDIA L4 GPUs with TensorRT.
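
Quick start:

1. `git clone <repo> && cd tensorrt-l4-demo`
2. `pip install -r requirements.txt`
3. `python models/export_onnx.py` (export the model to ONNX)
4. `python tensorrt/build_engine.py` (build the TensorRT engine)
5. `python benchmarks/benchmark.py` (run the benchmark)

Expected result: a latency drop of 40% or more.
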
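
The contents of `benchmarks/benchmark.py` are not shown here, so the following is only a minimal sketch of how sub-millisecond latencies like these are commonly measured: discard a few warm-up runs, time many iterations, and report the median. The names `measure_latency_ms` and `dummy_inference` are hypothetical and are not taken from the repo.

```python
import statistics
import time


def measure_latency_ms(fn, warmup=10, iters=100):
    """Return the median wall-clock latency of fn() in milliseconds."""
    for _ in range(warmup):
        fn()  # warm-up runs are executed but not timed
    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1e3)
    return statistics.median(samples)


def dummy_inference():
    # Hypothetical stand-in workload; replace with the real model call.
    return sum(i * i for i in range(1000))


if __name__ == "__main__":
    print(f"median latency: {measure_latency_ms(dummy_inference):.3f} ms")
```

When timing real GPU inference, keep in mind that CUDA kernels launch asynchronously, so the callable should synchronize with the device (for example via `torch.cuda.synchronize()`) before it returns. Otherwise the timer only captures launch overhead.
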
| Setup | Latency (ms) | Reduction |
|---|---|---|
| PyTorch | 0.40-0.60 | Baseline |
| TRT FP16 | 0.20-0.30 | 40-50% |
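
The Reduction column is presumably the relative drop in latency against the PyTorch baseline. As a small worked example under that assumption (the helper name `percent_reduction` is hypothetical), pairing the matching endpoints of the two latency ranges gives:

```python
def percent_reduction(baseline_ms, optimized_ms):
    """Latency reduction relative to the baseline, in percent."""
    if baseline_ms <= 0:
        raise ValueError("baseline_ms must be positive")
    return (baseline_ms - optimized_ms) / baseline_ms * 100.0


# Range endpoints taken from the table above.
low_end = percent_reduction(0.40, 0.20)
high_end = percent_reduction(0.60, 0.30)
print(f"{low_end:.0f}% {high_end:.0f}%")  # prints "50% 50%"
```

Both pairings land at 50%, the upper end of the reported range.
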

L4 tips: use a `g2-standard-4` instance (a Google Cloud machine type with one L4 GPU) and monitor GPU utilization with `nvidia-smi`.
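
For monitoring during a benchmark run, one option is to poll `nvidia-smi` from Python using its CSV query mode. This is a sketch, not part of the repo: the function names and the choice of queried fields are assumptions, and it requires `nvidia-smi` to be on the `PATH`.

```python
import subprocess

# Fields requested from nvidia-smi (utilization in %, memory in MiB, temperature in C).
QUERY_FIELDS = ["utilization.gpu", "memory.used", "temperature.gpu"]


def _to_float(text):
    # nvidia-smi can report non-numeric values such as "[N/A]"; map those to None.
    try:
        return float(text)
    except ValueError:
        return None


def parse_gpu_stats(csv_line):
    """Parse one line of 'csv,noheader,nounits' output into a dict."""
    values = [v.strip() for v in csv_line.split(",")]
    if len(values) != len(QUERY_FIELDS):
        raise ValueError(f"expected {len(QUERY_FIELDS)} fields, got {len(values)}")
    return dict(zip(QUERY_FIELDS, (_to_float(v) for v in values)))


def query_gpu_stats():
    """Run nvidia-smi once and return one dict per visible GPU."""
    cmd = [
        "nvidia-smi",
        "--query-gpu=" + ",".join(QUERY_FIELDS),
        "--format=csv,noheader,nounits",
    ]
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    return [parse_gpu_stats(line) for line in out.splitlines() if line.strip()]
```

Calling `query_gpu_stats()` in a loop with a short sleep while the benchmark runs gives a rough utilization trace. For example, an illustrative (not measured) output line such as `37, 1024, 55` parses to 37% utilization, 1024 MiB used, and 55 C.
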