Scripts from "Neural network inference on PyTorch" with tools like ONNX, TensorRT, nvFuser, TorchDynamo, and Triton
Updated Nov 15, 2022 · Jupyter Notebook
The goal of the project is to benchmark and optimize BERT inference across several backends: PyTorch eager mode, TorchDynamo with the Inductor backend, and NVIDIA Triton Inference Server. Evaluation uses GLUE SST-2 samples, and performance is compared through profiling, kernel timing, and latency analysis.
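As a minimal sketch of the eager-vs-Inductor comparison (not the repo's actual notebooks), the snippet below times a BERT sequence classifier on an SST-2-style sentence. It assumes the Hugging Face `transformers` library, a CUDA GPU, and PyTorch 2.x with the `torch.compile` entry point to TorchDynamo/Inductor; the checkpoint name is only an illustrative choice.

```python
# Hedged sketch: compare eager vs. torch.compile (Inductor) latency for BERT.
# Assumptions: transformers installed, CUDA GPU available, PyTorch >= 2.0.
import time
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "textattack/bert-base-uncased-SST-2"  # example SST-2 checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name).cuda().eval()

# A GLUE SST-2-style input sentence.
inputs = tokenizer("a gorgeous, witty, seductive movie.", return_tensors="pt").to("cuda")

# torch.compile wraps the module with TorchDynamo + the Inductor backend.
compiled = torch.compile(model)

@torch.inference_mode()
def bench(fn, warmup=10, iters=100):
    for _ in range(warmup):      # warmup also triggers compilation
        fn(**inputs)
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        fn(**inputs)
    torch.cuda.synchronize()     # wait for all GPU kernels to finish
    return (time.perf_counter() - start) / iters * 1e3  # ms per iteration

print(f"eager:    {bench(model):.2f} ms")
print(f"inductor: {bench(compiled):.2f} ms")
```

Kernel-level timing would layer `torch.profiler` over the same loop; the synchronize-before-and-after pattern here is what keeps the wall-clock numbers honest on an asynchronous CUDA stream.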