Open
Description
Quantize Ultravox to fp8 and measure how this affects the model's output quality (as measured by evaluation) as well as its inference speed. This would entail:
- adding quantization to ultravox/infer
- adding a flag to infer_tool for quantization
- running infer_tool for evaluation and summarizing the output.
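
As a rough illustration of the last two steps, here is a minimal sketch of a quantization flag and a timing comparison. It uses only the Python standard library and does not touch the real code: the flag name `--quantization`, its choices, and the helpers `build_parser`, `time_inference`, and `summarize` are all assumptions made for this example, not the existing infer_tool interface. The actual fp8 conversion inside ultravox/infer is not shown.

```python
import argparse
import statistics
import time
from typing import Callable, Sequence


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical CLI: the real infer_tool arguments may look different.
    parser = argparse.ArgumentParser(description="inference flag sketch")
    parser.add_argument(
        "--quantization",
        choices=["none", "fp8"],
        default="none",
        help="Quantization to apply before inference (assumed flag name).",
    )
    return parser


def time_inference(
    run: Callable[[str], str], prompts: Sequence[str], repeats: int = 3
) -> float:
    """Return the median wall-clock seconds per call of `run` over all prompts."""
    timings = []
    for _ in range(repeats):
        for prompt in prompts:
            start = time.perf_counter()
            run(prompt)
            timings.append(time.perf_counter() - start)
    return statistics.median(timings)


def summarize(baseline_s: float, quantized_s: float) -> str:
    """Format a one-line speed comparison between the two runs."""
    if quantized_s <= 0:
        raise ValueError("quantized_s must be positive")
    speedup = baseline_s / quantized_s
    return (
        f"baseline={baseline_s:.4f}s fp8={quantized_s:.4f}s "
        f"speedup={speedup:.2f}x"
    )
```

In use, `run` would be a callable wrapping the baseline model for one pass and the fp8 model for the other, with the same prompts for both. The quality comparison would still come from the evaluation output of infer_tool, which this sketch does not cover.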