[Performance]: Transformers 4.45.1 slows down outlines guided decoding #9032
Labels: performance (Performance-related issues)
Report of performance regression
I noticed that guided decoding was a bit slower on newer builds of vLLM, but couldn't track down a commit that caused a performance regression. Instead, it looks like upgrading transformers from `4.44.2` to `4.45.1` causes the issue.

I ran a small artillery test with requests using guided decoding, using the code from commit `4f1ba0844`. This is the last commit before `mllama` support was added, so it's the last point where vLLM works with both transformers `4.44.2` and `4.45.1`. vLLM was run on a single A100 GPU with the model `mistralai/Mistral-7B-Instruct-v0.2`.
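For anyone trying to reproduce this without the attached artillery files, here's a minimal sketch of a single guided-decoding request body. It assumes vLLM's OpenAI-compatible server (0.6.x era) accepts the vLLM-specific extra fields `guided_json` and `guided_decoding_backend`; the schema, prompt, and URL are hypothetical examples, not the payloads from `payloads.csv`.

```python
import json
import urllib.request

# Hypothetical endpoint for a locally running vLLM OpenAI-compatible server.
COMPLETIONS_URL = "http://localhost:8000/v1/completions"
MODEL = "mistralai/Mistral-7B-Instruct-v0.2"

# Hypothetical schema; any JSON schema should exercise the guided decoding path.
PERSON_SCHEMA = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
    },
    "required": ["name", "age"],
}


def build_guided_request(prompt: str, schema: dict, max_tokens: int = 128) -> dict:
    """Return a completions request body constrained to `schema`.

    Assumption: the server understands the extra fields
    `guided_json` and `guided_decoding_backend`.
    """
    return {
        "model": MODEL,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": 0.0,
        "guided_json": schema,
        # Pin the backend so runs under both transformers versions use outlines.
        "guided_decoding_backend": "outlines",
    }


def post_json(url: str, body: dict, timeout: float = 120.0) -> dict:
    """POST `body` as JSON and return the decoded JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    body = build_guided_request("Describe a person as JSON:", PERSON_SCHEMA)
    print(json.dumps(body, indent=2))
    # With a server running: print(post_json(COMPLETIONS_URL, body))
```

Timing individual `post_json` calls with `time.perf_counter()` under each transformers version gives a rough single-request comparison; artillery layers concurrency on top of that.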
The results with `4.44.2` installed:

and with `4.45.1` installed:

The slowdown looks pretty significant to me 🐌🐌🐌
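When comparing two runs like these, looking at several percentiles rather than one average helps show whether every request got slower or only the tail did. A small stdlib-only helper, assuming per-request latencies (in any consistent unit) have been exported from each run; the numbers in the `__main__` block are made up purely to show the output format.

```python
import math
import statistics


def percentile(samples: list[float], q: float) -> float:
    """Nearest-rank percentile for 0 < q <= 100."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < q <= 100:
        raise ValueError("q must be in (0, 100]")
    ordered = sorted(samples)
    # Smallest value such that at least q% of samples are <= it.
    rank = math.ceil(q * len(ordered) / 100)
    return ordered[rank - 1]


def compare_runs(
    baseline: list[float], candidate: list[float]
) -> dict[str, tuple[float, float, float]]:
    """Map each statistic to (baseline, candidate, candidate / baseline)."""
    summary = {}
    for name, q in (("p50", 50), ("p95", 95), ("p99", 99)):
        b, c = percentile(baseline, q), percentile(candidate, q)
        summary[name] = (b, c, c / b)
    b_mean, c_mean = statistics.mean(baseline), statistics.mean(candidate)
    summary["mean"] = (b_mean, c_mean, c_mean / b_mean)
    return summary


if __name__ == "__main__":
    # Made-up latencies (seconds), not results from this issue.
    old = [0.8, 0.9, 1.0, 1.1, 1.3]
    new = [1.6, 1.8, 2.1, 2.4, 3.0]
    for stat, (b, c, ratio) in compare_runs(old, new).items():
        print(f"{stat}: {b:.2f}s -> {c:.2f}s ({ratio:.2f}x)")
```

A ratio that is roughly constant across p50 and p99 points to a per-request cost (for example, per-token overhead), while a ratio that grows toward the tail points more toward queueing or contention.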
I wasn't able to get vLLM profiling working to dig in at all; unfortunately, it kept crashing with encoding errors whenever I ran any requests with guided decoding. So I don't know whether this is a problem with vLLM, outlines, or transformers. But given that outlines hasn't been updated in quite a while and `sglang` went and forked it, I'm not sure whether this is worth investigating as is or whether it'll be overtaken by events.

Does anybody have ideas about what could be going wrong?
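Since the vLLM profiler kept crashing, one possible fallback is to profile a suspect code path directly in a standalone Python script with the standard-library `cProfile`, once per installed transformers version, and compare the top entries. This is a generic sketch: `slow_stand_in` is a hypothetical placeholder for whatever transformers or outlines call you want to examine, and cProfile only sees Python-level work in the current process, not inside the running vLLM server.

```python
import cProfile
import io
import pstats


def profile_call(fn, *args, top: int = 15, sort_key: str = "cumulative", **kwargs):
    """Run fn(*args, **kwargs) under cProfile; return (result, text report)."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        result = fn(*args, **kwargs)
    finally:
        profiler.disable()
    buffer = io.StringIO()
    # Sort by cumulative time so slow call chains float to the top.
    pstats.Stats(profiler, stream=buffer).sort_stats(sort_key).print_stats(top)
    return result, buffer.getvalue()


def slow_stand_in(n: int) -> int:
    """Hypothetical workload standing in for a tokenizer-heavy call."""
    return sum(len(str(i)) for i in range(n))


if __name__ == "__main__":
    result, report = profile_call(slow_stand_in, 200_000)
    print(result)
    print(report)
```

Running the same script under `4.44.2` and `4.45.1` and comparing the cumulative times of the same functions would show whether the extra time lands inside transformers.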
The scripts I ran:
- artillery.yaml
- payloads.csv

(obviously very scientific 😉)