v0.3.2
Major Changes
This version adds support for the OLMo and Gemma models, as well as a per-request seed parameter.
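Conceptually, a per-request seed lets each request carry its own RNG so that sampling is reproducible for that request without affecting others. A minimal Python sketch of the idea (this is an illustration, not vLLM's implementation; the `sample_tokens` helper and toy vocabulary are hypothetical):

```python
import random

def sample_tokens(vocab, n, seed=None):
    # Each request gets its own RNG instance, so one request's seed
    # cannot perturb sampling for concurrent requests.
    rng = random.Random(seed)
    return [rng.choice(vocab) for _ in range(n)]

vocab = list(range(100))
a = sample_tokens(vocab, 5, seed=42)
b = sample_tokens(vocab, 5, seed=42)
c = sample_tokens(vocab, 5)  # unseeded: may differ between runs
assert a == b  # same seed, same samples
```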
What's Changed
- Defensively copy `sampling_params` by @njhill in #2881
- multi-LoRA as extra models in OpenAI server by @jvmncs in #2775
- Add code-revision config argument for Hugging Face Hub by @mbm-ai in #2892
- [Minor] Small fix to make distributed init logic in worker look cleaner by @zhuohan123 in #2905
- [Test] Add basic correctness test by @zhuohan123 in #2908
- Support OLMo models. by @Isotr0py in #2832
- Add warning to prevent changes to benchmark api server by @simon-mo in #2858
- Fix `vllm:prompt_tokens_total` metric calculation by @ronensc in #2869
- [ROCm] include gfx908 as supported by @jamestwhedbee in #2792
- [FIX] Fix beam search test by @zhuohan123 in #2930
- Make vLLM logging formatting optional by @Yard1 in #2877
- Add metrics to RequestOutput by @Yard1 in #2876
- Add Gemma model by @xiangxu-google in #2964
- Upgrade transformers to v4.38.0 by @WoosukKwon in #2965
- [FIX] Add Gemma model to the doc by @zhuohan123 in #2966
- [ROCm] Upgrade transformers to v4.38.0 by @WoosukKwon in #2967
- Support per-request seed by @njhill in #2514
- Bump up version to v0.3.2 by @zhuohan123 in #2968
New Contributors
- @jvmncs made their first contribution in #2775
- @mbm-ai made their first contribution in #2892
- @Isotr0py made their first contribution in #2832
- @jamestwhedbee made their first contribution in #2792
Full Changelog: v0.3.1...v0.3.2