v3.2.0
Important changes
- BREAKING CHANGE: Lots of changes around tool calling. Tool calling now fully matches the OpenAI return format (the `arguments` field is returned as a JSON-encoded string instead of a parsed JSON object), as shown in the sketch below. Several other tool-calling improvements and side-effect fixes are included.
- Added Gemma 3 support.
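A minimal sketch of how the breaking change affects clients, assuming a TGI server exposing the OpenAI-compatible API at `http://localhost:8080/v1` and the `openai` Python client; the endpoint URL, model name, and tool schema below are illustrative.

```python
# Sketch: reading tool-call arguments after this release.
# Assumes a local TGI server with the OpenAI-compatible API; the tool
# schema and endpoint are illustrative, not part of this release.
import json
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="-")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="tgi",
    messages=[{"role": "user", "content": "What is the weather in Paris?"}],
    tools=tools,
)

tool_call = response.choices[0].message.tool_calls[0]
# `arguments` is now a JSON-encoded string (matching OpenAI), not a dict,
# so it must be parsed before use.
arguments = json.loads(tool_call.function.arguments)
print(tool_call.function.name, arguments)
```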
What's Changed
- fix(neuron): explicitly install toolchain by @dacorvo in #3072
- Only add token when it is defined. by @Narsil in #3073
- Making sure Olmo (transformers backend) works. by @Narsil in #3074
- Making `tool_calls` a vector. by @Narsil in #3075
- Nix: add `openai` to impure shell for integration tests by @danieldk in #3081
- Update `--max-batch-total-tokens` description by @alvarobartt in #3083
- Fix tool call2 by @Narsil in #3076
- Nix: the launcher needs a Python env with Torch for GPU detection by @danieldk in #3085
- Add request parameters to OTel span for `/v1/chat/completions` endpoint by @aW3st in #3000
- Add qwen2 multi lora layers support by @EachSheep in #3089
- Add modules_to_not_convert in quantized model by @jiqing-feng in #3053
- Small test and typing fixes by @danieldk in #3078
- hotfix: qwen2 formatting by @danieldk in #3093
- Pr 3003 ci branch by @drbh in #3007
- Update the llamacpp backend by @angt in #3022
- Fix qwen vl by @Narsil in #3096
- Update README.md by @celsowm in #3095
- Fix tool call3 by @Narsil in #3086
- Add gemma3 model by @mht-sharma in #3099
- Fix tool call4 by @Narsil in #3094
- Update neuron backend by @dacorvo in #3098
- Preparing release 3.2.0 by @Narsil in #3100
- Try to fix on main CI color. by @Narsil in #3101
New Contributors
- @EachSheep made their first contribution in #3089
- @jiqing-feng made their first contribution in #3053
Full Changelog: v3.1.1...v3.2.0