ETA: Monday, April 22
- [Misc] Bump transformers to latest version #4176
- Support eos_token_id from generation_config.json #4182 (addressing [Usage]: Llama 3 8B Instruct Inference #4180)
- [Core] add an option to log every function call for debugging hang/crash in distributed inference #4079
- [Bugfix] Fix CustomAllreduce pcie nvlink topology detection (#3974) #4159
- [Bugfix] Fix LoRA loading check #4138
- [Bug]: OpenAI API Server always reports 0 tokens/s #4209
- Performance Regression between v0.4.0 and v0.4.1 #4210
- [Core][Distributed] use absolute path for library file #4271
- [Misc] Reduce supported Punica dtypes #4304 (otherwise we cannot upload to PyPI)
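For context on #4182: some models (notably Llama 3 Instruct) declare their stop tokens in `generation_config.json` rather than only in the tokenizer config, and `eos_token_id` there may be a single integer or a list of integers. A minimal sketch of reading and normalizing that field (the helper name `load_eos_token_ids` is hypothetical, not the actual vLLM code):

```python
import json

def load_eos_token_ids(path):
    """Read eos_token_id from a model's generation_config.json.

    The field may be absent, a single int, or a list of ints
    (Llama 3 Instruct uses a list); always return a list.
    """
    with open(path) as f:
        cfg = json.load(f)
    eos = cfg.get("eos_token_id")
    if eos is None:
        return []
    return eos if isinstance(eos, list) else [eos]
```

A generation stopping check can then treat every returned id as a stop token instead of relying on a single EOS id from the tokenizer.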