DJLServing v0.27.0 Release

@xyang16 released this 15 Apr 21:03
· 643 commits to master since this release

Key Changes

  • Large Model Inference Containers 0.27.0 release
    • DeepSpeed container
      • Added DBRX and Gemma model support.
      • Provided general performance optimization.
      • Added support for new performance-enhancing features such as Speculative Decoding.
    • TensorRT-LLM container
      • Upgraded to TensorRT-LLM 0.8.0
    • Transformers NeuronX container
      • Upgraded to Transformers NeuronX 2.18.0
  • Multi-Adapter LoRA Support
    • Provided multi-adapter inference functionality in LMI DLC.
  • CX Usability Enhancements
    • Provided a seamless migration experience across different LMI DLCs.
    • Implemented the Low code No code experience.
    • Supported OpenAI compatible chat completions API.
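
The OpenAI-compatible chat completions API accepts the standard chat payload shape. A minimal sketch of such a request is shown below; the endpoint URL and model name are assumptions for a local deployment, not values from this release:

```python
import json

# Assumed local DJLServing deployment exposing the
# OpenAI-compatible chat completions route.
url = "http://localhost:8080/v1/chat/completions"

# Standard OpenAI-style chat payload; model name is a placeholder.
payload = {
    "model": "my-model",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is DJLServing?"},
    ],
    "max_tokens": 128,
    "stream": False,
}

body = json.dumps(payload)
print(body)

# To send it against a running server:
# import urllib.request
# req = urllib.request.Request(
#     url,
#     data=body.encode(),
#     headers={"Content-Type": "application/json"},
# )
# print(urllib.request.urlopen(req).read().decode())
```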

Enhancement

Known Issues

  • TensorRT-LLM container
    • CodeLLAMA TP8 compilation sometimes fails.
    • Mistral 7B and Mixtral 8x7B have correctness issues when compiled with TP4 or TP8; use TP1 or TP2 to mitigate them.
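
As a sketch of the suggested mitigation, the tensor parallel degree can be lowered in the model's `serving.properties` (key names follow the usual LMI configuration convention; the model id and engine value here are illustrative assumptions):

```properties
# serving.properties for a TensorRT-LLM (LMI) model -- illustrative sketch
engine=MPI
option.model_id=mistralai/Mistral-7B-v0.1
# Use TP1 or TP2 instead of TP4/TP8 to avoid the correctness issue above
option.tensor_parallel_degree=2
```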

Bug Fixes

Documentation

CI/CD

New Contributors

Full Changelog: v0.26.0...v0.27.0