New Features:
DeepSparse Pipelines v2 was introduced, enabling more complex pipelines to be represented. The Text Generation (compatible with Hugging Face Transformers) and Image Classification pipelines have been refactored to the v2 format; see the first sketch after this list. (#1324, #1385, #1460, #1596, #1502, #1626)
OpenAI Server compatibility was added on top of Pipelines v2; see the second sketch after this list. (#1445, #1477)
deepsparse.evaluate APIs and CLIs were added, with perplexity and lm-eval-harness plugins for LLM evaluation. (#1596)
An example was added demonstrating how to use LLMPerf for benchmarking DeepSparse LLM servers. (#1502)
Continuous batching support has been added for text generation pipelines and inference server pathways, enabling inference over multiple text streams at once. (#1569, #1571)
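For reference, a minimal sketch of the refactored v2 Text Generation pathway; the model identifier below is a placeholder, and the exact constructor keywords may vary by version:

```python
from deepsparse import TextGeneration

# Placeholder model identifier; substitute a SparseZoo stub,
# a Hugging Face model id, or a local ONNX deployment directory.
pipeline = TextGeneration(model="path/to/deployment")

# Single-prompt inference; max_new_tokens bounds the generated length.
output = pipeline("Once upon a time", max_new_tokens=32)
print(output.generations[0].text)
```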
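Because the server exposes OpenAI-compatible routes, the standard openai Python client can target it directly. The sketch below assumes a DeepSparse server is already running locally on port 5543 with a chat completions route; the port, served model name, and route are illustrative assumptions, not confirmed specifics:

```python
from openai import OpenAI

# Assumed local DeepSparse server address; the api_key is unused by the
# server but required by the client, so any non-empty string works.
client = OpenAI(base_url="http://localhost:5543/v1", api_key="unused")

response = client.chat.completions.create(
    model="local-llm",  # hypothetical served model name
    messages=[{"role": "user", "content": "Summarize sparsity in one sentence."}],
)
print(response.choices[0].message.content)
```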
Changes:
Exposed sequence_length on text generation pipelines for greater control over context length; see the sketch after this list. (#1518)
deepsparse.analyze functionality has been updated to work properly with LLMs. (#1324)
The logging and timing infrastructure for Pipelines was expanded to enable more thorough tracking and logging, furthering support for integrations with Prometheus and other standard logging platforms. (#1614)
UX for text generation pipelines was improved to more closely match Hugging Face Transformers pipelines. (#1583, #1584, #1590, #1592, #1598)
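A sketch of the exposed control, assuming the v2 constructor accepts sequence_length and that generation arguments follow Hugging Face Transformers conventions (the specific kwargs shown are assumptions):

```python
from deepsparse import TextGeneration

# sequence_length caps prompt plus generated tokens; 2048 is illustrative.
pipeline = TextGeneration(model="path/to/deployment", sequence_length=2048)

# Generation arguments mirror Hugging Face Transformers conventions.
output = pipeline(
    "Write a haiku about sparsity.",
    max_new_tokens=64,
    do_sample=True,
    temperature=0.7,
)
print(output.generations[0].text)
```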
Resolved Issues:
Compile times for dense LLMs have been significantly reduced.
Text generation pipeline bug fixes: corrected sampling logic errors and an inappropriate in-place mutation of logits that caused incorrect answers from LLMs when sampling was enabled. (#1406, #1414)
Fixed improper handling of the kv_cache input when using external KV cache management, which caused inaccurate model inference in ONNX Runtime comparison pathways. (#1337)
Benchmarking runs for LLMs with internal KV cache no longer crash or report inaccurate numbers. (#1512, #1514)
The SciPy dependency was removed, resolving crashes in CV pipelines that failed on import of scipy. (#1604, #1602)
Known Issues:
OPT models produce incorrect outputs and are no longer supported.
Streaming support is limited within the DeepSparse Pipeline v2 framework for tasks other than text generation.