Release v0.2.0 · neuralmagic/nm-vllm

Key Features

This release is based on vllm==0.4.0.post1

New model architectures supported: DbrxForCausalLM, CohereForCausalLM (Command-R), JAISLMHeadModel, LlavaForConditionalGeneration (experimental vision LM), OrionForCausalLM, Qwen2MoeForCausalLM, StableLmForCausalLM, Starcoder2ForCausalLM, and XverseForCausalLM
Automated benchmarking
Code coverage reporting
lm-evaluation-harness nightly accuracy testing
Layerwise profiling for the inference graph (#124)
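
To check whether a given checkpoint uses one of the architectures added in this release, one option is to compare the `architectures` field of its Hugging Face `config.json` against the class names listed above. The sketch below is illustrative only: the helper name `newly_supported` and the example config string are hypothetical and not part of nm-vllm.

```python
import json

# Architecture class names listed under "Key Features" for this release.
NEW_ARCHITECTURES = frozenset({
    "DbrxForCausalLM",
    "CohereForCausalLM",
    "JAISLMHeadModel",
    "LlavaForConditionalGeneration",
    "OrionForCausalLM",
    "Qwen2MoeForCausalLM",
    "StableLmForCausalLM",
    "Starcoder2ForCausalLM",
    "XverseForCausalLM",
})


def newly_supported(config_text: str) -> list[str]:
    """Return the architectures in a config.json that this release adds.

    Hypothetical helper for illustration; it only inspects the
    "architectures" list and does not load or validate the model.
    """
    config = json.loads(config_text)
    return [a for a in config.get("architectures", []) if a in NEW_ARCHITECTURES]


# Hypothetical config contents, for illustration only.
example = '{"architectures": ["Starcoder2ForCausalLM"], "model_type": "starcoder2"}'
print(newly_supported(example))
```

An empty result only means the architecture is not among those newly added here; it may still have been supported in an earlier release.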

What's Changed

turn off single gpu scenario by @andy-neuma in #88
Benchmarking : Absolute -> Relative imports by @varun-sundar-rabindranath in #85
Benchmarking : update Gi_per_thread by @varun-sundar-rabindranath in #90
Update README.md with sparsity and quantization explainers by @mgoin in #91
Add notebooks for sparsegpt and marlin compression with nm-vllm by @mgoin in #94
upstream sync 2024-03-04 by @andy-neuma in #89
Update README.md by @robertgshaw2-neuralmagic in #96
Formatting : Fix yapf by @varun-sundar-rabindranath in #101
Lower unstructured sparsity threshold to 40% by @mgoin in #100
Benchmarking : Misc updates by @varun-sundar-rabindranath in #95
upstream merge sync 2024-03-11 by @andy-neuma in #108
Add lm-eval comparison script by @mgoin in #99
Benchmarks : Standardize benchmark result store by @varun-sundar-rabindranath in #87
seed whl centric workflows by @andy-neuma in #116
Benchmarking : Remote push job by @varun-sundar-rabindranath in #92
reverted accidental commit to main by @robertgshaw2-neuralmagic in #119
skipped test for nightly failure by @robertgshaw2-neuralmagic in #120
Turned back on the Marlin tests by @robertgshaw2-neuralmagic in #121
Benchmarking : Prepare for GHA benchmark UI by @varun-sundar-rabindranath in #122
Upstream sync 2024 03 14 by @robertgshaw2-neuralmagic in #127
Benchmark : Update benchmark configs for Nightly by @varun-sundar-rabindranath in #126
Benchmark : Modify/Add workflows/actions for github-action-benchmark by @varun-sundar-rabindranath in #123
Benchmark: fix nightly by @varun-sundar-rabindranath in #131
Fix nightly - 03/18/2024 by @varun-sundar-rabindranath in #136
Upstream sync 2024 03 18 by @robertgshaw2-neuralmagic in #134
Update Dockerfile with extensions support by @mgoin in #107
Benchmark : Turn-off nightly multi-gpu benchmarks temporarily by @varun-sundar-rabindranath in #130
Benchmark Fix : Remove special tokens from warmup prompts by @varun-sundar-rabindranath in #140
Delete .github/pull_request_template.md by @mgoin in #145
Benchmarking : Update readme by @varun-sundar-rabindranath in #144
Initial Layerwise Profiler by @LucasWilkinson in #124
Benchmark Fix : Fix JSON decode error by @varun-sundar-rabindranath in #142
Upstream sync 2024 03 24 by @robertgshaw2-neuralmagic in #143
Benchmark : Fix remote push job by @varun-sundar-rabindranath in #129
Benchmarks : Prune nightly benchmarks by @varun-sundar-rabindranath in #150
Lock lm-evaluation-harness to commit 262f879 by @mgoin in #151
Benchmarks : Copy benchmark results to EFS by @varun-sundar-rabindranath in #148
update readme with nvcc threads option by @varun-sundar-rabindranath in #153
Generate tarball along with wheel build, and upload both in a package to GH by @dhuangnm in #138
switch to nightly whl's by @andy-neuma in #154
whl centric workflow for "remote push" by @andy-neuma in #117
remove low-workload benchmarks that are flaky by @varun-sundar-rabindranath in #156
nightly patches by @andy-neuma in #160
Upstream sync v0.4.0.post1 (merged with upstream-v0.4.0.post1) by @mgoin in #157
Bump version to 0.2 by @mgoin in #165

New Contributors

@dhuangnm made their first contribution in #138

Full Changelog: 0.1.0...0.2.0
