This repository has been archived by the owner on Oct 11, 2024. It is now read-only.
v0.2.0
Key Features
This release is based on vllm==0.4.0.post1
- New model architectures supported! DbrxForCausalLM, CohereForCausalLM (Command-R), JAISLMHeadModel, LlavaForConditionalGeneration (experimental vision LM), OrionForCausalLM, Qwen2MoeForCausalLM, StableLmForCausalLM, Starcoder2ForCausalLM, XverseForCausalLM
- Automated benchmarking
- Code coverage reporting
- lm-evaluation-harness nightly accuracy testing
- Layerwise Profiling for the inference graph (#124)
What's Changed
- turn off single gpu scenario by @andy-neuma in #88
- Benchmarking : Absolute -> Relative imports by @varun-sundar-rabindranath in #85
- Benchmarking : update Gi_per_thread by @varun-sundar-rabindranath in #90
- Update README.md with sparsity and quantization explainers by @mgoin in #91
- Add notebooks for sparsegpt and marlin compression with nm-vllm by @mgoin in #94
- upstream sync 2024-03-04 by @andy-neuma in #89
- Update README.md by @robertgshaw2-neuralmagic in #96
- Formatting : Fix yapf by @varun-sundar-rabindranath in #101
- Lower unstructured sparsity threshold to 40% by @mgoin in #100
- Benchmarking : Misc updates by @varun-sundar-rabindranath in #95
- upstream merge sync 2024-03-11 by @andy-neuma in #108
- Add lm-eval comparison script by @mgoin in #99
- Benchmarks : Standardize benchmark result store by @varun-sundar-rabindranath in #87
- seed whl centric workflows by @andy-neuma in #116
- Benchmarking : Remote push job by @varun-sundar-rabindranath in #92
- reverted accidental commit to main by @robertgshaw2-neuralmagic in #119
- skipped test for nightly failure by @robertgshaw2-neuralmagic in #120
- Turned back on the Marlin tests by @robertgshaw2-neuralmagic in #121
- Benchmarking : Prepare for GHA benchmark UI by @varun-sundar-rabindranath in #122
- Upstream sync 2024 03 14 by @robertgshaw2-neuralmagic in #127
- Benchmark : Update benchmark configs for Nightly by @varun-sundar-rabindranath in #126
- Benchmark : Modify/Add workflows/actions for github-action-benchmark by @varun-sundar-rabindranath in #123
- Benchmark: fix nightly by @varun-sundar-rabindranath in #131
- Fix nightly - 03/18/2024 by @varun-sundar-rabindranath in #136
- Upstream sync 2024 03 18 by @robertgshaw2-neuralmagic in #134
- Update Dockerfile with extensions support by @mgoin in #107
- Benchmark : Turn-off nightly multi-gpu benchmarks temporarily by @varun-sundar-rabindranath in #130
- Benchmark Fix : Remove special tokens from warmup prompts by @varun-sundar-rabindranath in #140
- Delete .github/pull_request_template.md by @mgoin in #145
- Benchmarking : Update readme by @varun-sundar-rabindranath in #144
- Initial Layerwise Profiler by @LucasWilkinson in #124
- Benchmark Fix : Fix JSON decode error by @varun-sundar-rabindranath in #142
- Upstream sync 2024 03 24 by @robertgshaw2-neuralmagic in #143
- Benchmark : Fix remote push job by @varun-sundar-rabindranath in #129
- Benchmarks : Prune nightly benchmarks by @varun-sundar-rabindranath in #150
- Lock lm-evaluation-harness to commit 262f879 by @mgoin in #151
- Benchmarks : Copy benchmark results to EFS by @varun-sundar-rabindranath in #148
- update readme with nvcc threads option by @varun-sundar-rabindranath in #153
- Generate tarball along with wheel build, and upload both in a package to GH by @dhuangnm in #138
- switch to nightly whl's by @andy-neuma in #154
- whl centric workflow for "remote push" by @andy-neuma in #117
- remove low-workload benchmarks that are flaky by @varun-sundar-rabindranath in #156
- nightly patches by @andy-neuma in #160
- Upstream sync v0.4.0.post1 (merged with upstream-v0.4.0.post1) by @mgoin in #157
- Bump version to 0.2 by @mgoin in #165
New Contributors
Full Changelog: 0.1.0...0.2.0