This repository has been archived by the owner on Oct 11, 2024. It is now read-only.

andy/bump main to v0.3.2 #49

Closed
wants to merge 113 commits
Changes from 1 commit
Commits
113 commits
6b7de1a
[ROCm] add support to ROCm 6.0 and MI300 (#2274)
hongxiayang Jan 26, 2024
3a0e1fc
Support for Stable LM 2 (#2598)
dakotamahan-stability Jan 26, 2024
390b495
Don't build punica kernels by default (#2605)
pcmoritz Jan 26, 2024
beb89f6
AWQ: Up to 2.66x higher throughput (#2566)
casper-hansen Jan 27, 2024
220a476
Use head_dim in config if exists (#2622)
xiangxu-google Jan 27, 2024
3801700
Implement custom all reduce kernels (#2192)
hanzhi713 Jan 27, 2024
5f036d2
[Minor] Fix warning on Ray dependencies (#2630)
WoosukKwon Jan 27, 2024
f8ecb84
Speed up Punica compilation (#2632)
WoosukKwon Jan 28, 2024
89be30f
Small async_llm_engine refactor (#2618)
andoorve Jan 28, 2024
7d64841
Update Ray version requirements (#2636)
simon-mo Jan 28, 2024
9090bf0
Support FP8-E5M2 KV Cache (#2279)
zhaoyang-star Jan 29, 2024
b72af8f
Fix error when tp > 1 (#2644)
zhaoyang-star Jan 29, 2024
1b20639
No repeated IPC open (#2642)
hanzhi713 Jan 29, 2024
ea8489f
ROCm: Allow setting compilation target (#2581)
rlrs Jan 29, 2024
5d60def
DeepseekMoE support with Fused MoE kernel (#2453)
zwd003 Jan 30, 2024
ab40644
Fused MOE for Mixtral (#2542)
pcmoritz Jan 30, 2024
d79ced3
Fix 'Actor methods cannot be called directly' when using `--engine-us…
HermitSun Jan 30, 2024
4f65af0
Add swap_blocks unit tests (#2616)
sh1ng Jan 30, 2024
bbe9bd9
[Minor] Fix a small typo (#2672)
pcmoritz Jan 30, 2024
105a40f
[Minor] Fix false warning when TP=1 (#2674)
WoosukKwon Jan 30, 2024
3dad944
Add quantized mixtral support (#2673)
WoosukKwon Jan 31, 2024
1af090b
Bump up version to v0.3.0 (#2656)
zhuohan123 Jan 31, 2024
d69ff0c
Fixes assertion failure in prefix caching: the lora index mapping sho…
sighingnow Jan 31, 2024
c664b0e
fix some bugs (#2689)
zspo Jan 31, 2024
89efcf1
[Minor] Fix test_cache.py CI test failure (#2684)
pcmoritz Jan 31, 2024
d0d93b9
Add unit test for Mixtral MoE layer (#2677)
pcmoritz Jan 31, 2024
93b38be
Refactor Prometheus and Add Request Level Metrics (#2316)
robertgshaw2-redhat Jan 31, 2024
cd9e60c
Add Internlm2 (#2666)
Feb 1, 2024
923797f
Fix compile error when using rocm (#2648)
zhaoyang-star Feb 1, 2024
b9e96b1
fix python 3.8 syntax (#2716)
simon-mo Feb 1, 2024
bb8c697
Update README for meetup slides (#2718)
simon-mo Feb 1, 2024
c410f5d
Use revision when downloading the quantization config file (#2697)
Pernekhan Feb 1, 2024
96b6f47
Remove hardcoded `device="cuda" ` to support more devices (#2503)
jikunshang Feb 1, 2024
0e163fc
Fix default length_penalty to 1.0 (#2667)
zspo Feb 1, 2024
4abf633
Add one example to run batch inference distributed on Ray (#2696)
c21 Feb 2, 2024
5ed704e
docs: fix langchain (#2736)
mspronesti Feb 4, 2024
51cd22c
set&get llm internal tokenizer instead of the TokenizerGroup (#2741)
dancingpipi Feb 4, 2024
5a6c81b
Remove eos tokens from output by default (#2611)
zcnrex Feb 4, 2024
c9b45ad
Require triton >= 2.1.0 (#2746)
whyiug Feb 5, 2024
72d3a30
[Minor] Fix benchmark_latency script (#2765)
WoosukKwon Feb 5, 2024
56f738a
[ROCm] Fix some kernels failed unit tests (#2498)
hongxiayang Feb 5, 2024
b92adec
Set local logging level via env variable (#2774)
gardberg Feb 5, 2024
2ccee3d
[ROCm] Fixup arch checks for ROCM (#2627)
dllehr-amd Feb 5, 2024
f0d4e14
Add fused top-K softmax kernel for MoE (#2769)
WoosukKwon Feb 6, 2024
ed70c70
modelscope: fix issue when model parameter is not a model id but path…
liuyhwangyh Feb 6, 2024
fe6d09a
[Minor] More fix of test_cache.py CI test failure (#2750)
LiuXiaoxuanPKU Feb 6, 2024
c81dddb
[ROCm] Fix build problem resulted from previous commit related to FP8…
hongxiayang Feb 7, 2024
931746b
Add documentation on how to do incremental builds (#2796)
pcmoritz Feb 7, 2024
65b89d1
[Ray] Integration compiled DAG off by default (#2471)
rkooo567 Feb 8, 2024
3711811
Disable custom all reduce by default (#2808)
WoosukKwon Feb 8, 2024
0580aab
[ROCm] support Radeon™ 7900 series (gfx1100) without using flash-atte…
hongxiayang Feb 11, 2024
4ca2c35
Add documentation section about LoRA (#2834)
pcmoritz Feb 12, 2024
5638364
Refactor 2 awq gemm kernels into m16nXk32 (#2723)
zcnrex Feb 12, 2024
a4211a4
Serving Benchmark Refactoring (#2433)
ywang96 Feb 13, 2024
f964493
[CI] Ensure documentation build is checked in CI (#2842)
simon-mo Feb 13, 2024
5c976a7
Refactor llama family models (#2637)
esmeetu Feb 13, 2024
ea35600
Revert "Refactor llama family models (#2637)" (#2851)
pcmoritz Feb 13, 2024
a463c33
Use CuPy for CUDA graphs (#2811)
WoosukKwon Feb 13, 2024
317b29d
Remove Yi model definition, please use `LlamaForCausalLM` instead (#2…
pcmoritz Feb 13, 2024
2a543d6
Add LoRA support for Mixtral (#2831)
tterrysun Feb 13, 2024
7eacffd
Migrate InternLMForCausalLM to LlamaForCausalLM (#2860)
pcmoritz Feb 14, 2024
0c48b37
Fix internlm after https://github.com/vllm-project/vllm/pull/2860 (#2…
pcmoritz Feb 14, 2024
7e45107
[Fix] Fix memory profiling when GPU is used by multiple processes (#2…
WoosukKwon Feb 14, 2024
87069cc
Fix docker python version (#2845)
NikolaBorisov Feb 14, 2024
4efbac6
Migrate AquilaForCausalLM to LlamaForCausalLM (#2867)
esmeetu Feb 14, 2024
25e86b6
Don't use cupy NCCL for AMD backends (#2855)
WoosukKwon Feb 14, 2024
31348df
Align LoRA code between Mistral and Mixtral (fixes #2875) (#2880)
pcmoritz Feb 15, 2024
d7afab6
[BugFix] Fix GC bug for `LLM` class (#2882)
WoosukKwon Feb 15, 2024
4f2ad11
Fix DeciLM (#2883)
pcmoritz Feb 15, 2024
5255d99
[ROCm] Dockerfile fix for flash-attention build (#2885)
hongxiayang Feb 15, 2024
64da65b
Prefix Caching- fix t4 triton error (#2517)
caoshiyi Feb 16, 2024
5f08050
Bump up to v0.3.1 (#2887)
WoosukKwon Feb 16, 2024
185b2c2
Defensively copy `sampling_params` (#2881)
njhill Feb 17, 2024
8f36444
multi-LoRA as extra models in OpenAI server (#2775)
jvmncs Feb 17, 2024
786b7f1
Add code-revision config argument for Hugging Face Hub (#2892)
mbm-ai Feb 18, 2024
537c975
[Minor] Small fix to make distributed init logic in worker looks clea…
zhuohan123 Feb 18, 2024
a61f052
[Test] Add basic correctness test (#2908)
zhuohan123 Feb 19, 2024
ab3a5a8
Support OLMo models. (#2832)
Isotr0py Feb 19, 2024
86fd8bb
Add warning to prevent changes to benchmark api server (#2858)
simon-mo Feb 19, 2024
e433c11
Fix `vllm:prompt_tokens_total` metric calculation (#2869)
ronensc Feb 19, 2024
264017a
[ROCm] include gfx908 as supported (#2792)
jamestwhedbee Feb 20, 2024
63e2a64
[FIX] Fix beam search test (#2930)
zhuohan123 Feb 20, 2024
181b27d
Make vLLM logging formatting optional (#2877)
Yard1 Feb 20, 2024
017d9f1
Add metrics to RequestOutput (#2876)
Yard1 Feb 21, 2024
5253eda
Add Gemma model (#2964)
xiangxu-google Feb 21, 2024
c20ecb6
Upgrade transformers to v4.38.0 (#2965)
WoosukKwon Feb 21, 2024
a9c8212
[FIX] Add Gemma model to the doc (#2966)
zhuohan123 Feb 21, 2024
dc903e7
[ROCm] Upgrade transformers to v4.38.0 (#2967)
WoosukKwon Feb 21, 2024
7d2dcce
Support per-request seed (#2514)
njhill Feb 21, 2024
8fbd84b
Bump up version to v0.3.2 (#2968)
zhuohan123 Feb 21, 2024
7c4304b
Add sparsity support based with magic_wand GPU kernels
robertgshaw2-redhat Feb 1, 2024
5344a01
Update README.md
mgoin Feb 2, 2024
81dba47
Semi-structured 2:4 sparsity via SparseSemiStructuredTensor #4
afeldman-nm Feb 2, 2024
cf8eed7
Sparse fused gemm integration (#12)
LucasWilkinson Feb 14, 2024
7527b9c
Abf149/fix semi structured sparse (#16)
afeldman-nm Feb 16, 2024
3c11f56
Enable bfloat16 for sparse_w16a16 (#18)
mgoin Feb 16, 2024
8147811
seed workflow (#19)
andy-neuma Feb 16, 2024
e802bc2
Add bias support for sparse layers (#25)
mgoin Feb 16, 2024
b976653
Use naive decompress for SM<8.0 (#32)
mgoin Feb 21, 2024
78ba5c1
Varun/benchmark workflow (#28)
varun-sundar-rabindranath Feb 21, 2024
fbfd764
initial GHA workflows for "build test" and "remote push" (#27)
andy-neuma Feb 21, 2024
37883e0
Only import magic_wand if sparsity is enabled (#37)
mgoin Feb 21, 2024
acf16bf
manually reverted requirements to match v0.3.2
robertgshaw2-redhat Feb 22, 2024
dbf3cab
Merge branch 'main' into rs/bump-main-to-v0.3.2
robertgshaw2-redhat Feb 22, 2024
0feedf9
reverted requirements
robertgshaw2-redhat Feb 22, 2024
ce8164d
removed duplicate
robertgshaw2-redhat Feb 22, 2024
166c13b
format
robertgshaw2-redhat Feb 22, 2024
1b395b4
added noqa to upstream scripts for linter
robertgshaw2-redhat Feb 22, 2024
8d935be
format
robertgshaw2-redhat Feb 22, 2024
acb8615
Sparsity fix (#40)
robertgshaw2-redhat Feb 22, 2024
4b44479
Rs/marlin downstream v0.3.2 (#43)
robertgshaw2-redhat Feb 22, 2024
9209f15
additional updates to "bump-to-v0.3.2" (#39)
andy-neuma Feb 23, 2024
b1e14c2
move to 4 x gpu
Feb 23, 2024
Remove Yi model definition, please use `LlamaForCausalLM` instead (vllm-project#2854)

Co-authored-by: Roy <jasonailu87@gmail.com>
pcmoritz and esmeetu authored Feb 13, 2024
commit 317b29de0f16428610e2e4d6a6953bee5a2d0ec2
7 changes: 2 additions & 5 deletions docs/source/models/supported_models.rst
@@ -51,8 +51,8 @@ Alongside each architecture, we include some popular models that use it.
  - InternLM2
  - :code:`internlm/internlm2-7b`, :code:`internlm/internlm2-chat-7b`, etc.
  * - :code:`LlamaForCausalLM`
- - LLaMA, LLaMA-2, Vicuna, Alpaca, Koala, Guanaco
- - :code:`meta-llama/Llama-2-13b-hf`, :code:`meta-llama/Llama-2-70b-hf`, :code:`openlm-research/open_llama_13b`, :code:`lmsys/vicuna-13b-v1.3`, :code:`young-geng/koala`, etc.
+ - LLaMA, LLaMA-2, Vicuna, Alpaca, Yi
+ - :code:`meta-llama/Llama-2-13b-hf`, :code:`meta-llama/Llama-2-70b-hf`, :code:`openlm-research/open_llama_13b`, :code:`lmsys/vicuna-13b-v1.3`, :code:`01-ai/Yi-6B`, :code:`01-ai/Yi-34B`, etc.
  * - :code:`MistralForCausalLM`
  - Mistral, Mistral-Instruct
  - :code:`mistralai/Mistral-7B-v0.1`, :code:`mistralai/Mistral-7B-Instruct-v0.1`, etc.
@@ -77,9 +77,6 @@ Alongside each architecture, we include some popular models that use it.
  * - :code:`StableLMEpochForCausalLM`
  - StableLM
  - :code:`stabilityai/stablelm-3b-4e1t/` , :code:`stabilityai/stablelm-base-alpha-7b-v2`, etc.
- * - :code:`YiForCausalLM`
- - Yi
- - :code:`01-ai/Yi-6B`, :code:`01-ai/Yi-34B`, etc.

  If your model uses one of the above model architectures, you can seamlessly run your model with vLLM.
  Otherwise, please refer to :ref:`Adding a New Model <adding_a_new_model>` for instructions on how to implement support for your model.
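
As a quick illustration of the doc change above: after this commit a Yi checkpoint is served through the shared Llama code path, using vLLM's standard `LLM` API. A minimal sketch, assuming vLLM is installed with a supported GPU; the prompt and sampling settings are illustrative, not taken from this PR.

```python
# Minimal sketch: a Yi checkpoint is loaded via the LlamaForCausalLM
# implementation rather than a dedicated Yi model class.
from vllm import LLM, SamplingParams

llm = LLM(model="01-ai/Yi-6B")  # one of the Yi checkpoints listed above
sampling = SamplingParams(temperature=0.8, max_tokens=64)
outputs = llm.generate(["The capital of France is"], sampling)
print(outputs[0].outputs[0].text)
```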
330 changes: 0 additions & 330 deletions vllm/model_executor/models/yi.py

This file was deleted.

1 change: 0 additions & 1 deletion vllm/transformers_utils/config.py
@@ -12,7 +12,6 @@
  "qwen": QWenConfig,
  "RefinedWeb": RWConfig, # For tiiuae/falcon-40b(-instruct)
  "RefinedWebModel": RWConfig, # For tiiuae/falcon-7b(-instruct)
- "yi": YiConfig,
  }
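
For background, registries like the one above map a Hugging Face `model_type` to a custom config class when transformers lacks one; anything not listed falls back to `AutoConfig`. The sketch below is a generic illustration of that pattern (the `load_config` helper is hypothetical, not vLLM's actual implementation); with the `"yi"` entry gone, Yi checkpoints take the standard transformers path.

```python
# Generic illustration of a config registry with an AutoConfig fallback
# (hypothetical helper, not vLLM's actual code). Removing the "yi" entry
# means Yi models use the stock transformers (Llama) config.
from transformers import AutoConfig, PretrainedConfig

_CONFIG_REGISTRY = {}  # e.g. {"RefinedWeb": RWConfig} for out-of-tree config classes

def load_config(model: str, trust_remote_code: bool = False) -> PretrainedConfig:
    config = AutoConfig.from_pretrained(model, trust_remote_code=trust_remote_code)
    override = _CONFIG_REGISTRY.get(config.model_type)
    if override is not None:
        # A registered model type gets its custom config class instead.
        config = override.from_pretrained(model)
    return config
```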


2 changes: 0 additions & 2 deletions vllm/transformers_utils/configs/__init__.py
@@ -7,7 +7,6 @@
  # tiiuae/falcon-7b(-instruct) models. Newer Falcon models will use the
  # `FalconConfig` class from the official HuggingFace transformers library.
  from vllm.transformers_utils.configs.falcon import RWConfig
- from vllm.transformers_utils.configs.yi import YiConfig

  __all__ = [
  "AquilaConfig",
@@ -16,5 +15,4 @@
  "MPTConfig",
  "QWenConfig",
  "RWConfig",
- "YiConfig",
  ]