
[Kernel] Expand MoE weight loading + Add Fused Marlin MoE Kernel #7766

Merged

Commits (37)
c2a48a8 - clean-up model weight loading; support channel wise weight scales (dsikka, Aug 14, 2024)
47c4bd2 - update fp8 MoE to use updated weight loading (dsikka, Aug 14, 2024)
d3f63dd - add group quant weight load support (dsikka, Aug 14, 2024)
1311ca7 - moe kernel (#403) (dsikka, Aug 15, 2024)
99f0630 - clean-up (dsikka, Aug 15, 2024)
905cda4 - more clean-up (dsikka, Aug 16, 2024)
defd71f - update condition for arch (dsikka, Aug 16, 2024)
52a08f0 - update deepseek and qwen to skip weight_shape loading (dsikka, Aug 16, 2024)
f408a26 - comments (dsikka, Aug 16, 2024)
9c855af - format (dsikka, Aug 18, 2024)
09881ac - fix issue on upstream-main (dsikka, Aug 19, 2024)
7b7b36a - fix imports for tests (dsikka, Aug 19, 2024)
40c8dc0 - PR comments + fix import issue (dsikka, Aug 19, 2024)
4df0b52 - format (dsikka, Aug 19, 2024)
1465a2b - Michael's feedback (ElizaWszola, Aug 20, 2024)
9409f67 - testing (dsikka, Aug 21, 2024)
162e579 - Assert no fp8 in fused moe (ElizaWszola, Aug 21, 2024)
e5a6724 - Don't compile marlin_moe_ops on ROCm (ElizaWszola, Aug 23, 2024)
05f37be - clean-up model weight loading; support channel wise weight scales (dsikka, Aug 14, 2024)
eff08ba - update fp8 MoE to use updated weight loading (dsikka, Aug 14, 2024)
2c310c0 - add group quant weight load support (dsikka, Aug 14, 2024)
e0506f3 - moe kernel (#403) (dsikka, Aug 15, 2024)
b58b50f - clean-up (dsikka, Aug 15, 2024)
760ce4b - more clean-up (dsikka, Aug 16, 2024)
27ebda7 - update deepseek and qwen to skip weight_shape loading (dsikka, Aug 16, 2024)
7ae3f1a - comments (dsikka, Aug 16, 2024)
172928d - format (dsikka, Aug 18, 2024)
d94074d - fix imports for tests (dsikka, Aug 19, 2024)
0f5f268 - PR comments + fix import issue (dsikka, Aug 19, 2024)
5d2a131 - format (dsikka, Aug 19, 2024)
a5420e4 - move to weight_loading test to deal with CI cuda memory issues (dsikka, Aug 21, 2024)
cfbc594 - Disable fused Mixtral on AMD (ElizaWszola, Aug 23, 2024)
e31a76c - format (ElizaWszola, Aug 23, 2024)
3b2fcff - Bring back missing test (ElizaWszola, Aug 23, 2024)
24a3b57 - Remove fallback (ElizaWszola, Aug 26, 2024)
58ca798 - remove redundant test (dsikka, Aug 26, 2024)
eb72c6a - rebase fix (dsikka, Aug 27, 2024)
Commit 52a08f0c762593ff672fe7f327432d7ace64a5f6: update deepseek and qwen to skip weight_shape loading (dsikka, committed Aug 27, 2024)
vllm/model_executor/models/deepseek_v2.py (2 changes: 2 additions and 0 deletions)

@@ -553,6 +553,8 @@ def load_weights(self, weights: Iterable[Tuple[str, torch.Tensor]]):

         params_dict = dict(self.named_parameters())
         for name, loaded_weight in weights:
+            if "weight_shape" in name:
+                continue
             if "rotary_emb.inv_freq" in name:
                 continue
             for (param_name, weight_name, shard_id) in stacked_params_mapping:
vllm/model_executor/models/qwen2_moe.py (2 changes: 2 additions and 0 deletions)

@@ -453,6 +453,8 @@ def load_weights(self, weights: Iterable[Tuple[str, torch.Tensor]]):

         params_dict = dict(self.named_parameters())
         for name, loaded_weight in weights:
+            if "weight_shape" in name:
+                continue
             if "rotary_emb.inv_freq" in name:
                 continue
             for (param_name, weight_name, shard_id) in stacked_params_mapping:
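Both diffs add the same guard: checkpoint entries whose name contains "weight_shape" (per the commit, quantization bookkeeping rather than real model parameters) are skipped before any parameter lookup, just like the existing rotary_emb.inv_freq skip. The sketch below illustrates the pattern in isolation; the `load_weights` signature, the flat `params_dict`, and the example checkpoint names are hypothetical stand-ins, not vLLM's actual modules.

```python
from typing import Iterable, List, Tuple


def load_weights(params_dict: dict,
                 weights: Iterable[Tuple[str, object]]) -> List[str]:
    """Sketch of the filtering loop added in commit 52a08f0.

    Entries named "weight_shape" have no matching model parameter
    (they carry shape metadata for quantized checkpoints), so the
    loader skips them instead of failing the params_dict lookup.
    """
    loaded = []
    for name, loaded_weight in weights:
        if "weight_shape" in name:
            continue  # metadata entry; nothing to load into the model
        if "rotary_emb.inv_freq" in name:
            continue  # recomputed at model init, never read from disk
        params_dict[name] = loaded_weight
        loaded.append(name)
    return loaded


# Hypothetical checkpoint entries for illustration only.
checkpoint = [
    ("layers.0.mlp.experts.w13.weight_shape", (14336, 4096)),
    ("layers.0.self_attn.rotary_emb.inv_freq", [0.5, 0.25]),
    ("layers.0.mlp.experts.w13.weight", [[1.0, 2.0]]),
]
params: dict = {}
print(load_weights(params, checkpoint))
# → ['layers.0.mlp.experts.w13.weight']
```

Filtering by substring match on the full dotted name is the same convention the surrounding loop already uses for inv_freq, which is why the new check slots in as a sibling `continue` rather than a separate pass.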