[Kernel] Zero point support in fused MarlinMoE kernel + AWQ Fused MoE #8973

Merged: 59 commits, Oct 4, 2024
Commits
0abac6f
Enable 8-bit weights in Fused Marlin MoE
ElizaWszola Aug 30, 2024
fdf69c2
fix rocm
ElizaWszola Aug 30, 2024
4da163b
bad paste
ElizaWszola Aug 30, 2024
21d2337
add test case; fix imports for tests
dsikka Aug 30, 2024
080ab23
Merge branch 'main' into marlin-moe-8-bit
dsikka Aug 30, 2024
638777a
fix to adapt custom_routin_function
dsikka Aug 30, 2024
bd4b84d
Use select_experts to compute top_k tensors in fused moe
ElizaWszola Sep 2, 2024
bef6b53
bring back fused_moe_marlin -> fused_marlin_moe
ElizaWszola Sep 3, 2024
befc52b
Merge branch 'main' into marlin-moe-8-bit
ElizaWszola Sep 4, 2024
b45594c
remove large model
dsikka Sep 4, 2024
effd2cd
Cleanup, comments
ElizaWszola Sep 4, 2024
52c3353
fix moe init
ElizaWszola Sep 4, 2024
882fd9c
move larger models to an options larger test
dsikka Sep 4, 2024
973d914
add optional flag
dsikka Sep 4, 2024
72bc899
swap gpu
dsikka Sep 5, 2024
eea2bc3
Temp disable part of moe tests to see what's breaking
ElizaWszola Sep 5, 2024
9c29dc2
Fixes to act_order, make unit tests more robust
ElizaWszola Sep 5, 2024
6d04dcd
try to narrow down cuda error
ElizaWszola Sep 5, 2024
83e7999
Try different subset of test params
ElizaWszola Sep 6, 2024
6a42eaf
.
ElizaWszola Sep 6, 2024
3288842
.
ElizaWszola Sep 6, 2024
61ef4ba
Merge branch 'main' into marlin-moe-8-bit
ElizaWszola Sep 10, 2024
667d23e
fix and cleanup after merge
ElizaWszola Sep 10, 2024
b16838e
cleanup
ElizaWszola Sep 10, 2024
e53abb9
validate cache for the kernel code
ElizaWszola Sep 10, 2024
2f82715
cleanup commented out code
ElizaWszola Sep 11, 2024
2cc7dcc
Zero point in fused Marlin MoE kernel
ElizaWszola Sep 12, 2024
50cc766
Split into multiple files for faster compilation (work in progress)
ElizaWszola Sep 13, 2024
4b11a7d
it compiles
ElizaWszola Sep 16, 2024
507af0c
try to compile the kernel code
ElizaWszola Sep 17, 2024
1b76e45
Compilation works
ElizaWszola Sep 19, 2024
0c7cbb5
Cleanup
ElizaWszola Sep 20, 2024
8a8f925
function name
ElizaWszola Sep 20, 2024
936f2b9
Move kernel files to a separate directory
ElizaWszola Sep 20, 2024
98ec9b6
Unit tests
ElizaWszola Sep 20, 2024
fa23e51
working kernel
ElizaWszola Sep 24, 2024
ae2afaf
Merge branch 'main' into marlin-moe-zero-points
ElizaWszola Sep 25, 2024
e98bc45
clean up unit tests, disable single awq test
ElizaWszola Sep 25, 2024
2f09e58
make has_zero_point boolean explicitly passed to fused_marlin_moe
ElizaWszola Sep 25, 2024
000796a
add awq moe
dsikka Sep 26, 2024
e8289ae
update
dsikka Sep 26, 2024
0385aa8
update awq
dsikka Sep 27, 2024
6c4eca2
Merge branch 'main' into marlin-moe-zero-points
ElizaWszola Sep 30, 2024
091a4bb
Post-merge fix, remove e=4 case from unit tests to speed them up a bit
ElizaWszola Sep 30, 2024
3d12554
move to marlin; clean-up
dsikka Sep 30, 2024
b54b633
fix typo; add test
dsikka Sep 30, 2024
e0e5a74
Michael's feedback, cleanup
ElizaWszola Oct 1, 2024
793b065
Merge branch 'marlin-moe-zero-points' into awq_moe
dsikka Oct 1, 2024
bbf575e
use replace_parameters; clean-up
dsikka Oct 1, 2024
79126f9
more clean-up
dsikka Oct 1, 2024
3ff0ba1
Merge pull request #13 from neuralmagic/awq_moe
ElizaWszola Oct 1, 2024
87d46dc
Delete 8-bit zero point code
ElizaWszola Oct 1, 2024
8fe6da4
fix file reverted from some commit hoopla
dsikka Oct 1, 2024
a966417
Make workspace smaller, add very small thread config
ElizaWszola Oct 2, 2024
fa4d269
try to make required cache smaller
ElizaWszola Oct 2, 2024
91924c1
Merge branch 'marlin-moe-zero-points' of https://github.com/neuralmag…
ElizaWszola Oct 2, 2024
fb8a1e7
revert
ElizaWszola Oct 2, 2024
df0d691
Merge branch 'main' into marlin-moe-zero-points
ElizaWszola Oct 4, 2024
5daa141
Merge branch 'main' into marlin-moe-zero-points
ElizaWszola Oct 4, 2024
2 changes: 2 additions & 0 deletions CMakeLists.txt
@@ -433,6 +433,8 @@ if(VLLM_GPU_LANG STREQUAL "CUDA")
 "csrc/moe/marlin_kernels/marlin_moe_kernel_ku4b8.cu"
 "csrc/moe/marlin_kernels/marlin_moe_kernel_ku8b128.h"
 "csrc/moe/marlin_kernels/marlin_moe_kernel_ku8b128.cu"
+"csrc/moe/marlin_kernels/marlin_moe_kernel_ku4.h"
+"csrc/moe/marlin_kernels/marlin_moe_kernel_ku4.cu"
"csrc/moe/marlin_moe_ops.cu")

set_gencode_flags_for_srcs(