
trying to debug aggregate step of moe mlp #1552

Merged

Changes from 1 commit (53 commits in this pull request)
17f162f  with .  (hugolatendresse, Dec 2, 2024)
f5c3f7f  all _ instead of .  (hugolatendresse, Dec 3, 2024)
22e2a58  _block  (hugolatendresse, Dec 3, 2024)
1cd424a  try nullptr  (hugolatendresse, Dec 3, 2024)
df0c59b  no comma  (hugolatendresse, Dec 3, 2024)
3598ee9  debug  (hugolatendresse, Dec 3, 2024)
06259f7  debug  (hugolatendresse, Dec 3, 2024)
c89a327  revert aggregate  (hugolatendresse, Dec 6, 2024)
71c5d69  one expert  (hugolatendresse, Dec 7, 2024)
cb1eaa8  sync  (hugolatendresse, Dec 7, 2024)
286a1fe  dont redefien mlpout  (hugolatendresse, Dec 7, 2024)
8709035  sync  (hugolatendresse, Dec 7, 2024)
51f9701  sync  (hugolatendresse, Dec 7, 2024)
a65e733  register tokenizer for mixtral  (hugolatendresse, Dec 7, 2024)
62bf012  sync  (hugolatendresse, Dec 7, 2024)
61adc0f  rename weights  (hugolatendresse, Dec 7, 2024)
8c69b8b  sync  (hugolatendresse, Dec 7, 2024)
0b89169  sync  (hugolatendresse, Dec 7, 2024)
76cac36  permission  (Dec 7, 2024)
baa30a8  Merge branch 'nomlp2' of https://github.com/hugolatendresse/FlexFlow …  (Dec 7, 2024)
53a4cc4  sync  (hugolatendresse, Dec 7, 2024)
bd1ffa0  Merge branch 'nomlp2' of github.com:hugolatendresse/FlexFlow into nomlp2  (hugolatendresse, Dec 7, 2024)
aeb29e9  sync  (hugolatendresse, Dec 7, 2024)
d27804f  sync  (hugolatendresse, Dec 7, 2024)
ecb9675  sync  (hugolatendresse, Dec 7, 2024)
af665bd  which loading  (hugolatendresse, Dec 7, 2024)
16ab912  sync  (hugolatendresse, Dec 7, 2024)
c3945e3  sync  (hugolatendresse, Dec 7, 2024)
9385e82  .o  (hugolatendresse, Dec 7, 2024)
ce91966  sync  (hugolatendresse, Dec 7, 2024)
7e558bc  able to output with mixtral (!!!) but it's all etc etc etc  (hugolatendresse, Dec 7, 2024)
c8007fd  try expert 1  (hugolatendresse, Dec 7, 2024)
aa01156  revert experts  (hugolatendresse, Dec 7, 2024)
539b491  tmp fix  (hugolatendresse, Dec 7, 2024)
28b2df0  dummy gate  (hugolatendresse, Dec 7, 2024)
8ad9478  bad softmax fix  (hugolatendresse, Dec 7, 2024)
9dcb5c2  printf  (hugolatendresse, Dec 8, 2024)
7ed7d65  dims  (hugolatendresse, Dec 8, 2024)
a906c6a  sync  (hugolatendresse, Dec 8, 2024)
0af8064  sync  (hugolatendresse, Dec 8, 2024)
4d26fb5  comments on dims  (hugolatendresse, Dec 8, 2024)
c0e4524  sync  (hugolatendresse, Dec 8, 2024)
7462fb4  sync  (hugolatendresse, Dec 8, 2024)
21ecf77  sync  (hugolatendresse, Dec 8, 2024)
742ec59  sync  (hugolatendresse, Dec 8, 2024)
b04af7a  sync  (hugolatendresse, Dec 8, 2024)
99954e5  sync  (hugolatendresse, Dec 8, 2024)
511fc25  sync  (hugolatendresse, Dec 8, 2024)
e590ce5  sync  (hugolatendresse, Dec 8, 2024)
1ed4bff  sync  (hugolatendresse, Dec 8, 2024)
5700378  CHECKPOINT  (hugolatendresse, Dec 8, 2024)
381d3cd  2222:22 port  (hugolatendresse, Dec 8, 2024)
6309e70  tmp_volume  (hugolatendresse, Dec 8, 2024)
sync
hugolatendresse committed Dec 7, 2024
commit 62bf0121f0a6d6fa1799aa71a57567414b1e2846
12 changes: 6 additions & 6 deletions inference/models/mixtral.cc
--- a/inference/models/mixtral.cc
+++ b/inference/models/mixtral.cc
@@ -75,7 +75,7 @@ void MIXTRAL::create_mixtral_model(FFModel &ff,
           mixtral_config.rms_norm_eps,
           mixtral_config.hidden_size,
           DT_NONE,
-          std::string("layers_" + std::to_string(i) + ".input_layernorm")
+          std::string("layers_" + std::to_string(i) + "_input_layernorm")
              .c_str());
     } else {
       ff.residual_rms_norm(
@@ -86,7 +86,7 @@ void MIXTRAL::create_mixtral_model(FFModel &ff,
           mixtral_config.hidden_size,
           false, // inplace_residual
           DT_NONE,
-          std::string("layers_" + std::to_string(i) + ".input_layernorm")
+          std::string("layers_" + std::to_string(i) + "_input_layernorm")
              .c_str());
       token = token_att_norm[0];
       att_norm = token_att_norm[1];
@@ -104,7 +104,7 @@ void MIXTRAL::create_mixtral_model(FFModel &ff,
           nullptr, // ?
           REG_MODE_NONE, // no regularization
           0.0f, // no dropout
-          std::string("layers_" + std::to_string(i) + ".self_attn.qkv_proj")
+          std::string("layers_" + std::to_string(i) + "_self_attn_qkv_proj")
              .c_str());

       Tensor mha;
@@ -126,7 +126,7 @@ void MIXTRAL::create_mixtral_model(FFModel &ff,
           1.0f,  /*scaling factor*/
           true,  /*qk_prod_scaling*/
           false, /*position_bias*/
-          std::string("layers_" + std::to_string(i) + ".self_attn")
+          std::string("layers_" + std::to_string(i) + "_self_attn")
              .c_str() /*name*/
       );
       break;
@@ -148,7 +148,7 @@ void MIXTRAL::create_mixtral_model(FFModel &ff,
           nullptr,
           REG_MODE_NONE,
           0.0f,
-          std::string("layers_" + std::to_string(i) + ".self_attn.o_proj")
+          std::string("layers_" + std::to_string(i) + "_self_attn_o_proj")
              .c_str());

       // step 2: SILU activaion
@@ -161,7 +161,7 @@ void MIXTRAL::create_mixtral_model(FFModel &ff,
           mixtral_config.hidden_size,
           false, // inplace_residual
           DT_NONE,
-          std::string("layers_" + std::to_string(i) + ".post_attention_layernorm")
+          std::string("layers_" + std::to_string(i) + "_post_attention_layernorm")
              .c_str());
       token = token_ff_norm[0];
       Tensor ff_norm = token_ff_norm[1];
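
Every hunk in this commit makes the same change: the layer-name string passed to each FlexFlow operator switches its separator from "." to "_". As a minimal sketch of how that convention could live in one place (layer_name is a hypothetical helper for illustration, not part of this PR or the FlexFlow API):

    #include <string>

    // Hypothetical helper: builds the underscore-separated layer names
    // adopted in the diff above, e.g. layer_name(0, "self_attn_qkv_proj")
    // returns "layers_0_self_attn_qkv_proj".
    static std::string layer_name(int layer_idx, std::string const &suffix) {
      return "layers_" + std::to_string(layer_idx) + "_" + suffix;
    }

A call site would then pass, for example, layer_name(i, "input_layernorm").c_str(); as in the original code, the pointer returned by .c_str() remains valid through the full call expression, since the temporary std::string is destroyed only at the end of it.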