Support block-modular architecture #277
Draft
oleksost wants to merge 158 commits into main from modular_hybrids
5137757
stuff
jlamypoirier f0cb32a
Merge remote-tracking branch 'origin/main' into config_updates
jlamypoirier f26010e
Update pretrained config
jlamypoirier b930a39
stuff
jlamypoirier 918a7a8
Merge branch 'config_updates' into update_pretrained_config
jlamypoirier 8117c47
fixes
jlamypoirier 1c995d3
fix
jlamypoirier 3f90475
Merge branch 'main' into config_updates
jlamypoirier e389058
Merge branch 'config_updates' into update_pretrained_config
jlamypoirier 506fe92
fixes
jlamypoirier 971d3ef
fixes
jlamypoirier 6bf20cb
Tests wip
jlamypoirier c13fb19
misc
jlamypoirier a20fcec
tests
jlamypoirier 9af26a7
Merge branch 'main' into config_updates
jlamypoirier 9af372d
Tests, fixes, remove tuple format
jlamypoirier dded00a
fix
jlamypoirier 42d5ca4
Merge remote-tracking branch 'origin/main' into config_updates
jlamypoirier 986f9f3
fix
jlamypoirier 5abc087
Merge branch 'config_updates' into update_pretrained_config
jlamypoirier 8e3e795
fixes
jlamypoirier da6eb7b
fixes
jlamypoirier 67e08aa
Merge branch 'main' into config_updates
jlamypoirier a09e6f3
Merge branch 'config_updates' into update_pretrained_config
jlamypoirier baad705
fix
jlamypoirier b702837
Test, fixes
jlamypoirier a8684f8
Knowledge distillation, fix cross-entropy
jlamypoirier b781729
Fixes, distillation
jlamypoirier db6504b
fixes
jlamypoirier 7c2933a
Merge remote-tracking branch 'origin/main' into config_updates
jlamypoirier a017c11
Merge branch 'config_updates' into update_pretrained_config
jlamypoirier 368a6bf
Merge remote-tracking branch 'origin/main' into update_pretrained_config
jlamypoirier e0c82a0
Merge remote-tracking branch 'origin/main' into distillation
jlamypoirier 16a3dd7
Merge branch 'update_pretrained_config' into distillation
jlamypoirier cff9892
fixes
jlamypoirier 793ecde
Merge branch 'update_pretrained_config' into distillation
jlamypoirier b67006a
fixes
jlamypoirier 2014108
Add constraints
jlamypoirier 4fb78e4
Merge remote-tracking branch 'origin/main' into distillation
jlamypoirier fa3d556
Add constraints
jlamypoirier 6c2c887
Separate reference model preprocessing
jlamypoirier 67f9db6
fix
jlamypoirier 48141e5
Merge remote-tracking branch 'origin/main' into update_pretrained_config
jlamypoirier e6e5a32
Merge branch 'update_pretrained_config' into distillation
jlamypoirier 537deca
fix
jlamypoirier a590e8b
Merge branch 'distillation' into reference_model_preprocessing
jlamypoirier 3d5dc94
Merge commit '6ad0a96c9328234b907d01a82c4c52bd48752b2f' into update_p…
jlamypoirier 2bb0c08
Merge branch 'update_pretrained_config' into distillation
jlamypoirier 067ba97
Merge remote-tracking branch 'origin/main' into distillation
jlamypoirier d2b3154
misc
jlamypoirier 2e63d29
Merge branch 'distillation' into reference_model_preprocessing
jlamypoirier 7133e4d
Merge remote-tracking branch 'origin/main' into reference_model_prepr…
jlamypoirier a0ba051
fixes
jlamypoirier 9ddfb69
add per-layer lr-scale
RaymondLi0 5e282cc
modeling mtp llamba
oleksost 87b3197
modeling apriel ssm
oleksost d3e1df2
Apriel to SSM
oleksost 082cf22
Apriel SSM conversion
oleksost 66fb0a2
Merge remote-tracking branch 'origin/main' into reference_model_prepr…
jlamypoirier 0d4d5c5
fix
jlamypoirier b5ffd26
Merge remote-tracking branch 'origin/reference_model_preprocessing' i…
oleksost c43e535
wip
oleksost a1f44d4
conversion apriel ssm
oleksost fbec02d
config apriel
oleksost 75d6460
temp checkpoint conversion
oleksost 73a4252
block pattern for hybrid conversion
oleksost 5afc7dc
SSMBlockType
oleksost 8e9facf
wip
oleksost 77ad39f
add token-prediction loss coefficients
RaymondLi0 da9bf1a
eval apriel ssm
oleksost ac4a598
fix
jlamypoirier 0c0e7d9
adding check for missing `rope_type` (#246)
nitsanluke 97ba9d4
Loss masking for distillation
jlamypoirier 231d5d8
test, misc
jlamypoirier d7922af
Merge branch 'reference_model_preprocessing' into distillation_loss_mask
jlamypoirier 30a75b0
eval apriel ssm
oleksost a50bc2e
cleanup
oleksost f8af7be
Merge branch 'oleksiy/apriel-ssm' of https://github.com/ServiceNow/Fa…
oleksost 6532c5f
hybrid config
oleksost 2a5646b
Merge remote-tracking branch 'origin/distillation_loss_mask' into ole…
oleksost 9a678df
sft distill
oleksost a7abe53
conversion
oleksost a68c0b7
conversion
oleksost 9cfef44
lr stage definition as string
oleksost 005e623
fixes
jlamypoirier cad951a
fix
jlamypoirier 40970ec
Merge branch 'reference_model_preprocessing' into distillation_loss_mask
jlamypoirier bce916d
loss maks
oleksost 9d95064
fix
jlamypoirier 2c96abb
Merge remote-tracking branch 'origin/main' into reference_model_prepr…
jlamypoirier 935c470
fix
jlamypoirier 9aff3b7
fix shuffled tokens
oleksost d82ddbf
Merge remote-tracking branch 'origin/main' into reference_model_prepr…
jlamypoirier 6949c49
Merge branch 'reference_model_preprocessing' into distillation_loss_mask
jlamypoirier 9c105e7
Merge remote-tracking branch 'origin/main' into distillation_loss_mask
jlamypoirier ae4d111
fixes
jlamypoirier deb7ce6
fixes
jlamypoirier eaba34f
innit like in mamba in llama
oleksost f8ca122
embeddings_lr_scale
oleksost 2db740b
fix
jlamypoirier 41d4da3
disable freezing
RaymondLi0 4160b1f
hybrid model loading and exporting
oleksost 30ad8b8
wip
oleksost ea55ae2
Merge branch 'main' into oleksiy/apriel-ssm
oleksost cd4edd5
Merge remote-tracking branch 'origin/distillation_loss_mask' into ole…
oleksost 9c4f38f
layer-lr scale for mlp as well
RaymondLi0 1784dca
wip
oleksost 1e3cc28
nvm
oleksost 2dc945b
hybrid modeling
oleksost 4277e67
modeling
oleksost 6153c33
Merge branch 'main' into oleksiy/apriel-ssm
oleksost c71cb16
nvm
oleksost be04c19
output lr scale
oleksost 1311f5b
output_lr_scale
oleksost baf4011
nvm
oleksost 6cf26c5
eval
oleksost 901d1b6
rename
oleksost b5696fb
Merge remote-tracking branch 'origin/raymond/per_layer_lr_scale' into…
oleksost 616c540
per_layer_lr_scale
oleksost 9af5ee5
merged also prediction_loss_coefficient from #243
oleksost 1a7939b
added logging in mamba
oleksost 532d0d5
no norm layer freezing
oleksost 8349130
test
oleksost 023102c
test
oleksost 865da95
debug
oleksost 87c93d3
comment
oleksost da4977d
Merge remote-tracking branch 'origin/main' into oleksiy/apriel-ssm
oleksost a18b80f
debug
oleksost 40d5437
wip
oleksost 72ace3b
fix
RaymondLi0 121e906
test + comment
oleksost aa3bc0b
stuff
jlamypoirier 28d321e
stuff
jlamypoirier 1bbd7fb
stuff
jlamypoirier 3595949
Minimalistic dynamic configs
jlamypoirier 39b1a04
stuff
jlamypoirier 8a8fa77
fix
RaymondLi0 8e25990
add test with frozen weights
RaymondLi0 456a0c5
add description for tests
RaymondLi0 87efd45
15b model apriel hybrid
oleksost 95c7b53
Merge remote-tracking branch 'origin/main' into oleksiy/apriel-ssm
oleksost 326387d
Merge remote-tracking branch 'origin/raymond/fix-frozen-weight' into β¦
oleksost aafbfb5
nvm
oleksost c7fe8d7
nvm
oleksost 848ef04
Merge remote-tracking branch 'origin/main' into oleksiy/apriel-ssm
oleksost c285e8d
nvm
oleksost 26e4924
Merge remote-tracking branch 'origin/minimalistic_dynamic_classes' in…
oleksost 3eaa240
modeling
oleksost 4781d15
wip
oleksost ac4bfa9
wip
oleksost 45008b5
wip
oleksost a378954
wip hybrid block architecture
oleksost 38fc529
wip
oleksost 852bb92
Merge remote-tracking branch 'origin/main' into modular_hybrids
oleksost e5534fd
wip
oleksost 6860c43
added lr scales per block
oleksost 7178407
weight sharing
oleksost 0553a4b
test
oleksost
No, this is handled in from_dict. The expected class is not the same as the type hint.
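For readers without the diff at hand, the pattern this comment describes can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the `_REGISTRY` dict, the `register` decorator, the `type` key, and both config classes are hypothetical names invented for the example. The only point it makes is that a field can be annotated with a base class while `from_dict` picks the concrete class from the serialized data.

```python
from dataclasses import dataclass
from typing import Any

# Hypothetical registry mapping a serialized "type" string to a config class.
_REGISTRY: dict[str, type] = {}


def register(name: str):
    """Record a config class under `name` (illustrative helper)."""

    def wrap(cls: type) -> type:
        _REGISTRY[name] = cls
        return cls

    return wrap


@register("base")
@dataclass
class BlockConfig:
    """Hypothetical base config; this is what a type hint would name."""

    hidden_size: int = 64

    @classmethod
    def from_dict(cls, data: dict[str, Any]) -> "BlockConfig":
        data = dict(data)  # Copy so the caller's dict is not mutated.
        name = data.pop("type", None)
        # The expected class comes from the data, not from the type hint.
        expected = cls if name is None else _REGISTRY[name]
        if not issubclass(expected, cls):
            raise TypeError(f"{expected.__name__} is not a {cls.__name__}")
        return expected(**data)


@register("mamba")
@dataclass
class MambaBlockConfig(BlockConfig):
    """Hypothetical subclass selected at load time."""

    state_size: int = 16


# The call site only knows about BlockConfig, yet gets the subclass back.
config = BlockConfig.from_dict({"type": "mamba", "hidden_size": 128, "state_size": 32})
```

Under these assumptions, a separate check against the annotated type at the call site would be redundant, because the class resolution and the subclass check both already happen inside `from_dict`.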