Introduce Padding-Free Plugin to FMS-Acceleration #57
Conversation
Force-pushed from 3f08e09 to decc009
Review comments on plugins/instruct-lab/src/fms_acceleration_ilab/framework_plugin_padding_free.py (outdated, resolved)
Make sure to go through this checklist: https://github.com/foundation-model-stack/fms-acceleration/tree/main/plugins/framework#adding-new-plugins. For benches, maybe we can think about how to make a separate set from the current set, since this plugin is completely separate from the other plugins, so that we do not have to rerun all the benches every time. This will require some changes to the benchmarking. Maybe one simple solution is to just have a difference …
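One way such a split could look (purely illustrative; the scenario list, tags, and helper below are hypothetical, not the repository's actual benchmarking config):

```python
# Hypothetical sketch: select only the benchmark scenarios that exercise
# a given plugin, so unrelated benches are not rerun on every change.
# Scenario names and tags here are made up for illustration.

scenarios = [
    {"name": "accelerated-peft-bnb", "plugins": ["accelerated-peft"]},
    {"name": "accelerated-peft-autogptq", "plugins": ["accelerated-peft"]},
    {"name": "padding-free-hf", "plugins": ["instruct-lab"]},
]

def select_scenarios(scenarios, changed_plugin):
    """Return only the scenarios that exercise the changed plugin."""
    return [s for s in scenarios if changed_plugin in s["plugins"]]

# Only the padding-free bench would run for a change to the new plugin.
print(select_scenarios(scenarios, "instruct-lab"))
```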
Force-pushed from 71321a1 to 3238801
Review comment on plugins/instruct-lab/src/fms_acceleration_ilab/framework_plugin_padding_free.py (outdated, resolved)
Force-pushed from 915ba17 to bff3128
Signed-off-by: 1000960000 user <aaron.chew1@ibm.com>
Force-pushed from 66f9cc2 to c9e355a
* edits to readme
* Apply suggestions from code review
* more readme changes

Signed-off-by: 1000960000 user <aaron.chew1@ibm.com>
Co-authored-by: Yu Chin Fabian Lim <fabianlim@users.noreply.github.com>
Description
This PR introduces a new padding-free plugin to FMS-Acceleration, which allows users to speed up their finetuning by computing attention without padding. It can be activated through the `sft_trainer` CLI by passing the plugin argument `padding_free`, e.g. `--padding_free huggingface`. Currently this uses a fork of fms-hf-tuning to support the `sft_trainer` argument.
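For intuition, here is a minimal sketch of the padding-free idea; it is not the plugin's actual implementation, and the tensor names (including `cu_seqlens`) and shapes are illustrative:

```python
import torch

# Three variable-length sequences that would otherwise be padded to the
# longest length (9), wasting compute on pad tokens.
seqs = [torch.arange(5), torch.arange(9), torch.arange(3)]

# Padding-free: concatenate everything into a single row with no padding...
input_ids = torch.cat(seqs).unsqueeze(0)  # shape (1, 17) vs padded (3, 9)

# ...and track cumulative sequence lengths so an attention kernel
# (e.g. a flash-attention varlen path) knows where each sequence starts
# and ends, and never attends across boundaries.
lens = torch.tensor([s.numel() for s in seqs])
cu_seqlens = torch.cat([torch.zeros(1, dtype=torch.long), lens.cumsum(0)])

print(input_ids.shape)  # torch.Size([1, 17])
print(cu_seqlens)       # tensor([ 0,  5, 14, 17])
```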
Test
The following comparison is between a padded example and a padding-free example.
We observe a 27% improvement in runtime with the padding-free plugin while processing the same number of tokens.
The improvement is dataset-dependent: performance gains vary across datasets (see the reference PR), possibly due to differing sequence-length distributions (longer sequences lead to higher throughput and larger improvements).
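To see why the length distribution matters, a quick back-of-the-envelope sketch (the batch and lengths below are made up for illustration):

```python
# Hypothetical batch: with padding, every sequence is processed at the
# length of the longest one; padding-free processes only real tokens.
lengths = [512, 2048, 128, 1024]

padded_tokens = len(lengths) * max(lengths)   # 4 * 2048 = 8192
real_tokens = sum(lengths)                    # 3712

waste = 1 - real_tokens / padded_tokens
print(f"{waste:.0%} of padded compute is spent on pad tokens")  # 55%
```

The more skewed the lengths within a batch, the more compute padding wastes, and the larger the padding-free gain.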
Note: the throughput metrics reported by SFTTrainer include padding tokens when `padding=True` (see here), so we use `train_runtime` to compare runs instead.
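A sketch of that comparison (the runtimes below are placeholders, not this PR's measurements):

```python
# Trainer-reported tokens/sec can count pad tokens when padding=True,
# so we compare runs on train_runtime over the same dataset instead.
padded = {"train_runtime": 1000.0}        # seconds (placeholder)
padding_free = {"train_runtime": 787.0}   # seconds (placeholder)

speedup = padded["train_runtime"] / padding_free["train_runtime"] - 1
print(f"padding-free finishes {speedup:.0%} faster on the same data")
```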
Alpaca
Padded Experiment
Reproduce
Result
Padding-Free Experiment
Reproduce
Result