Allow Kernels for Full FT and Non-Quantized PEFT #79
Conversation
plugins/fused-ops-and-kernels/src/fms_acceleration_foak/models/granite.py
When you get the bfloat16 numbers, let's compare them with the float16 numbers to see if there are substantial changes.
Also, let's document in the FoAK README the future items for kernels that are still missing for certain models, e.g.:
Model | norm | pos emb | cross-ent | fused_lora
---|---|---|---|---
LlamaForCausalLM | ✅ | ✅ | ✅ | ✅
@achew010 there are some benchmarks still not updated, but I will merge this first and then we can address them in a later PR.
Description

This PR
- introduces FastKernelsAccelerationPlugin, an improved version of FastQuantizedPeftAccelerationPlugin
- can be activated from either a training stanza or a peft.quantized stanza

Improvements to Full Finetuning
- 7% improvement from the following kernels: FastCrossEntropyLoss, FastRMSNorm, FastRoPE
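To illustrate the kind of change these kernels make, here is a minimal, hypothetical sketch of the module-swap pattern (this is not the plugin's actual ModelPatcher API): it rebinds the forward of every RMSNorm-style module to a replacement implementation, which in the real plugin would be a fused Triton kernel rather than the pure-PyTorch stand-in shown here.

```python
# Generic sketch of swapping in a "fast" RMSNorm forward; the real plugin uses a
# fused Triton kernel instead of this pure-PyTorch stand-in. Names are illustrative.
import torch

def fast_rms_forward(self, hidden_states):
    # stand-in implementation: compute RMSNorm in fp32, then cast back
    input_dtype = hidden_states.dtype
    hidden_states = hidden_states.to(torch.float32)
    variance = hidden_states.pow(2).mean(-1, keepdim=True)
    hidden_states = hidden_states * torch.rsqrt(variance + self.variance_epsilon)
    return (self.weight * hidden_states).to(input_dtype)

def patch_rms_norm(model: torch.nn.Module, norm_cls_name: str = "LlamaRMSNorm"):
    # rebind forward on every matching norm module in the model
    for module in model.modules():
        if type(module).__name__ == norm_cls_name:
            module.forward = fast_rms_forward.__get__(module, type(module))
    return model
```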
Compatibility Matrix with Mixed Precision

Some precision combinations fail with the error "Attempting to unscale FP16 gradients." (see here).
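For reference, that error is standard PyTorch AMP behaviour rather than something FOAK-specific: GradScaler refuses to unscale gradients stored in float16, which is what happens when the model weights themselves are in fp16 while fp16 mixed precision is enabled. A minimal repro, independent of this repo, is sketched below.

```python
# Minimal repro of "Attempting to unscale FP16 gradients." using plain PyTorch AMP;
# nothing here is specific to fms-acceleration.
import torch

model = torch.nn.Linear(8, 8).cuda().half()        # weights (and thus grads) in fp16
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler()               # fp16 mixed-precision loss scaler

x = torch.randn(4, 8, device="cuda", dtype=torch.float16)
loss = model(x).float().sum()

scaler.scale(loss).backward()
try:
    scaler.unscale_(optimizer)                     # GradScaler expects fp32 master weights
except ValueError as err:
    print(err)                                     # -> Attempting to unscale FP16 gradients.
```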
Regression Test for Loss, Memory, Throughput

Running our alpaca benchmarks for most experiments in bfloat16 (except GPTQ-LoRA, which runs in float16; see issue), we see no significant regression in performance.

Note: an outlier in the comparison plots shows an anomalous memory increase in a standard full-FT experiment on Mistral7B with no accelerations installed. Since it does not point to any issue with the code in this PR, it might be caused by slight instability in that benchmarking run.
Bug Fix to Model Patcher
The fix for the improper patching of FastCrossEntropyLoss causes no significant change in FOAK performance; however, a slight decrease in improvement is observed (consistent with issue 70) compared to the previous paddingfree+foak numbers.
FLAN (6000 samples) with PaddingFree: benchmark plots comparing Before BugFix and With BugFix.
Note: due to issues with FSDP-QLoRA in the latest transformers version (4.45.0dev) mentioned here, Granite with Fast Kernels will be addressed in a later PR instead.

TODO
- FastKernelsAccelerationPlugin: follow the pattern of building the fused-lora rule for a base_type.
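As a rough illustration of that TODO (names and structure here are assumptions, not the plugin's real interface), a fused-lora rule keyed on base_type could be registered through a small dispatch table, so each base layer type (e.g. plain torch, GPTQ, bitsandbytes) supplies its own builder:

```python
# Hypothetical dispatch pattern for building fused-LoRA rules per base_type;
# this only illustrates the TODO above and is not the plugin's actual API.
from typing import Callable, Dict

import torch

FUSED_LORA_BUILDERS: Dict[str, Callable[[torch.nn.Module], None]] = {}

def register_fused_lora_rule(base_type: str):
    """Decorator that records how to fuse LoRA adapters for one base layer type."""
    def _register(build_fn: Callable[[torch.nn.Module], None]):
        FUSED_LORA_BUILDERS[base_type] = build_fn
        return build_fn
    return _register

@register_fused_lora_rule("torch")
def fuse_plain_linear(lora_module: torch.nn.Module) -> None:
    # stand-in: a real rule would rebind the module forward to a fused
    # base-matmul + LoRA kernel here
    pass

def apply_fused_lora(model: torch.nn.Module, base_type: str) -> None:
    build_fn = FUSED_LORA_BUILDERS[base_type]
    for module in model.modules():
        # hypothetical trigger: any module carrying LoRA A/B weights
        if hasattr(module, "lora_A") and hasattr(module, "lora_B"):
            build_fn(module)
```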