Opt #40
Merged
Why do we need different kernels here? Is it because we want ReLU activation instead of SiLU?
Shall we go ahead and merge this PR?
Did you want me to try to refactor the cpp code first? I know you mentioned it might be better to keep it all in one file/function, with flags to distinguish the variants.
Sure, let's do that and merge thereafter.
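To make the "one function with flags" idea from the thread concrete, here is a minimal pure-Python sketch. It is not the repository's C++ kernel: the function name `mlp_forward_active` and all of its parameters are hypothetical, chosen only for illustration. It assumes the usual layouts, where an OPT-style MLP is `fc2(relu(fc1(x)))` with biases and a gated SiLU MLP is `down(silu(gate(x)) * up(x))` without biases, and it only evaluates the neurons listed in `active`.

```python
import math


def relu(v):
    return v if v > 0.0 else 0.0


def silu(v):
    return v / (1.0 + math.exp(-v))


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def mlp_forward_active(x, up_w, down_w, active, *, gate_w=None,
                       up_b=None, down_b=None, activation="silu"):
    """Hypothetical sketch: one MLP routine, switched by flags.

    up_w[i]   : input weights of intermediate neuron i
    down_w[i] : output contribution of intermediate neuron i
    active    : indices of the neurons to evaluate (all others are skipped)
    gate_w    : if given, use the gated form act(gate . x) * (up . x)
    up_b, down_b : optional biases (OPT-style layers have them)
    """
    act = relu if activation == "relu" else silu
    out_dim = len(down_w[0])
    out = list(down_b) if down_b is not None else [0.0] * out_dim
    for i in active:
        pre = dot(up_w[i], x) + (up_b[i] if up_b is not None else 0.0)
        if gate_w is not None:
            h = act(dot(gate_w[i], x)) * pre   # gated SiLU-style path
        else:
            h = act(pre)                       # plain ReLU-style path
        for j, w in enumerate(down_w[i]):
            out[j] += h * w
    return out


x = [1.0, -2.0]
up_w = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
down_w = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# OPT-style call: ReLU, biases, no gate.
opt_dense = mlp_forward_active(x, up_w, down_w, [0, 1, 2],
                               up_b=[0.0, 0.0, 0.5], down_b=[0.1, 0.2],
                               activation="relu")
# Only neuron 0 has a positive pre-activation, so skipping the rest is exact.
opt_sparse = mlp_forward_active(x, up_w, down_w, [0],
                                up_b=[0.0, 0.0, 0.5], down_b=[0.1, 0.2],
                                activation="relu")

# Gated SiLU-style call: same routine, different flags.
gated = mlp_forward_active(x, up_w, down_w, [0], gate_w=up_w)
```

With ReLU, a neuron whose pre-activation is non-positive contributes exactly zero, so skipping it does not change the output; with SiLU, small activations are nonzero, so skipping neurons is an approximation.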
Signed-off-by: Kira Selby <kaselby@uwaterloo.ca>
…rsity thresholds Signed-off-by: Kira Selby <kaselby@uwaterloo.ca>
…emove predictor loss and type hints). Signed-off-by: Kira Selby <kaselby@uwaterloo.ca>
…el with sparse predictors disabled in order for activation capture to work properly. Signed-off-by: Kira Selby <kaselby@uwaterloo.ca>
vkkhare approved these changes on Aug 1, 2025
kaselby added a commit to kaselby/sparse_transformers that referenced this pull request on Aug 1, 2025
* Add KV cache to benchmark.py Signed-off-by: Kira Selby <kaselby@uwaterloo.ca>
* Add KV cache to benchmark.py Signed-off-by: Kira Selby <kaselby@uwaterloo.ca>
* Fixes Signed-off-by: Kira Selby <kaselby@uwaterloo.ca>
* Add topk and statistical topk sparsity methods as well as initial sparsity thresholds Signed-off-by: Kira Selby <kaselby@uwaterloo.ca>
* Fix activation capture for generate dataset Signed-off-by: Kira Selby <kaselby@uwaterloo.ca>
* Fix activation capture for generate dataset Signed-off-by: Kira Selby <kaselby@uwaterloo.ca>
* Fix config sparsities Signed-off-by: Kira Selby <kaselby@uwaterloo.ca>
* Fix config Signed-off-by: Kira Selby <kaselby@uwaterloo.ca>
* Initial commit for opt. Signed-off-by: Kira Selby <kaselby@uwaterloo.ca>
* Initial commit for opt. Signed-off-by: Kira Selby <kaselby@uwaterloo.ca>
* Fixing bugs Signed-off-by: Kira Selby <kaselby@uwaterloo.ca>
* Working version of OPT code. Signed-off-by: Kira Selby <kaselby@uwaterloo.ca>
* Fix small syntax error and update OPT code to match new formatting (remove predictor loss and type hints). Signed-off-by: Kira Selby <kaselby@uwaterloo.ca>
* Replace separate OPT kernels with flags for base sparse kernels. Signed-off-by: Kira Selby <kaselby@uwaterloo.ca>
* Replace separate OPT kernels with flags for base sparse kernels. Signed-off-by: Kira Selby <kaselby@uwaterloo.ca>
* Fixes Signed-off-by: Kira Selby <kaselby@uwaterloo.ca>
* Updating opt code to work with current codebase Signed-off-by: Kira Selby <kaselby@uwaterloo.ca>
* Small fixes and rework to generate_dataset to use the sparse base model with sparse predictors disabled in order for activation capture to work properly. Signed-off-by: Kira Selby <kaselby@uwaterloo.ca>
---------
Signed-off-by: Kira Selby <kaselby@uwaterloo.ca>
kaselby added a commit that referenced this pull request on Aug 11, 2025
* Opt (#40)
* Add KV cache to benchmark.py Signed-off-by: Kira Selby <kaselby@uwaterloo.ca>
* Add KV cache to benchmark.py Signed-off-by: Kira Selby <kaselby@uwaterloo.ca>
* Fixes Signed-off-by: Kira Selby <kaselby@uwaterloo.ca>
* Add topk and statistical topk sparsity methods as well as initial sparsity thresholds Signed-off-by: Kira Selby <kaselby@uwaterloo.ca>
* Fix activation capture for generate dataset Signed-off-by: Kira Selby <kaselby@uwaterloo.ca>
* Fix activation capture for generate dataset Signed-off-by: Kira Selby <kaselby@uwaterloo.ca>
* Fix config sparsities Signed-off-by: Kira Selby <kaselby@uwaterloo.ca>
* Fix config Signed-off-by: Kira Selby <kaselby@uwaterloo.ca>
* Initial commit for opt. Signed-off-by: Kira Selby <kaselby@uwaterloo.ca>
* Initial commit for opt. Signed-off-by: Kira Selby <kaselby@uwaterloo.ca>
* Fixing bugs Signed-off-by: Kira Selby <kaselby@uwaterloo.ca>
* Working version of OPT code. Signed-off-by: Kira Selby <kaselby@uwaterloo.ca>
* Fix small syntax error and update OPT code to match new formatting (remove predictor loss and type hints). Signed-off-by: Kira Selby <kaselby@uwaterloo.ca>
* Replace separate OPT kernels with flags for base sparse kernels. Signed-off-by: Kira Selby <kaselby@uwaterloo.ca>
* Replace separate OPT kernels with flags for base sparse kernels. Signed-off-by: Kira Selby <kaselby@uwaterloo.ca>
* Fixes Signed-off-by: Kira Selby <kaselby@uwaterloo.ca>
* Updating opt code to work with current codebase Signed-off-by: Kira Selby <kaselby@uwaterloo.ca>
* Small fixes and rework to generate_dataset to use the sparse base model with sparse predictors disabled in order for activation capture to work properly. Signed-off-by: Kira Selby <kaselby@uwaterloo.ca>
---------
Signed-off-by: Kira Selby <kaselby@uwaterloo.ca>
* KV Cache and Topk sparsity (#61)
* Add KV cache to benchmark.py Signed-off-by: Kira Selby <kaselby@uwaterloo.ca>
* Add KV cache to benchmark.py Signed-off-by: Kira Selby <kaselby@uwaterloo.ca>
* Fixes Signed-off-by: Kira Selby <kaselby@uwaterloo.ca>
* Add topk and statistical topk sparsity methods as well as initial sparsity thresholds Signed-off-by: Kira Selby <kaselby@uwaterloo.ca>
* Fix activation capture for generate dataset Signed-off-by: Kira Selby <kaselby@uwaterloo.ca>
* Fix activation capture for generate dataset Signed-off-by: Kira Selby <kaselby@uwaterloo.ca>
* Fix config sparsities Signed-off-by: Kira Selby <kaselby@uwaterloo.ca>
* Fix config Signed-off-by: Kira Selby <kaselby@uwaterloo.ca>
* Fix activation capture to be capturing hidden states at the start of layer instead of start of MLP block Signed-off-by: Kira Selby <kaselby@uwaterloo.ca>
* Added documentation to measure_gt_sparsity to indicate how it can be used to calculate sparsity thresholds for topk Signed-off-by: Kira Selby <kaselby@uwaterloo.ca>
* add sparsity method parameter to downstream eval Signed-off-by: Kira Selby <kaselby@uwaterloo.ca>
---------
Signed-off-by: Kira Selby <kaselby@uwaterloo.ca>
Signed-off-by: Kira Selby <30674826+kaselby@users.noreply.github.com>
Signed-off-by: Kira Selby <kaselby@uwaterloo.ca>
* updated forward pass Signed-off-by: Kira Selby <kaselby@uwaterloo.ca>
* Add flag to disable weight cache and compute sparsity without union over batch dimension Signed-off-by: Kira Selby <kaselby@uwaterloo.ca>
* Set default value of use_weight_cache to true if not found in config Signed-off-by: Kira Selby <kaselby@uwaterloo.ca>
* removing unnecessary cpp kernels Signed-off-by: Kira Selby <kaselby@uwaterloo.ca>
* remove references to sparse_mlp_forward and fix opt skip mlp Signed-off-by: Kira Selby <kaselby@uwaterloo.ca>
* Fix Signed-off-by: Kira Selby <kaselby@uwaterloo.ca>
* Fixes Signed-off-by: Kira Selby <kaselby@uwaterloo.ca>
* Attempt to fix whatever happened with the previous unsigned commit Signed-off-by: Kira Selby <kaselby@uwaterloo.ca>
* Fixes Signed-off-by: Kira Selby <kaselby@uwaterloo.ca>
* Fixes Signed-off-by: Kira Selby <kaselby@uwaterloo.ca>
* remove break after eos token in benchmark to ensure consistent benchmarking Signed-off-by: Kira Selby <kaselby@uwaterloo.ca>
* Minor fixes to configs and modelling_opt, as well as fixes t ensure cuda is properly being utilized Signed-off-by: Kira Selby <kaselby@uwaterloo.ca>
* Fix minor issues with evaluation script arguments Signed-off-by: Kira Selby <kaselby@uwaterloo.ca>
* Fix minor issues with evaluation script arguments Signed-off-by: Kira Selby <kaselby@uwaterloo.ca>
* Merge updates to activation capture Signed-off-by: Kira Selby <kaselby@uwaterloo.ca>
---------
Signed-off-by: Kira Selby <kaselby@uwaterloo.ca>
Signed-off-by: Kira Selby <30674826+kaselby@users.noreply.github.com>
Description
Adds support for OPT. As this model differs significantly from the others, this requires a new cpp kernel and weight cache. I'd appreciate it if someone with more experience in this area could look this over, as I am not sure I implemented it in the most elegant way possible.