Opt #40

Merged
merged 18 commits into from
Aug 1, 2025

Conversation

kaselby
Collaborator

@kaselby kaselby commented Jun 24, 2025

Description

Adds support for OPT. Because this model differs significantly from the others, it requires a new C++ kernel and weight cache. I'd appreciate it if someone with more experience in this area could look this over, as I am not sure I implemented it in the most elegant way possible.

@vkkhare
Contributor

vkkhare commented Jun 28, 2025

Why do we need different kernels here? Is it because we want ReLU activation instead of SiLU?
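For context on the question above, here is a minimal sketch of the two feed-forward styles, assuming the standard published architectures: OPT uses a two-layer MLP with biases and ReLU, while LLaMA-style models use a gated MLP with SiLU. The function names and list-based weights are illustrative assumptions and are not taken from this repository's kernels.

```python
import math

def relu(v):
    # ReLU outputs exact zeros for negative inputs.
    return max(0.0, v)

def silu(v):
    # SiLU (swish): v * sigmoid(v); small but nonzero for negative inputs.
    return v / (1.0 + math.exp(-v))

def matvec(w, x, b=None):
    # Multiply a row-major weight matrix by a vector, with optional bias.
    out = [sum(wi * xi for wi, xi in zip(row, x)) for row in w]
    if b is not None:
        out = [o + bi for o, bi in zip(out, b)]
    return out

def opt_style_mlp(x, fc1_w, fc1_b, fc2_w, fc2_b):
    # Two projections with biases and a ReLU in between.
    h = [relu(v) for v in matvec(fc1_w, x, fc1_b)]
    return matvec(fc2_w, h, fc2_b)

def gated_mlp(x, gate_w, up_w, down_w):
    # Three projections without biases: silu(gate(x)) * up(x), then down.
    gate = [silu(v) for v in matvec(gate_w, x)]
    up = matvec(up_w, x)
    h = [g * u for g, u in zip(gate, up)]
    return matvec(down_w, h)
```

The structural differences (two projections versus three, biases versus none, and a different activation) are what a shared kernel would have to account for.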

@vkkhare
Contributor

vkkhare commented Jul 1, 2025

Shall we go ahead and merge this PR?

@kaselby
Collaborator Author

kaselby commented Jul 1, 2025

Did you want me to try to refactor the C++ code first? I know you mentioned it might be better to keep it all in one file/function, with flags to distinguish the two cases.
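The "one function with flags" idea could look roughly like the sketch below. It is written in plain Python for readability rather than as the actual C++ kernel, and the name `sparse_mlp` and its parameters (`w_gate`, `b_in`, `b_out`, `use_relu`) are hypothetical. Passing `w_gate=None` selects the ungated OPT-style path; `active` lists the hidden neurons selected for computation.

```python
import math

def relu(v):
    return max(0.0, v)

def silu(v):
    return v / (1.0 + math.exp(-v))

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def sparse_mlp(x, w_in, w_out, active, *, w_gate=None,
               b_in=None, b_out=None, use_relu=False):
    """MLP forward pass restricted to the hidden neurons listed in `active`.

    w_in, w_gate and w_out each hold one row per hidden neuron
    (w_out is the down projection stored transposed).
    """
    act = relu if use_relu else silu
    # Start from the output bias if one is given, otherwise from zeros.
    out = list(b_out) if b_out is not None else [0.0] * len(w_out[0])
    for j in active:
        pre = dot(w_in[j], x) + (b_in[j] if b_in is not None else 0.0)
        if w_gate is not None:
            h = act(dot(w_gate[j], x)) * pre  # gated path: act(gate) * up
        else:
            h = act(pre)                      # ungated path: act(fc1)
        # Accumulate this neuron's contribution to every output feature.
        for i, w in enumerate(w_out[j]):
            out[i] += h * w
    return out
```

With this shape, the per-neuron loop is shared and only the activation and the optional gate and bias terms vary between model families.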

@vkkhare
Contributor

vkkhare commented Jul 1, 2025

Sure, let's do that and merge thereafter.

kaselby added 18 commits July 30, 2025 13:23
@vkkhare vkkhare merged commit fb12264 into NimbleEdge:main Aug 1, 2025
1 check passed
kaselby added a commit to kaselby/sparse_transformers that referenced this pull request Aug 1, 2025
* Add KV cache to benchmark.py

Signed-off-by: Kira Selby <kaselby@uwaterloo.ca>

* Add KV cache to benchmark.py

Signed-off-by: Kira Selby <kaselby@uwaterloo.ca>

* Fixes

Signed-off-by: Kira Selby <kaselby@uwaterloo.ca>

* Add topk and statistical topk sparsity methods as well as initial sparsity thresholds

Signed-off-by: Kira Selby <kaselby@uwaterloo.ca>

* Fix activation capture for generate dataset

Signed-off-by: Kira Selby <kaselby@uwaterloo.ca>

* Fix activation capture for generate dataset

Signed-off-by: Kira Selby <kaselby@uwaterloo.ca>

* Fix config sparsities

Signed-off-by: Kira Selby <kaselby@uwaterloo.ca>

* Fix config

Signed-off-by: Kira Selby <kaselby@uwaterloo.ca>

* Initial commit for opt.

Signed-off-by: Kira Selby <kaselby@uwaterloo.ca>

* Initial commit for opt.

Signed-off-by: Kira Selby <kaselby@uwaterloo.ca>

* Fixing bugs

Signed-off-by: Kira Selby <kaselby@uwaterloo.ca>

* Working version of OPT code.

Signed-off-by: Kira Selby <kaselby@uwaterloo.ca>

* Fix small syntax error and update OPT code to match new formatting (remove predictor loss and type hints).

Signed-off-by: Kira Selby <kaselby@uwaterloo.ca>

* Replace separate OPT kernels with flags for base sparse kernels.

Signed-off-by: Kira Selby <kaselby@uwaterloo.ca>

* Replace separate OPT kernels with flags for base sparse kernels.

Signed-off-by: Kira Selby <kaselby@uwaterloo.ca>

* Fixes

Signed-off-by: Kira Selby <kaselby@uwaterloo.ca>

* Updating opt code to work with current codebase

Signed-off-by: Kira Selby <kaselby@uwaterloo.ca>

* Small fixes and rework to generate_dataset to use the sparse base model with sparse predictors disabled in order for activation capture to work properly.

Signed-off-by: Kira Selby <kaselby@uwaterloo.ca>

---------

Signed-off-by: Kira Selby <kaselby@uwaterloo.ca>
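The commit list above mentions "topk and statistical topk sparsity methods" without defining them. The sketch below is one plausible reading, stated as an assumption: exact top-k keeps the k highest-scoring neurons, while a "statistical" variant avoids sorting by thresholding at a Gaussian quantile estimated from the mean and standard deviation of the scores. The function names and the `sparsity` parameter (the fraction of neurons to drop, strictly between 0 and 1) are hypothetical and may not match the repository's implementation.

```python
import statistics
from statistics import NormalDist

def topk_mask(scores, k):
    """Exact top-k: keep the k highest-scoring neurons (requires a sort)."""
    if k <= 0:
        return [False] * len(scores)
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    keep = set(order[:k])
    return [i in keep for i in range(len(scores))]

def statistical_topk_mask(scores, sparsity):
    """Approximate top-k: threshold at a Gaussian quantile instead of sorting.

    Assumes the scores are roughly normally distributed, so about a
    `sparsity` fraction of them falls below the chosen threshold.
    """
    mu = statistics.fmean(scores)
    sigma = statistics.pstdev(scores)
    if sigma == 0.0:
        # All scores are identical, so there is no basis for dropping any.
        return [True] * len(scores)
    threshold = NormalDist(mu, sigma).inv_cdf(sparsity)
    return [s >= threshold for s in scores]
```

The statistical variant trades exactness for a single pass over the scores: the number of neurons kept only approximates the target when the distribution is close to Gaussian.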
kaselby added a commit that referenced this pull request Aug 11, 2025
* Opt (#40)

* KV Cache and Topk sparsity (#61)

* Add KV cache to benchmark.py

Signed-off-by: Kira Selby <kaselby@uwaterloo.ca>

* Add KV cache to benchmark.py

Signed-off-by: Kira Selby <kaselby@uwaterloo.ca>

* Fixes

Signed-off-by: Kira Selby <kaselby@uwaterloo.ca>

* Add topk and statistical topk sparsity methods as well as initial sparsity thresholds

Signed-off-by: Kira Selby <kaselby@uwaterloo.ca>

* Fix activation capture for generate dataset

Signed-off-by: Kira Selby <kaselby@uwaterloo.ca>

* Fix activation capture for generate dataset

Signed-off-by: Kira Selby <kaselby@uwaterloo.ca>

* Fix config sparsities

Signed-off-by: Kira Selby <kaselby@uwaterloo.ca>

* Fix config

Signed-off-by: Kira Selby <kaselby@uwaterloo.ca>

* Fix activation capture to be capturing hidden states at the start of layer instead of start of MLP block

Signed-off-by: Kira Selby <kaselby@uwaterloo.ca>

* Added documentation to measure_gt_sparsity to indicate how it can be used to calculate sparsity thresholds for topk

Signed-off-by: Kira Selby <kaselby@uwaterloo.ca>

* add sparsity method parameter to downstream eval

Signed-off-by: Kira Selby <kaselby@uwaterloo.ca>

---------

Signed-off-by: Kira Selby <kaselby@uwaterloo.ca>
Signed-off-by: Kira Selby <30674826+kaselby@users.noreply.github.com>
Signed-off-by: Kira Selby <kaselby@uwaterloo.ca>

* updated forward pass

Signed-off-by: Kira Selby <kaselby@uwaterloo.ca>

* Add flag to disable weight cache and compute sparsity without union over batch dimension

Signed-off-by: Kira Selby <kaselby@uwaterloo.ca>

* Set default value of use_weight_cache to true if not found in config

Signed-off-by: Kira Selby <kaselby@uwaterloo.ca>

* removing unnecessary cpp kernels

Signed-off-by: Kira Selby <kaselby@uwaterloo.ca>

* remove references to sparse_mlp_forward and fix opt skip mlp

Signed-off-by: Kira Selby <kaselby@uwaterloo.ca>

* Fix

Signed-off-by: Kira Selby <kaselby@uwaterloo.ca>

* Fixes

Signed-off-by: Kira Selby <kaselby@uwaterloo.ca>

* Attempt to fix whatever happened with the previous unsigned commit

Signed-off-by: Kira Selby <kaselby@uwaterloo.ca>

* Fixes

Signed-off-by: Kira Selby <kaselby@uwaterloo.ca>

* Fixes

Signed-off-by: Kira Selby <kaselby@uwaterloo.ca>

* remove break after eos token in benchmark to ensure consistent benchmarking

Signed-off-by: Kira Selby <kaselby@uwaterloo.ca>

* Minor fixes to configs and modelling_opt, as well as fixes to ensure CUDA is properly being utilized

Signed-off-by: Kira Selby <kaselby@uwaterloo.ca>

* Fix minor issues with evaluation script arguments

Signed-off-by: Kira Selby <kaselby@uwaterloo.ca>

* Fix minor issues with evaluation script arguments

Signed-off-by: Kira Selby <kaselby@uwaterloo.ca>

* Merge updates to activation capture

Signed-off-by: Kira Selby <kaselby@uwaterloo.ca>

---------

Signed-off-by: Kira Selby <kaselby@uwaterloo.ca>
Signed-off-by: Kira Selby <30674826+kaselby@users.noreply.github.com>
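Two of the commits above concern a `use_weight_cache` flag that defaults to true when absent from the config, and computing sparsity "without union over batch dimension". The sketch below illustrates both ideas under stated assumptions: the config is modeled as a plain dict (the real config object may differ), and the helper names are hypothetical.

```python
def get_use_weight_cache(config):
    # Follows the commit message: default to True when the key is missing.
    return config.get("use_weight_cache", True)

def union_mask(batch_masks):
    """Shared mask: a neuron is active if any sample in the batch needs it."""
    return [any(column) for column in zip(*batch_masks)]

def sparsity(mask):
    """Fraction of neurons that are inactive under a single mask."""
    return 1.0 - sum(mask) / len(mask)

def mean_per_sample_sparsity(batch_masks):
    """Average sparsity when each sample keeps its own mask (no union)."""
    return sum(sparsity(m) for m in batch_masks) / len(batch_masks)
```

For example, with two samples that each activate a different single neuron out of four, the union mask is 50% sparse while the per-sample sparsity is 75%. Taking the union over the batch can only lower the measured sparsity, which is presumably why the two measurements are kept distinct.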
2 participants