This repository was archived by the owner on Jun 3, 2025. It is now read-only.
[Feature Branch] KV Cache Interface #1083
Merged
109 commits
- 48ac0ac initial commit (dbogunowicz)
- cf7f2b9 Update src/deepsparse/license.py (dbogunowicz)
- 832630a Merge branch 'main' into feature/damian/do_not_save_to_tmp (dbogunowicz)
- 9958c83 Merge branch 'main' into feature/damian/do_not_save_to_tmp (dbogunowicz)
- e6d2b03 limit to 150mb (dbogunowicz)
- 7f9935b ready to review (dbogunowicz)
- b1cf01b initial commit (dbogunowicz)
- 0a3f48d [Codegen][ORT][Static Seq Length] TextGenerationPipeline (#946) (dbogunowicz)
- add4625 [CodeGen][Documentation] (#956) (dbogunowicz)
- 22d2746 reimplementation for generative pipelines (markurtz)
- 7f1651d restore text generation from examples (dbogunowicz)
- b85746d [CodeGen] ONNX model loading to support >2Gb models / two engines (#991) (dbogunowicz)
- aadc608 refactor sucessfull (dbogunowicz)
- 58bc2b0 Pipeline fully refactored, time to test engine support. Note: Sliding… (dbogunowicz)
- d538444 First iteration with Sage (dbogunowicz)
- e19676b Apply suggestions from code review (dbogunowicz)
- 7908b74 ORT agrees with the Engine. But they both give not entirely correct r… (dbogunowicz)
- 4bc3472 dynamic ORT vs static DS (dbogunowicz)
- c07f7ed pipeline handles OPT multitoken pass (dbogunowicz)
- fb77838 fixes to get static pipeline a little further along
- 2097463 adjust shapes and slicing to enable static autoregressive pass - ISSU…
- 5eb10a9 migrate from cache_length to positions input
- 9213f29 got if working for multitoken + single token scenario (dbogunowicz)
- d9af004 cleanup the pipeline (dbogunowicz)
- 476f25d further cleanup post merge (dbogunowicz)
- fab44e4 Pipeline working for single-token inference only (dbogunowicz)
- d454e2f do not load the onnx model with external files twice (dbogunowicz)
- 1613e25 pipeline never redundantly saves the external data + more robust toke… (dbogunowicz)
- b61055c Stop saving tmp files, otherwise the engine looks for external files … (dbogunowicz)
- 6ee25fc Left pad support
- 5d3004b cleanup (dbogunowicz)
- ace6fa5 cleanup2 (dbogunowicz)
- 388586d Add in pipeline timing (markurtz)
- afd0139 add in force tokens logic (markurtz)
- 30eeda7 remove input validation for text generation pipelines (markurtz)
- 5882b56 remove multitoken support for now (markurtz)
- 4bbe33d remove kv cache engine and other fixes (markurtz)
- afa5746 nest input shape override (markurtz)
- e2bb78c comment out input shape override (markurtz)
- 2299009 add non batch override for ORT (markurtz)
- 2935b77 clean up generation pipeline (markurtz)
- b89b156 Merge branch 'main' into feature/damian/do_not_save_to_tmp (dbogunowicz)
- dc3d61b initial commit (dbogunowicz)
- a294265 Update src/deepsparse/license.py (dbogunowicz)
- af97f2b limit to 150mb (dbogunowicz)
- c117788 ready to review (dbogunowicz)
- 4ad5f49 fix the erronous Makefile (dbogunowicz)
- 9e816bb Merge branch 'feature/damian/do_not_save_to_tmp' of https://github.co… (dbogunowicz)
- f97467f perhaps fixed GHA (dbogunowicz)
- 6be8d87 take into consideration that GHA creates four files (dbogunowicz)
- e2f088d initial commit (dbogunowicz)
- 9fc6c64 Merge remote-tracking branch 'origin/feature/damian/do_not_save_to_tm… (dbogunowicz)
- a610faf tested with actual model (dbogunowicz)
- 347d1fb remove val_inp argument (dbogunowicz)
- e11027c Update README.md (dbogunowicz)
- a950910 Apply suggestions from code review (dbogunowicz)
- c1d02dc Update README.md (dbogunowicz)
- 711cdfb Merge branch 'main' into feature/damian/codegen_pipeline_clean (dbogunowicz)
- e602662 Merge branch 'main' into feature/damian/codegen_pipeline_clean (dbogunowicz)
- 2085c37 [BugFix] Update deepsparse dockerfile (#1069) (rahul-tuli)
- 2f7bc95 initial implementation (dbogunowicz)
- e18fab7 working implementation for pipeline input (dbogunowicz)
- 0358d87 [Fix] Fix CLI benchmark errors (#1071) (dbogunowicz)
- 06b5246 Merge branch 'main' into feature/damian/codegen_pipeline_clean (dbogunowicz)
- 2cab681 Merge branch 'feature/damian/codegen_pipeline_clean' into feature/dam… (dbogunowicz)
- 63b116b Clean a typo in the pipeline code (dbogunowicz)
- cde08b9 initial commit (dbogunowicz)
- 99d125c Merge branch 'main' into feature/damian/fb_kv_cache (dbogunowicz)
- 67ffe47 Merge branch 'main' into feature/damian/fb_kv_cache (dbogunowicz)
- 9937686 Merge branch 'main' into feature/damian/fb_kv_cache (dbogunowicz)
- 0d6a423 [KV Cache Interface] DecoderKVCache (#1084) (dbogunowicz)
- 0809aea [WiP] [KV Cache Interface] Text Generation & Decoder Engine Implement… (dbogunowicz)
- 7001a6e working implementation, time to cleanup (dbogunowicz)
- c1bf5b7 now kv cache decoder holds information about the num of tokens prepro… (dbogunowicz)
- 79251e6 cleanup the old files (dbogunowicz)
- 9efbdb6 Update src/deepsparse/transformers/engines/nl_decoder_engine.py (dbogunowicz)
- da5e93e ready for review (dbogunowicz)
- a680dac ready for testing (dbogunowicz)
- 7099994 managed to get first logits right (dbogunowicz)
- 1d4d96d Delete example (dbogunowicz)
- 08e5421 cleanup before sharing with Ben and Sage (dbogunowicz)
- bfaa072 Merge branch 'feature/damian/pipeline_engine_support' of https://gith… (dbogunowicz)
- fbeeb4a Update src/deepsparse/transformers/engines/nl_decoder_engine.py (dbogunowicz)
- f83dcab assert proper padding on pipeline init (dbogunowicz)
- e659c33 now also supporting kv cache perplexity. time for cleanup (dbogunowicz)
- cf74ad7 ready for review (dbogunowicz)
- 853f876 correctly print engine info (dbogunowicz)
- e8da07e work with left padding of the tokenizer (dbogunowicz)
- 58b12c8 quality (dbogunowicz)
- eecd232 fix the multitoken inference (dbogunowicz)
- 10c804a Perplexity Eval for Text Generation Models (#1073) (dbogunowicz)
- 7bd23d6 Merge branch 'main' into feature/damian/fb_kv_cache (dbogunowicz)
- 10ba82e [Text Generation] Run deepsparse engine without the LIB.kv_cache obje… (dbogunowicz)
- e81c327 added few improvements that turned out to be useful post manual testing (dbogunowicz)
- b737f77 Update src/deepsparse/transformers/engines/nl_decoder_engine.py (dbogunowicz)
- 042cb79 fixed the logic to assert correct multibatch inference (dbogunowicz)
- bf4eac3 Merge branch 'feature/damian/fb_kv_cache' of https://github.com/neura… (dbogunowicz)
- c8a1f93 fix integration tests (dbogunowicz)
- d2d3dc1 initial implementation (dbogunowicz)
- 6ce1ca4 perplexity working, so as batched inference for different sized inputs (dbogunowicz)
- 47dc986 Merge branch 'main' into feature/damian/fb_kv_cache (dbogunowicz)
- ef77d91 fix the integration test (dbogunowicz)
- f0d74b0 Merge branch 'feature/damian/fb_kv_cache' of https://github.com/neura… (dbogunowicz)
- 186c80c better solution for fixing the issues caused by this PR in GHA (dbogunowicz)
- 09993e7 revert changes to yolo pipeline (dbogunowicz)
- ba8c126 Merge branch 'main' into feature/damian/fb_kv_cache (dbogunowicz)
- 37e8a02 Update src/deepsparse/transformers/engines/nl_decoder_engine.py (dbogunowicz)
- 0d308b9 response to Rahuls comments (dbogunowicz)
- 41e9306 Merge remote-tracking branch 'origin/main' into feature/damian/fb_kv_… (dbogunowicz)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,15 @@ | ||
| # Copyright (c) 2021 - present / Neuralmagic, Inc. All Rights Reserved. | ||
| # | ||
| # Licensed under the Apache License, Version 2.0 (the "License"); | ||
| # you may not use this file except in compliance with the License. | ||
| # You may obtain a copy of the License at | ||
| # | ||
| # http://www.apache.org/licenses/LICENSE-2.0 | ||
| # | ||
| # Unless required by applicable law or agreed to in writing, | ||
| # software distributed under the License is distributed on an "AS IS" BASIS, | ||
| # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
| # See the License for the specific language governing permissions and | ||
| # limitations under the License. | ||
| # flake8: noqa | ||
| from .nl_decoder_engine import * |
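
The new `__init__.py` above consists of a single wildcard import, which re-exports names from `nl_decoder_engine` at the package level. As a rough illustration of how that pattern behaves in Python, the sketch below builds a throwaway package on disk and checks which names the wildcard exposes. Everything in it is hypothetical: the package name `demo_engines`, the class `DemoDecoderEngine`, and the function `helper_not_exported` are placeholders, and it assumes the module defines `__all__`, which this diff does not show.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical module contents; these names are placeholders and are
# NOT taken from the real nl_decoder_engine.py.
ENGINE_MODULE = '''
__all__ = ["DemoDecoderEngine"]


class DemoDecoderEngine:
    """Stand-in for whatever the real module exports."""


def helper_not_exported():
    """Public-looking name that is deliberately left out of __all__."""
'''

# Mirrors the two code lines of the __init__.py shown in the diff.
PACKAGE_INIT = "# flake8: noqa\nfrom .nl_decoder_engine import *\n"


def reexported_names():
    """Build a throwaway package and report which names the wildcard exposes."""
    with tempfile.TemporaryDirectory() as tmp:
        pkg_dir = Path(tmp) / "demo_engines"
        pkg_dir.mkdir()
        (pkg_dir / "nl_decoder_engine.py").write_text(ENGINE_MODULE)
        (pkg_dir / "__init__.py").write_text(PACKAGE_INIT)

        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            pkg = importlib.import_module("demo_engines")
            return {
                "DemoDecoderEngine": hasattr(pkg, "DemoDecoderEngine"),
                "helper_not_exported": hasattr(pkg, "helper_not_exported"),
            }
        finally:
            # Leave the interpreter state as we found it.
            sys.path.remove(tmp)
            for name in ("demo_engines", "demo_engines.nl_decoder_engine"):
                sys.modules.pop(name, None)


print(reexported_names())
```

When a module defines `__all__`, a wildcard import copies only the listed names, so in this sketch the class is reachable from the package while the helper is not. Without `__all__`, every name that does not start with an underscore would be copied instead.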