Add repkv_backward_kernel2 and repkv_kernel2 (llama3 branch) #771
Open
insop wants to merge 6 commits into karpathy:llama3
Conversation
repkv_backward_kernel2, reduced thread by using dinp as index instead of dout
insop (Author):
@karpathy, PTAL.
Title changed from "repkv_backward_kernel2, reduced thread by using dinp as index instead of dout" to "repkv_backward_kernel2 and repkv_kernel2"
…v_backward_kernel2
Title changed from "repkv_backward_kernel2 and repkv_kernel2" to "repkv_backward_kernel2 and repkv_kernel2 (llama3 branch)"
…v_backward_kernel2
insop (Author):
Please let me know if you have any feedback.
…v_backward_kernel2
Changes
Add repkv_backward_kernel2, which improves on repkv_backward_kernel1 by reducing the number of threads used, per @karpathy's suggestion.
Also add repkv_kernel2, similar to backward_kernel2.
Here is the test output for repkv_backward_kernel2. Execution time is improved compared to kernel1 (the kernel1 time shown below is from the previous PR, #764).
Here is the test output for repkv_kernel2. Execution time is improved compared to kernel1.
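To illustrate the idea of indexing by dinp (or inp) rather than by dout (or out), here is a minimal CPU sketch in C in which each loop iteration stands in for one GPU thread. It is not the code from this PR. The function names, the tensor layouts (inp/dinp as (B, T, (NH + 2*NH_KV) * HD) with Q, K, V heads in that order; out/dout as (B, T, 3, NH, HD)), and the assumption that replicas of a KV head sit next to each other are all hypothetical choices made for the example.

```c
#include <stddef.h>

// Hypothetical sketch, not the PR's kernels. Assumed layouts:
//   inp / dinp : (B, T, (NH + 2*NH_KV) * HD), per token: Q heads, K heads, V heads
//   out / dout : (B, T, 3, NH, HD), K and V heads replicated nrep = NH / NH_KV times
// Each loop iteration plays the role of one GPU thread.

// Backward: one "thread" per dinp element. Q gradients are copied; each K/V
// gradient is the sum over its nrep replicas in dout (a gather, so no atomics).
void repkv_backward_per_dinp(float* dinp, const float* dout,
                             int B, int T, int NH, int NH_KV, int HD) {
    int nrep = NH / NH_KV;
    int c_in = (NH + 2 * NH_KV) * HD;
    size_t n = (size_t)B * T * c_in;
    for (size_t idx = 0; idx < n; idx++) {
        int c = (int)(idx % c_in);
        size_t bt = idx / c_in;
        int d = c % HD;
        int head = c / HD;  // 0 .. NH + 2*NH_KV - 1
        const float* dout_bt = dout + bt * 3 * NH * HD;
        if (head < NH) {
            dinp[idx] = dout_bt[head * HD + d];       // Q: plain copy
        } else {
            int kv = (head - NH) / NH_KV;             // 0 = K, 1 = V
            int kv_head = (head - NH) % NH_KV;
            const float* src = dout_bt + (size_t)(kv + 1) * NH * HD
                             + (size_t)kv_head * nrep * HD + d;
            float sum = 0.0f;
            for (int r = 0; r < nrep; r++) {
                sum += src[r * HD];                   // accumulate replicas
            }
            dinp[idx] = sum;
        }
    }
}

// Forward: one "thread" per inp element, writing nrep copies for K and V.
void repkv_forward_per_inp(float* out, const float* inp,
                           int B, int T, int NH, int NH_KV, int HD) {
    int nrep = NH / NH_KV;
    int c_in = (NH + 2 * NH_KV) * HD;
    size_t n = (size_t)B * T * c_in;
    for (size_t idx = 0; idx < n; idx++) {
        int c = (int)(idx % c_in);
        size_t bt = idx / c_in;
        int d = c % HD;
        int head = c / HD;
        float* out_bt = out + bt * 3 * NH * HD;
        if (head < NH) {
            out_bt[head * HD + d] = inp[idx];         // Q: plain copy
        } else {
            int kv = (head - NH) / NH_KV;
            int kv_head = (head - NH) % NH_KV;
            float* dst = out_bt + (size_t)(kv + 1) * NH * HD
                       + (size_t)kv_head * nrep * HD + d;
            for (int r = 0; r < nrep; r++) {
                dst[r * HD] = inp[idx];               // replicate
            }
        }
    }
}
```

Under these assumptions the number of "threads" drops from B*T*3*NH*HD (one per dout element) to B*T*(NH + 2*NH_KV)*HD (one per dinp element), and because each dinp element is written by exactly one iteration, no atomic accumulation would be needed.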