Adding attention_reshape (inference only) kernels. #497
Tried to implement `cat`, but there were some weird issues, and it was causing a ton of kernels to be called, each with a device-to-host sync (a huge bottleneck).

This PR adds a pretty specific reshape for `(qkv, past_key, past_value)`, useful for inference runs of transformer models, for instance `gpt2` (https://github.com/Narsil/fast_gpt2).
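For reference, here is a minimal CPU sketch of what such a fused reshape could look like. The function name `attention_reshape`, the shapes, and the packed `q|k|v` row layout are illustrative assumptions, not this PR's exact kernel signature:

```rust
/// Hypothetical CPU reference for a fused attention reshape
/// (a sketch, not the actual kernel in this PR).
/// Splits a packed `qkv` buffer of shape (seq, 3 * hidden) into q,
/// and appends the new k/v slices onto `past_key` / `past_value`
/// in a single pass, instead of issuing several `cat` operations
/// (each of which launched its own kernel with a device-to-host sync).
fn attention_reshape(
    qkv: &[f32],        // (seq, 3 * hidden), rows packed as q|k|v
    past_key: &[f32],   // (past_len, hidden)
    past_value: &[f32], // (past_len, hidden)
    seq: usize,
    hidden: usize,
) -> (Vec<f32>, Vec<f32>, Vec<f32>) {
    let past_len = past_key.len() / hidden;
    let total = past_len + seq;

    let mut q = vec![0.0; seq * hidden];
    let mut key = vec![0.0; total * hidden];
    let mut value = vec![0.0; total * hidden];

    // Copy the cached past keys/values first.
    key[..past_len * hidden].copy_from_slice(past_key);
    value[..past_len * hidden].copy_from_slice(past_value);

    // Scatter each packed qkv row into its destination buffer.
    for s in 0..seq {
        let row = &qkv[s * 3 * hidden..(s + 1) * 3 * hidden];
        q[s * hidden..(s + 1) * hidden].copy_from_slice(&row[..hidden]);
        let dst = (past_len + s) * hidden;
        key[dst..dst + hidden].copy_from_slice(&row[hidden..2 * hidden]);
        value[dst..dst + hidden].copy_from_slice(&row[2 * hidden..]);
    }

    (q, key, value)
}
```

On GPU, the same logic can run as a single kernel launch, which is what avoids the per-`cat` syncs described above.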