Support YaRN models (RoFormer implementation in rotary_embedding kernel) #980

The YaRN models, with context sizes of 64k and 128k, were recently released; they were pre-trained by people from Nous Research and EleutherAI. They use RoFormer-style rotary embeddings, which seem to differ from the GPT-NeoX and GPT-J variants. Because the models are based on Llama 2, support is mostly in place; only a few small adjustments should be needed.
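
To make the layout question concrete, here is a minimal pure-Python sketch of the two common rotary pairing conventions. GPT-NeoX rotates element `i` together with element `i + d/2`, while GPT-J rotates adjacent elements `(2i, 2i + 1)`; both use the standard angles `pos * base^(-2i/d)`. The function names are hypothetical, and this is only meant as a reference for checking which layout a model's embedding matches, not as the kernel implementation.

```python
import math
from typing import List


def rope_angles(pos: int, dim: int, base: float = 10000.0) -> List[float]:
    """Rotation angle pos * theta_i for each of the dim // 2 frequency pairs."""
    return [pos * base ** (-2.0 * i / dim) for i in range(dim // 2)]


def rotate_neox(x: List[float], pos: int, base: float = 10000.0) -> List[float]:
    """GPT-NeoX layout: element i is rotated together with element i + dim // 2."""
    half = len(x) // 2
    out = list(x)
    for i, theta in enumerate(rope_angles(pos, len(x), base)):
        c, s = math.cos(theta), math.sin(theta)
        a, b = x[i], x[i + half]
        out[i] = a * c - b * s
        out[i + half] = a * s + b * c
    return out


def rotate_interleaved(x: List[float], pos: int, base: float = 10000.0) -> List[float]:
    """GPT-J layout: adjacent elements (2i, 2i + 1) are rotated together."""
    out = list(x)
    for i, theta in enumerate(rope_angles(pos, len(x), base)):
        c, s = math.cos(theta), math.sin(theta)
        a, b = x[2 * i], x[2 * i + 1]
        out[2 * i] = a * c - b * s
        out[2 * i + 1] = a * s + b * c
    return out


if __name__ == "__main__":
    q = [1.0, 2.0, 3.0, 4.0]
    print("neox:       ", rotate_neox(q, pos=1))
    print("interleaved:", rotate_interleaved(q, pos=1))
```

The two conventions are related by a fixed permutation of the head dimension (interleaving the two halves), so feeding the same vector through both and comparing against a reference model's output is one quick way to tell which pairing a checkpoint expects. A mismatch does not raise an error; it just produces wrong attention scores.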

The original [YaRN module](https://huggingface.co/conceptofmind/Yarn-Llama-2-13b-128k/blob/main/modeling_llama_together_yarn.py#L116) uses the [FlashAttention rotary embedding](https://github.com/Dao-AILab/flash-attention/blob/a1576ad1e887c11f4b76f42e9dfaceeb6369cdb8/csrc/rotary/rotary_cuda.cu#L9) implementation, which appears to be functionally similar. The original RoFormer implementation in [Hugging Face Transformers](https://github.com/huggingface/transformers/blob/df04959e5542d41b269f96305d82c62287350cee/src/transformers/models/roformer/modeling_roformer.py#L319) may also be a useful reference.
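
Beyond the choice of rotary embedding type, YaRN itself is, as I understand it, mainly a change to how the rotary frequencies are computed for the extended context. The pure-Python sketch below is written from my reading of the YaRN paper and reference code, not copied from the linked file. High-frequency dimension pairs keep their original RoPE frequency, low-frequency pairs are divided by the scale factor, and pairs in between are blended linearly; the cos/sin tables are then multiplied by `0.1 * ln(scale) + 1`. The function names, the defaults `beta_fast=32` and `beta_slow=1`, and the example values (head dimension 128, original context 4096, scale 131072 / 4096 = 32 for the 128k models) are assumptions to check against each checkpoint's `config.json`.

```python
import math
from typing import List, Tuple


def correction_dim(num_rotations: float, dim: int, base: float, orig_max_pos: int) -> float:
    """Dimension index whose wavelength fits `num_rotations` full turns into the
    original (pre-extension) context length."""
    return dim * math.log(orig_max_pos / (num_rotations * 2 * math.pi)) / (2 * math.log(base))


def correction_range(beta_fast: float, beta_slow: float, dim: int,
                     base: float, orig_max_pos: int) -> Tuple[int, int]:
    """Frequency-pair indices [low, high] over which the two schemes are blended."""
    low = math.floor(correction_dim(beta_fast, dim, base, orig_max_pos))
    high = math.ceil(correction_dim(beta_slow, dim, base, orig_max_pos))
    return max(low, 0), min(high, dim - 1)


def yarn_inv_freq(dim: int, scale: float, orig_max_pos: int, base: float = 10000.0,
                  beta_fast: float = 32.0, beta_slow: float = 1.0) -> List[float]:
    """Per-pair inverse frequencies after YaRN ("NTK-by-parts") scaling (hypothetical helper)."""
    low, high = correction_range(beta_fast, beta_slow, dim, base, orig_max_pos)
    span = (high - low) if high != low else 0.001  # avoid division by zero
    inv_freq = []
    for i in range(dim // 2):
        extrapolated = base ** (-2.0 * i / dim)  # unmodified RoPE frequency
        interpolated = extrapolated / scale      # plain position interpolation
        ramp = min(max((i - low) / span, 0.0), 1.0)
        # ramp == 0: high-frequency pair kept as-is; ramp == 1: fully interpolated.
        inv_freq.append(extrapolated * (1.0 - ramp) + interpolated * ramp)
    return inv_freq


def yarn_mscale(scale: float) -> float:
    """Magnitude factor applied to the cos/sin tables (hypothetical helper)."""
    return 1.0 if scale <= 1 else 0.1 * math.log(scale) + 1.0


if __name__ == "__main__":
    # Assumed values for a 128k checkpoint: head_dim 128, 4096-token original context.
    freqs = yarn_inv_freq(dim=128, scale=131072 / 4096, orig_max_pos=4096)
    print(correction_range(32.0, 1.0, 128, 10000.0, 4096), freqs[0], freqs[-1], yarn_mscale(32.0))
```

In this sketch only the frequency table and the magnitude factor differ from plain RoPE, so the per-element rotation could remain whichever pairing layout the checkpoint uses.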

Model catalog:
- https://huggingface.co/NousResearch/Yarn-Llama-2-7b-64k
- https://huggingface.co/NousResearch/Yarn-Llama-2-7b-128k
- https://huggingface.co/NousResearch/Yarn-Llama-2-13b-64k
- https://huggingface.co/NousResearch/Yarn-Llama-2-13b-128k
