Description
Hi, there have been some very successful experiments showing that NTK-based RoPE scaling achieves good extrapolation ability without any fine-tuning.
I have tested it as well and it works well: a model trained with a 1024-token context shows very impressive long-context ability with NTK RoPE.
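For reference, here is a minimal sketch of the NTK-aware scaling I mean, assuming a standard RoPE setup with base 10000. The function name `ntk_inv_freq` and the `alpha` parameter are just illustrative, not from any existing API:

```python
import torch

def ntk_inv_freq(rotary_dim: int, alpha: float, base: float = 10000.0) -> torch.Tensor:
    # NTK-aware scaling rescales the RoPE base so high-frequency components
    # keep their resolution while low-frequency components get interpolated:
    #   base' = base * alpha ** (dim / (dim - 2))
    scaled_base = base * alpha ** (rotary_dim / (rotary_dim - 2))
    return 1.0 / (scaled_base ** (torch.arange(0, rotary_dim, 2).float() / rotary_dim))

# e.g. alpha = 8 stretches the effective context roughly 8x without fine-tuning
inv_freq = ntk_inv_freq(rotary_dim=128, alpha=8.0)
```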
Would you consider supporting it? It probably doesn't require many changes.
However, the positional-encoding op is baked into the CUDA kernel.
Currently I can use torch code to check whether the context length exceeds 2048 and, if so, apply NTK scaling (see the sketch below), but wouldn't it be better if vLLM supported this out of the box?
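Something along these lines is what I do now. This is only a sketch of the workaround, assuming a 2048-token trained length and picking `alpha` from the current sequence length; the names `dynamic_ntk_cos_sin` and `max_trained_len` are hypothetical, not vLLM API:

```python
import torch

def dynamic_ntk_cos_sin(seq_len: int, rotary_dim: int,
                        max_trained_len: int = 2048,
                        base: float = 10000.0):
    if seq_len > max_trained_len:
        # Only grow the base as much as the current sequence requires
        # (assumption: alpha proportional to the overshoot ratio).
        alpha = seq_len / max_trained_len
        base = base * alpha ** (rotary_dim / (rotary_dim - 2))
    inv_freq = 1.0 / (base ** (torch.arange(0, rotary_dim, 2).float() / rotary_dim))
    t = torch.arange(seq_len).float()
    freqs = torch.outer(t, inv_freq)  # (seq_len, rotary_dim // 2)
    return freqs.cos(), freqs.sin()

cos, sin = dynamic_ntk_cos_sin(seq_len=4096, rotary_dim=128)
```

Since the cos/sin cache in vLLM is precomputed and consumed by the CUDA kernel, supporting this out of the box might just mean recomputing that cache with the scaled base, rather than touching the kernel itself.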