Closed
Description
Looks like there are various methods of extending context length, such as SuperHOT, NTK-aware scaling, and condensed RoPE. This request is to track support for condensed rotary embeddings, as it seems to have the best performance at long contexts at the moment.
https://lmsys.org/blog/2023-06-29-longchat/
https://github.com/lm-sys/FastChat/blob/3f0c6e54498e179098ead9a596929e23327ad75c/fastchat/model/llama_condense_monkey_patch.py#L68
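For reference, the core idea in the linked monkey patch is linear position interpolation: position indices are divided by a condensing ratio so a longer sequence maps into the model's original trained position range. A minimal sketch of that idea (the `ratio` parameter and `rope_angles` helper here are illustrative, not the patch's actual API):

```python
import math

def rope_angles(position, dim, base=10000.0, ratio=1.0):
    """Rotary embedding angles for a single position index.

    Condensed ("linearly interpolated") RoPE divides the position by
    `ratio`, so with ratio=4 a model trained on 2048 positions can
    address 8192 positions without the angles leaving the trained range.
    """
    pos = position / ratio  # ratio > 1 condenses positions
    return [pos / (base ** (2 * i / dim)) for i in range(dim // 2)]

# With ratio=4, position 8192 yields the same angles as
# position 2048 does without condensing.
assert rope_angles(8192, 128, ratio=4.0) == rope_angles(2048, 128)
```

In the actual patch this scaling is applied inside the model's rotary embedding forward pass rather than as a standalone function.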