Currently only the `transformers` runtime can handle the fp16 versions, but vLLM has an open PR to support RoPE scaling (https://github.com/vllm-project/vllm/pull/555). Since we run everything else with vLLM, it would be good to run these on vLLM as well once that lands, for an apples-to-apples comparison.