I'm trying to use RoPE scaling to increase max_seq_len. Following #555, I modified the model's config.json to add the rope_scaling key:
{
  "_name_or_path": "m42-health/med42-70b",
  "architectures": [
    "LlamaForCausalLM"
  ],
  "bos_token_id": 1,
  "eos_token_id": 2,
  "hidden_act": "silu",
  "hidden_size": 8192,
  "initializer_range": 0.02,
  "intermediate_size": 28672,
  "max_position_embeddings": 2048,
  "model_type": "llama",
  "num_attention_heads": 64,
  "num_hidden_layers": 80,
  "num_key_value_heads": 8,
  "pad_token_id": 0,
  "rms_norm_eps": 1e-05,
  "tie_word_embeddings": false,
  "torch_dtype": "float32",
  "transformers_version": "4.28.1",
  "use_cache": true,
  "vocab_size": 32000,
  "rope_scaling": {
    "factor": 2.0,
    "type": "dynamic"
  }
}
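For reference, a quick sanity check (just a sketch; the local directory path below is hypothetical) that transformers actually picks up the new key when the edited config is loaded:

from transformers import AutoConfig

# Hypothetical local directory containing the edited config.json
local_model_dir = "/path/to/local/med42-70b"

config = AutoConfig.from_pretrained(local_model_dir)
print(config.rope_scaling)             # expected: {'factor': 2.0, 'type': 'dynamic'}
print(config.max_position_embeddings)  # still 2048 in the file itself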
Then I initialized the vLLM engine with:

from vllm import LLM

cache_dir = "/secure/hf_cache"
model_name_or_path = "m42-health/med42-70b"
llm = LLM(model=model_name_or_path, download_dir=cache_dir, tensor_parallel_size=4, dtype="auto")
However, when I performed inference on long prompts, I still got the warning:
WARNING 01-20 16:48:15 scheduler.py:149] Input prompt (2380 tokens) is too long and exceeds limit of 2048
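To confirm what the engine actually derived, I think the effective context length can be read off the engine's model config. A minimal sketch, assuming the internal attribute names llm_engine and model_config used by the 0.2.x code base (they may change between versions):

# Sketch: inspect the context length vLLM derived from the config.
print(llm.llm_engine.model_config.max_model_len)  # presumably 2048 here, matching the warning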
Has anyone run into this issue before?
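For completeness, the only workaround that comes to mind is forcing a larger context window through the max_model_len engine argument. A minimal sketch, assuming that argument is supported and forwarded to the engine in this version, and that it composes correctly with dynamic RoPE scaling:

# Sketch of a possible workaround: override the derived context length
# explicitly. 4096 = 2048 * rope_scaling factor of 2.0.
llm = LLM(
    model=model_name_or_path,
    download_dir=cache_dir,
    tensor_parallel_size=4,
    dtype="auto",
    max_model_len=4096,
)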
P.S. My vLLM version is 0.2.7.