Allow passing hf config args with openai server #2547

Open
Aakash-kaushik opened this issue Jan 22, 2024 · 11 comments · May be fixed by #5836

Comments

@Aakash-kaushik

Aakash-kaushik commented Jan 22, 2024

Hi,

Is there a specific reason why we can't allow passing args from the OpenAI server through to the HF config class? There are very reasonable use cases where I would want to dynamically override the existing args in a config while running the model through the server.

reference line

Simply allowing extra keyword args in the OpenAI server that are passed through to this while loading the model should be enough; I believe there are internal checks that fail if anything is configured incorrectly anyway.

Supporting documentation from the transformers library:

        >>> # Change some config attributes when loading a pretrained config.
        >>> config = AutoConfig.from_pretrained("bert-base-uncased", output_attentions=True, foo=False)
        >>> config.output_attentions
        True
@simon-mo
Collaborator

I believe there's no fundamental reason for this. Contributions welcome! I would say you can add this to the ModelConfig class and pass it through EngineArgs.
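As a rough sketch of that suggestion (not vLLM's actual implementation; the hf_config_overrides parameter and load_hf_config helper below are hypothetical names), the idea is that extra keyword overrides collected by the server would be forwarded to AutoConfig.from_pretrained, which already accepts attribute overrides as shown in the transformers docs quoted above:

from typing import Optional

from transformers import AutoConfig

def load_hf_config(model: str, hf_config_overrides: Optional[dict] = None):
    # Keyword overrides that match config attributes replace the values
    # loaded from config.json (see the transformers docs quoted above).
    overrides = hf_config_overrides or {}
    return AutoConfig.from_pretrained(model, **overrides)

# e.g. load_hf_config("bert-base-uncased", {"output_attentions": True})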

@simon-mo simon-mo added the good first issue label Jan 23, 2024
@KrishnaM251

I will take a look at this

@mrPsycox

mrPsycox commented Feb 7, 2024

Does anyone have news on this? I want to use --dtype, but it doesn't work.

@Aakash-kaushik
Author

@mrPsycox --dtype is supported in vLLM; please take a look at the engine args in the vLLM docs.

@mrPsycox

mrPsycox commented Feb 8, 2024

Thanks @Aakash-kaushik, I found the issue. --dtype needs to be passed among the first args of the command, not at the end.

This works for me:

 run: |
   conda activate vllm
   python -m vllm.entrypoints.openai.api_server \
     --tensor-parallel-size $SKYPILOT_NUM_GPUS_PER_NODE \
     --dtype half \
     --host 0.0.0.0 --port 8080 \
     --model <model_name>

@timbmg

timbmg commented Apr 30, 2024

Just as a workaround, I am currently doing something like this:

import os
from contextlib import contextmanager

from vllm import LLM

@contextmanager
def swap_files(file1, file2):
    """Temporarily swap two files on disk, restoring them on exit."""
    temp_file1 = file1 + '.temp'
    temp_file2 = file2 + '.temp'
    try:
        print("Renaming files.")
        # Swap: file1 -> temp, file2 -> file1's path, temp -> file2's path.
        os.rename(file1, temp_file1)
        os.rename(file2, file1)
        os.rename(temp_file1, file2)

        yield

    finally:
        # Swap back so the original config is restored even if loading fails.
        print("Restoring files.")
        os.rename(file2, temp_file2)
        os.rename(file1, file2)
        os.rename(temp_file2, file1)

file1 = '/path/to/original/config.json'
file2 = '/path/to/modified/config.json'

# The modified config.json sits at the original path while the model loads.
with swap_files(file1, file2):
    llm = LLM(...)

@K-Mistele
Contributor

I would love to see this as well

@KrishnaM251

@Aakash-kaushik @mrPsycox @timbmg @K-Mistele

Please take a look at my PR and let me know if it serves your purpose.

As @DarkLight1337 noted in my PR (#5836), what exactly do you want to accomplish using this feature that cannot otherwise be done via vLLM args? (If we don't have any situation that results in different vLLM output, what is the point of enabling this?)

Once you get back to me, I'll write a test that covers that case.


This issue has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this issue should remain open. Thank you!

@github-actions github-actions bot added the stale label Oct 30, 2024
@K-Mistele
Contributor

Hi guys, just bumping this in case it's still relevant. Maybe not so much passing hf config.json args at request-time, but being able to set them for the OpenAI compatible server without having to dig into the model's cache directory would be super useful.

Some examples of where this would be applicable include configuring RoPE scaling for Qwen and Llama models:

Processing Long Texts
The current config.json is set for context length up to 32,768 tokens. To handle extensive inputs exceeding 32,768 tokens, we utilize [YaRN](https://arxiv.org/abs/2309.00071), a technique for enhancing model length extrapolation, ensuring optimal performance on lengthy texts.

For supported frameworks, you could add the following to config.json to enable YaRN:

{
  ...,
  "rope_scaling": {
    "factor": 4.0,
    "original_max_position_embeddings": 32768,
    "type": "yarn"
  }
}

For deployment, we recommend using vLLM. Please refer to our [Documentation](https://qwen.readthedocs.io/en/latest/deployment/vllm.html) for usage if you are not familiar with vLLM. Presently, vLLM only supports static YARN, which means the scaling factor remains constant regardless of input length, potentially impacting performance on shorter texts. We advise adding the rope_scaling configuration only when processing long contexts is required.

Maybe this is already implemented somewhere else?
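For reference, the override described in the quoted Qwen README amounts to the following at the transformers level, since AutoConfig.from_pretrained accepts keyword overrides; the feature requested in this issue would expose something equivalent through the OpenAI-compatible server. This is an illustrative sketch only, and the model id is just an example:

from transformers import AutoConfig

# Apply the YaRN rope_scaling override without editing the cached config.json.
config = AutoConfig.from_pretrained(
    "Qwen/Qwen2-7B-Instruct",
    rope_scaling={
        "factor": 4.0,
        "original_max_position_embeddings": 32768,
        "type": "yarn",
    },
)
print(config.rope_scaling)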

@DarkLight1337
Member

DarkLight1337 commented Oct 30, 2024

I proposed a similar feature in #5205, still looking for someone to implement it.
