add llava model support #2153
Conversation
Just something to share: should we adopt the OpenAI v1 API, which also supports vision input? It is more generic and supports multi-turn, vision-based conversation.
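For reference, a rough sketch of what such an OpenAI-style vision request looks like (the endpoint URL and model name here are placeholders, not something this PR provides):

```python
# Illustrative OpenAI v1 chat-completions payload with image input; the endpoint
# and model name are placeholders, not part of this PR.
import requests

payload = {
    "model": "llava-hf/llava-1.5-7b-hf",
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is shown in this image?"},
                {
                    "type": "image_url",
                    "image_url": {"url": "http://images.cocodataset.org/val2017/000000039769.jpg"},
                },
            ],
        }
    ],
    "max_tokens": 256,
}
resp = requests.post("http://localhost:8000/v1/chat/completions", json=payload)
print(resp.json()["choices"][0]["message"]["content"])
```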
Thanks, I think you are right; we should adopt that API if we support the multimodality feature. However, it is still an experimental feature and would change the current API a lot. I'm not sure if I should do this now; maybe create a separate entrypoint first?
Hi @AzureSilent, which models is this PR compatible with? https://huggingface.co/liuhaotian/llava-v1.5-13b does not seem to work.
It is compatible with the Hugging Face hf-llava model (llava-v1.5-13b); the weight files need to be converted according to Hugging Face's new implementation, which requires transformers >= 4.36, so we switched to the new architecture.
You need to read vLLM's docs...

from vllm import LLM, SamplingParams, LLaVA
sampling_params = SamplingParams(temperature=0.0, top_p=0.95, max_tokens=256)
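A sketch of how that snippet typically continues, based on the `LLaVA` entrypoint usage shown later in this thread (the model, prompt, and image URL are only examples):

```python
# Continuing the snippet above; model, prompt, and image are illustrative.
llm = LLaVA(model="llava-hf/llava-1.5-7b-hf", tensor_parallel_size=1)
prompts = ["USER: <image>\nWhat are these?\nASSISTANT:"]
image = "http://images.cocodataset.org/val2017/000000039769.jpg"

# The PR's generate() takes an images list alongside the prompts (PIL image, URL, or base64).
outputs = llm.generate(prompts, sampling_params, images=[image])
for output in outputs:
    print(output.outputs[0].text)
```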
Hi @AzureSilent, I'm using llava-hf/llava-1.5-7b-hf to run your PR, but I got an error.
Sorry, I forgot about that. Please use the new commit.
Hi @AzureSilent, thanks for putting this up! I've been trying to use your implementation and load the llava-hf/llava-1.5-13b-hf model. However, running the llava_server command

~$ python -m vllm.entrypoints.llava_server --model llava-hf/llava-1.5-13b-hf --tensor-parallel-size 1 --gpu-memory-utilization 0.90

fails with the following output:
WARNING 01-14 16:23:23 config.py:498] The model's config.json does not contain any of the following keys to determine the original maximum length of the model: ['max_position_embeddings', 'n_positions', 'max_seq_len', 'seq_length', 'max_sequence_length', 'max_seq_length', 'seq_len']. Assuming the model's maximum length is 2048.
INFO 01-14 16:23:23 llm_engine.py:73] Initializing an LLM engine with config: model='llava-hf/llava-1.5-13b-hf', tokenizer='llava-hf/llava-1.5-13b-hf', tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.float16, max_seq_len=2048, download_dir=None, load_format=auto, tensor_parallel_size=1, quantization=None, enforce_eager=False, seed=0)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Traceback (most recent call last):
File "/opt/conda/envs/llava_vllm/lib/python3.9/runpy.py", line 197, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/opt/conda/envs/llava_vllm/lib/python3.9/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/home/gcpuser/sky_workdir/vllm/vllm/entrypoints/llava_server.py", line 113, in <module>
engine = AsyncLLaVAEngine.from_engine_args(engine_args)
File "/home/gcpuser/sky_workdir/vllm/vllm/engine/async_llm_engine.py", line 494, in from_engine_args
engine = cls(parallel_config.worker_use_ray,
File "/home/gcpuser/sky_workdir/vllm/vllm/engine/async_llm_engine.py", line 267, in __init__
self.engine = self._init_engine(*args, **kwargs)
File "/home/gcpuser/sky_workdir/vllm/vllm/engine/async_llm_engine.py", line 312, in _init_engine
return engine_class(*args, **kwargs)
File "/home/gcpuser/sky_workdir/vllm/vllm/engine/llava_engine.py", line 18, in __init__
super().__init__(*args, **kwargs)
File "/home/gcpuser/sky_workdir/vllm/vllm/engine/llm_engine.py", line 114, in __init__
self._init_workers(distributed_init_method)
File "/home/gcpuser/sky_workdir/vllm/vllm/engine/llm_engine.py", line 150, in _init_workers
self._run_workers(
File "/home/gcpuser/sky_workdir/vllm/vllm/engine/llm_engine.py", line 755, in _run_workers
self._run_workers_in_batch(workers, method, *args, **kwargs))
File "/home/gcpuser/sky_workdir/vllm/vllm/engine/llm_engine.py", line 729, in _run_workers_in_batch
output = executor(*args, **kwargs)
File "/home/gcpuser/sky_workdir/vllm/vllm/worker/worker.py", line 79, in load_model
self.model_runner.load_model()
File "/home/gcpuser/sky_workdir/vllm/vllm/worker/model_runner.py", line 61, in load_model
self.model = get_model(self.model_config)
File "/home/gcpuser/sky_workdir/vllm/vllm/model_executor/model_loader.py", line 36, in get_model
model_class = _get_model_architecture(model_config.hf_config)
File "/home/gcpuser/sky_workdir/vllm/vllm/model_executor/model_loader.py", line 30, in _get_model_architecture
raise ValueError(
ValueError: Model architectures ['LlavaForCausalLM'] are not supported for now. Supported architectures: ['AquilaModel', 'AquilaForCausalLM', 'BaiChuanForCausalLM', 'BaichuanForCausalLM', 'BloomForCausalLM', 'ChatGLMModel', 'ChatGLMForConditionalGeneration', 'DeciLMForCausalLM', 'FalconForCausalLM', 'GPT2LMHeadModel', 'GPTBigCodeForCausalLM', 'GPTJForCausalLM', 'GPTNeoXForCausalLM', 'InternLMForCausalLM', 'LlamaForCausalLM', 'LlavaForConditionalGeneration', 'LLaMAForCausalLM', 'MistralForCausalLM', 'MixtralForCausalLM', 'MptForCausalLM', 'MPTForCausalLM', 'OPTForCausalLM', 'PhiForCausalLM', 'QWenLMHeadModel', 'RWForCausalLM', 'YiForCausalLM']

I presume that this happens because the architecture is set to 'LlavaForCausalLM' in the model's config, rather than the 'LlavaForConditionalGeneration' class that the transformers example uses:

import torch
from transformers import AutoProcessor, LlavaForConditionalGeneration
model_id = "llava-hf/llava-1.5-13b-hf"
prompt = "USER: <image>\nWhat are these?\nASSISTANT:"
image_file = "http://images.cocodataset.org/val2017/000000039769.jpg"
model = LlavaForConditionalGeneration.from_pretrained(
model_id,
torch_dtype=torch.float16,
low_cpu_mem_usage=True,
).to(0)
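For completeness, a sketch of how that transformers example usually continues (following the standard `AutoProcessor` flow from the hf-llava model card; not part of the original comment):

```python
import requests
from PIL import Image

# Prepare the processor and a PIL image for the <image> placeholder in the prompt.
processor = AutoProcessor.from_pretrained(model_id)
raw_image = Image.open(requests.get(image_file, stream=True).raw)

inputs = processor(text=prompt, images=raw_image, return_tensors="pt").to(0, torch.float16)
output = model.generate(**inputs, max_new_tokens=200, do_sample=False)
print(processor.decode(output[0], skip_special_tokens=True))
```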
~$ python -m vllm.entrypoints.llava_server \
--model llava-hf/llava-1.5-13b-hf \
--tensor-parallel-size 1 \
--gpu-memory-utilization 0.90
config.json: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1.09k/1.09k [00:00<00:00, 281kB/s]
WARNING 01-14 16:45:42 config.py:498] The model's config.json does not contain any of the following keys to determine the original maximum length of the model: ['max_position_embeddings', 'n_positions', 'max_seq_len', 'seq_length', 'max_sequence_length', 'max_seq_length', 'seq_len']. Assuming the model's maximum length is 2048.
INFO 01-14 16:45:42 llm_engine.py:73] Initializing an LLM engine with config: model='llava-hf/llava-1.5-13b-hf', tokenizer='llava-hf/llava-1.5-13b-hf', tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.float16, max_seq_len=2048, download_dir=None, load_format=auto, tensor_parallel_size=1, quantization=None, enforce_eager=False, seed=0)
tokenizer_config.json: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1.33k/1.33k [00:00<00:00, 363kB/s]
tokenizer.model: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 500k/500k [00:00<00:00, 12.4MB/s]
tokenizer.json: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1.84M/1.84M [00:00<00:00, 14.2MB/s]
added_tokens.json: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 41.0/41.0 [00:00<00:00, 25.3kB/s]
special_tokens_map.json: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 438/438 [00:00<00:00, 273kB/s]
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
model-00006-of-00006.safetensors: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████| 2.02G/2.02G [00:22<00:00, 89.2MB/s]
model-00002-of-00006.safetensors: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████| 4.97G/4.97G [00:44<00:00, 112MB/s]
model-00001-of-00006.safetensors: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████| 4.96G/4.96G [00:44<00:00, 111MB/s]
model-00003-of-00006.safetensors: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████| 4.88G/4.88G [00:44<00:00, 109MB/s]
model-00004-of-00006.safetensors: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████| 4.93G/4.93G [00:45<00:00, 109MB/s]
model-00005-of-00006.safetensors: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████| 4.93G/4.93G [00:46<00:00, 107MB/s]
INFO 01-14 16:46:40 llm_engine.py:227] # GPU blocks: 842, # CPU blocks: 327
INFO 01-14 16:46:43 model_runner.py:456] Capturing the model for CUDA graphs. This may lead to unexpected consequences if the model is not static. To run the model in eager mode, set 'enforce_eager=True' or use '--enforce-eager' in the CLI.
INFO 01-14 16:46:43 model_runner.py:460] CUDA graphs can take additional 1~3 GiB memory per GPU. If you are running out of memory, consider decreasing `gpu_memory_utilization` or enforcing eager mode.
INFO 01-14 16:46:48 model_runner.py:502] Graph capturing finished in 5 secs.
preprocessor_config.json: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 557/557 [00:00<00:00, 147kB/s]
INFO: Started server process [16998]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
Hi @isaac-vidas, thanks for your reply! I think this is just a minor issue caused by inconsistent naming: in the config.json of llava-hf/llava-1.5-13b-hf the architecture is 'LlavaForCausalLM', but in llava-hf/llava-1.5-7b-hf it is 'LlavaForConditionalGeneration'. I apologize for not noticing this earlier; I will make the code compatible. For now, all you need to do is change the architectures entry in the config.json file to 'LlavaForConditionalGeneration'. This won't impact inference.
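If it helps, a quick sketch of that workaround using a local copy of the model (the local directory is just an example path):

```python
# Download the weights to a local directory, patch the "architectures" field,
# then point vLLM at that directory instead of the hub ID.
import json
import os
from huggingface_hub import snapshot_download

local_dir = snapshot_download("llava-hf/llava-1.5-13b-hf", local_dir="llava-1.5-13b-hf")
cfg_path = os.path.join(local_dir, "config.json")

with open(cfg_path) as f:
    cfg = json.load(f)
cfg["architectures"] = ["LlavaForConditionalGeneration"]
with open(cfg_path, "w") as f:
    json.dump(cfg, f, indent=2)

# e.g. python -m vllm.entrypoints.llava_server --model ./llava-1.5-13b-hf --tensor-parallel-size 1
```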
Great, thanks for the reply and suggestion @AzureSilent! This PR seems to work great and was easy to get running with the HF LLaVA models. Is there any reason why this wasn't merged into the main branch?
Hi, I tried to merge this with the current main branch of vllm on my branch (llava-main):

from vllm import SamplingParams, LLaVA
image = 'URL'
llm = LLaVA(model="llava-hf/llava-1.5-7b-hf", tensor_parallel_size=1,gpu_memory_utilization=0.95, enforce_eager=True)
prompts = [ 'prompt1',
'prompt2 <image> \n say',
'prompt3 <image> say something <image> something',
]
sampling_params = SamplingParams(temperature=0.2, top_p=0.95, max_tokens=256)
outputs = llm.generate(prompts, sampling_params, images=[image]*3) # PIL url or base64

But it has errors:

WARNING 01-30 14:55:51 config.py:584] The model's config.json does not contain any of the following keys to determine the original maximum length of the model: ['max_position_embeddings', 'n_positions', 'max_seq_len', 'seq_length', 'max_sequence_length', 'max_seq_length', 'seq_len']. Assuming the model's maximum length is 2048.
INFO 01-30 14:55:51 llm_engine.py:72] Initializing an LLM engine with config: model='llava-hf/llava-1.5-7b-hf', tokenizer='llava-hf/llava-1.5-7b-hf', tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.float16, max_seq_len=2048, download_dir=None, load_format=auto, tensor_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=True, kv_cache_dtype=auto, seed=0)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
WARNING 01-30 14:55:55 custom_all_reduce.py:33] Custom allreduce is disabled due to an unsupported world size: 1. Supported world sizes: [2, 4, 6, 8]. To slience this warning, specifydisable_custom_all_reduce=True explicitly.
INFO 01-30 14:55:56 weight_utils.py:164] Using model weights format ['*.safetensors']
INFO 01-30 14:56:01 llm_engine.py:322] # GPU blocks: 3073, # CPU blocks: 512
<vllm.engine.llava_engine.LLaVAEngine object at 0x7fb8751b10c0>
num_workers: 0
Traceback (most recent call last):
File "../vllm/llava-try.py", line 10, in <module>
outputs = llm.generate(prompts, sampling_params, images=[image]*3) # PIL url or base64。
File "../vllm/vllm/entrypoints/llava_llm.py", line 170, in generate
self._add_request(prompt, sampling_params, token_ids, images)
File "../vllm/vllm/entrypoints/llava_llm.py", line 181, in _add_request
self.llm_engine.add_request(request_id,
File "../vllm/vllm/engine/llava_engine.py", line 69, in add_request
worker = self.workers[np.random.randint(num_workers)]
File "mtrand.pyx", line 781, in numpy.random.mtrand.RandomState.randint
File "_bounded_integers.pyx", line 1334, in numpy.random._bounded_integers._rand_int64
ValueError: high <= 0

I am assuming this is because of the workers not being initialised properly.
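For what it's worth, `np.random.randint(n)` fails exactly this way when `n` is 0, which matches the `num_workers: 0` printout above, so the engine's worker list does appear to be empty after the merge:

```python
import numpy as np

workers = []                      # what the merged engine apparently ends up with
np.random.randint(len(workers))   # raises ValueError, as in the traceback above
```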
Got this working on the llava-main branch.
@Aakash-kaushik Thanks for the great work on your branch. Does it support llava-v1.6 models as well?
Hi @matankley, my branch currently only supports the mentioned model; you might have to do some work to add 1.6.
@matankley While support for LLaVA in vllm is ongoing, you can also look into SGLang. It's built on top of vllm and supports LLaVA 1.5 and 1.6, as well as distributed inference across GPUs and JSON decoding, among other features.
Thank you @AzureSilent for opening this PR! Closing this as we merged #3042. Please feel free to comment on it!
Add support for the LLaVA model:
Note:
Currently, to minimize the impact on existing interfaces, a separate api_server is used.
If the multimodal mode is widely accepted, the two servers can later be merged into one mutually compatible API.
Limit:
Usage:
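A sketch based on the server command and Python entrypoint shown in the conversation above (model name and flags are illustrative):

~$ python -m vllm.entrypoints.llava_server --model llava-hf/llava-1.5-7b-hf --tensor-parallel-size 1 --gpu-memory-utilization 0.90

The offline LLaVA Python class works like vLLM's LLM class, with an additional images argument to generate (PIL image, URL, or base64), as in the snippets earlier in the thread.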
I know there are still many areas that need improvement, but this is a runnable version. If it doesn't meet the release requirements, could this be pulled into a new feature branch so we can continue to improve it?
Thanks for your review!