diff --git a/README.md b/README.md
index 8f1e60f..6d990af 100644
--- a/README.md
+++ b/README.md
@@ -40,22 +40,25 @@ After a year's relentless efforts, today we are thrilled to release **Qwen2-VL**

-We opensourced Qwen2-VL-2B and Qwen2-VL-7B with Apache 2.0 license, and we release the [API](https://help.aliyun.com/zh/model-studio/developer-reference/qwen-vl-api) of Qwen2-VL-72B! The opensource is integrated to Hugging Face Transformers, vLLM, and other third-party frameworks. Hope you enjoy!
+We have open-sourced Qwen2-VL models, including Qwen2-VL-2B and Qwen2-VL-7B under the Apache 2.0 license, as well as Qwen2-VL-72B under the Qwen license. These models are now integrated with Hugging Face Transformers, vLLM, and other third-party frameworks. We hope you enjoy using them!
+
+

 ## News
+* 2024.09.19: The instruction-tuned [Qwen2-VL-72B model](https://huggingface.co/Qwen/Qwen2-VL-72B-Instruct) and its quantized versions [[AWQ](https://huggingface.co/Qwen/Qwen2-VL-72B-Instruct-AWQ), [GPTQ-Int4](https://huggingface.co/Qwen/Qwen2-VL-72B-Instruct-GPTQ-Int4), [GPTQ-Int8](https://huggingface.co/Qwen/Qwen2-VL-72B-Instruct-GPTQ-Int8)] are now available.
 * 2024.08.30: We have released the [Qwen2-VL series]("https://huggingface.co/collections/Qwen/qwen2-vl-66cee7455501d7126940800d). The 2B and 7B models are now available, and the 72B model for opensource is coming soon. For more details, please check our [blog](https://qwenlm.github.io/blog/qwen2-vl/)!

 ## Performance

 ### Image Benchmarks

-| Benchmark | Previous SoTA<br>(Open-source LVLM) | Claude-3.5 Sonnet | GPT-4o | **Qwen2-VL-72B**<br>(Coming soon) |**Qwen2-VL-7B**<br>([🤗](https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct) [🤖](https://modelscope.cn/models/qwen/Qwen2-VL-7B-Instruct)) |**Qwen2-VL-2B**<br>([🤗](https://huggingface.co/Qwen/Qwen2-VL-2B-Instruct)[🤖](https://modelscope.cn/models/qwen/Qwen2-VL-2B-Instruct))
+| Benchmark | Previous SoTA<br>(Open-source LVLM) | Claude-3.5 Sonnet | GPT-4o | **Qwen2-VL-72B**<br>([🤗](https://huggingface.co/Qwen/Qwen2-VL-72B-Instruct) [🤖](https://modelscope.cn/models/qwen/Qwen2-VL-72B-Instruct)) |**Qwen2-VL-7B**<br>([🤗](https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct) [🤖](https://modelscope.cn/models/qwen/Qwen2-VL-7B-Instruct)) |**Qwen2-VL-2B**<br>([🤗](https://huggingface.co/Qwen/Qwen2-VL-2B-Instruct)[🤖](https://modelscope.cn/models/qwen/Qwen2-VL-2B-Instruct))
 | :--- | :---: | :---: | :---: | :---: |:---: |:---: |
 | MMMU<sub>val</sub> | 58.3 | 68.3 | **69.1** | 64.5 | 54.1|41.1
 | DocVQA<sub>test</sub> | 94.1 | 95.2 | 92.8 | **96.5** | 94.5| 90.1
 | InfoVQA<sub>test</sub> | 82.0 | - | - | **84.5** | 76.5|65.5
 | ChartQA<sub>test</sub> | 88.4 | **90.8** | 85.7 | 88.3 |83.0| 73.5
 | TextVQA<sub>val</sub> | 84.4 | - | - | **85.5** |84.3|79.7
-| OCRBench | 852 | 788 | 736 | **855** |845| 794
+| OCRBench | 852 | 788 | 736 | **877** |845| 794
 | MTVQA | 17.3 | 25.7 | 27.8 | **30.9** |25.6| 18.1
 | VCR<sub>en easy</sub> | 84.67 | 63.85 | 91.55 | **91.93** | 89.70| 81.45
 | VCR<sub>zh easy</sub> | 22.09 | 1.0| 14.87 | **65.37** | 59.94| 46.16
@@ -74,7 +77,7 @@ We opensourced Qwen2-VL-2B and Qwen2-VL-7B with Apache 2.0 license, and we relea

 ### Video Benchmarks

-| Benchmark | Previous SoTA<br>(Open-source LVLM) | Gemini 1.5-Pro | GPT-4o | **Qwen2-VL-72B**<br>(Coming soon) |**Qwen2-VL-7B**<br>([🤗](https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct) [🤖](https://modelscope.cn/models/qwen/Qwen2-VL-7B-Instruct)) |**Qwen2-VL-2B**<br>([🤗](https://huggingface.co/Qwen/Qwen2-VL-2B-Instruct)[🤖](https://modelscope.cn/models/qwen/Qwen2-VL-2B-Instruct))
+| Benchmark | Previous SoTA<br>(Open-source LVLM) | Gemini 1.5-Pro | GPT-4o | **Qwen2-VL-72B**<br>([🤗](https://huggingface.co/Qwen/Qwen2-VL-72B-Instruct) [🤖](https://modelscope.cn/models/qwen/Qwen2-VL-72B-Instruct)) |**Qwen2-VL-7B**<br>([🤗](https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct) [🤖](https://modelscope.cn/models/qwen/Qwen2-VL-7B-Instruct)) |**Qwen2-VL-2B**<br>([🤗](https://huggingface.co/Qwen/Qwen2-VL-2B-Instruct)[🤖](https://modelscope.cn/models/qwen/Qwen2-VL-2B-Instruct))
 | :--- | :---: | :---: | :---: | :---: | :---: | :---: |
 | MVBench | 69.6 | - | - | **73.6** | 67.0| 63.2
 | PerceptionTest<sub>test</sub> | 66.9 | - | - | **68.0** | 62.3 |53.9
@@ -999,17 +1002,24 @@ We use [VLMEvalkit](https://github.com/open-compass/VLMEvalKit) to evaluate all

 | Model Size | Quantization | MMMU | DocVQA | MMBench | MathVista |
 | --- | --- | --- | --- | --- | --- |
+| Qwen2-VL-72B-Instruct | BF16<br>([🤗](https://huggingface.co/Qwen/Qwen2-VL-72B-Instruct)[🤖](https://modelscope.cn/models/qwen/Qwen2-VL-72B-Instruct)) | 65.44 | 95.79 | 86.94 | 70.19 |
+| | GPTQ-Int8<br>([🤗](https://huggingface.co/Qwen/Qwen2-VL-72B-Instruct-GPTQ-Int8)[🤖](https://modelscope.cn/models/qwen/Qwen2-VL-72B-Instruct-GPTQ-Int8)) | 64.56 | 95.84 | 87.03 | 68.90 |
+| | GPTQ-Int4<br>([🤗](https://huggingface.co/Qwen/Qwen2-VL-72B-Instruct-GPTQ-Int4)[🤖](https://modelscope.cn/models/qwen/Qwen2-VL-72B-Instruct-GPTQ-Int4)) | 64.00 | 95.70 | 86.68 | 69.20 |
+| | AWQ<br>([🤗](https://huggingface.co/Qwen/Qwen2-VL-72B-Instruct-AWQ)[🤖](https://modelscope.cn/models/qwen/Qwen2-VL-72B-Instruct-AWQ)) | 64.22 | 95.72 | 86.43 | 68.40 |
 | Qwen2-VL-7B-Instruct | BF16<br>([🤗](https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct)[🤖](https://modelscope.cn/models/qwen/Qwen2-VL-7B-Instruct)) | 53.77 | 93.89 | 81.78 | 58.20 |
 | | GPTQ-Int8<br>([🤗](https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct-GPTQ-Int8)[🤖](https://modelscope.cn/models/qwen/Qwen2-VL-7B-Instruct-GPTQ-Int8)) | 53.00 | 93.94 | 82.38 | 57.90 |
 | | GPTQ-Int4<br>([🤗](https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct-GPTQ-Int4)[🤖](https://modelscope.cn/models/qwen/Qwen2-VL-7B-Instruct-GPTQ-Int4)) | 52.55 | 93.16 | 81.27 | 60.30 |
 | | AWQ<br>([🤗](https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct-AWQ)[🤖](https://modelscope.cn/models/qwen/Qwen2-VL-7B-Instruct-AWQ)) | 53.66 | 93.10 | 81.61 | 56.80 |
- Qwen2-VL-2B-Instruct | BF16<br>([🤗](https://huggingface.co/Qwen/Qwen2-VL-2B-Instruct)[🤖](https://modelscope.cn/models/qwen/Qwen2-VL-2B-Instruct)) | 41.88 | 88.34 | 72.07 | 44.40 |
+| Qwen2-VL-2B-Instruct | BF16<br>([🤗](https://huggingface.co/Qwen/Qwen2-VL-2B-Instruct)[🤖](https://modelscope.cn/models/qwen/Qwen2-VL-2B-Instruct)) | 41.88 | 88.34 | 72.07 | 44.40 |
 | | GPTQ-Int8<br>([🤗](https://huggingface.co/Qwen/Qwen2-VL-2B-Instruct-GPTQ-Int8)[🤖](https://modelscope.cn/models/qwen/Qwen2-VL-2B-Instruct-GPTQ-Int8)) | 41.55 | 88.28 | 71.99 | 44.60 |
 | | GPTQ-Int4<br>([🤗](https://huggingface.co/Qwen/Qwen2-VL-2B-Instruct-GPTQ-Int4)[🤖](https://modelscope.cn/models/qwen/Qwen2-VL-2B-Instruct-GPTQ-Int4)) | 39.22 | 87.21 | 70.87 | 41.69 |
 | | AWQ<br>([🤗](https://huggingface.co/Qwen/Qwen2-VL-2B-Instruct-AWQ)[🤖](https://modelscope.cn/models/qwen/Qwen2-VL-2B-Instruct-AWQ)) | 41.33 | 86.96 | 71.64 | 39.90 |
+
+
+

 #### Speed Benchmark

 This section reports the speed performance of bf16 models, quantized models (including GPTQ-Int4, GPTQ-Int8 and AWQ) of the Qwen2-VL series. Specifically, we report the inference speed (tokens/s) as well as memory footprint (GB) under the conditions of different context lengths.
@@ -1027,6 +1037,27 @@ Note:
 - We use the batch size of 1 and the least number of GPUs as possible for the evalution.
 - We test the speed and memory of generating 2048 tokens with the input lengths of 1, 6144, 14336, 30720, 63488, and 129024 tokens.
+- 72B (transformers)
+
+| Model | Input Length | Quantization | GPU Num | Speed(tokens/s) | GPU Memory(GB) |
+| --- | --- | --- | --- | --- | --- |
+| Qwen2-VL-72B-Instruct | 1 | BF16 | 2 | 8.90 | 138.74 |
+| | | GPTQ-Int8 | 2 | 9.53 | 75.173 |
+| | | GPTQ-Int4 | 1 | 11.04 | 42.46 |
+| | | AWQ | 1 | 12.00 | 41.98 |
+| | 6144 | BF16 | 2 | 6.53 | 148.66 |
+| | | GPTQ-Int8 | 2 | 6.97 | 85.09 |
+| | | GPTQ-Int4 | 1 | 7.62 | 49.05 |
+| | | AWQ | 1 | 8.33 | 48.58 |
+| | 14336 | BF16 | 3 | 4.39 | 165.92 |
+| | | GPTQ-Int8 | 2 | 5.04 | 99.31 |
+| | | GPTQ-Int4 | 1 | 5.39 | 58.76 |
+| | | AWQ | 1 | 5.72 | 58.29 |
+| | 30720 | BF16 | 4 | 2.93 | 204.33 |
+| | | GPTQ-Int8 | 2 | 3.16 | 127.77 |
+| | | GPTQ-Int4 | 2 | 3.27 | 85.13 |
+| | | AWQ | 2 | 3.39 | 94.65 |
+

 - 7B (transformers)

 | Model | Input Length | Quantization | GPU Num | Speed(tokens/s) | GPU Memory(GB) |
@@ -1070,6 +1101,8 @@ Note:
 | | | GPTQ-Int4 | 1 | 30.73 | 29.84 |
 | | | AWQ | 1 | 31.55 | 29.84 |
+
+

 ## Deployment

 We recommend using vLLM for fast Qwen2-VL deployment and inference. You need to use `vllm>=0.6.1` to enable Qwen2-VL support. You can also use our [official docker image](#-docker).
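
As an illustration of the deployment path recommended in the last hunk above (vLLM, `vllm>=0.6.1`), the following is a minimal sketch of querying a locally running vLLM OpenAI-compatible endpoint from Python. The serve command, port, and demo image URL are assumptions for the example, not part of this diff.

```python
from openai import OpenAI

# Assumes a vLLM OpenAI-compatible server (vllm>=0.6.1) is already running locally,
# e.g. started with something like: vllm serve Qwen/Qwen2-VL-7B-Instruct --port 8000
client = OpenAI(api_key="EMPTY", base_url="http://localhost:8000/v1")

response = client.chat.completions.create(
    model="Qwen/Qwen2-VL-7B-Instruct",
    messages=[
        {
            "role": "user",
            "content": [
                # Example image URL; replace with your own image.
                {
                    "type": "image_url",
                    "image_url": {"url": "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg"},
                },
                {"type": "text", "text": "Describe this image."},
            ],
        }
    ],
)
print(response.choices[0].message.content)
```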
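Likewise, for the quantized checkpoints benchmarked in the quantization table earlier in this diff (GPTQ-Int8, GPTQ-Int4, AWQ), the sketch below shows how one such checkpoint might be loaded with Hugging Face Transformers. It follows the quickstart pattern used elsewhere in this README; the chosen checkpoint, prompt, and image URL are illustrative only.

```python
from transformers import Qwen2VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info  # helper package from the Qwen2-VL repo

# AWQ/GPTQ checkpoints load through the same from_pretrained call as the BF16 ones;
# the quantization config is read from the checkpoint. The corresponding backend
# (e.g. autoawq for AWQ, optimum + auto-gptq for GPTQ) must be installed.
model_id = "Qwen/Qwen2-VL-7B-Instruct-AWQ"  # example pick from the table above
model = Qwen2VLForConditionalGeneration.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

messages = [{
    "role": "user",
    "content": [
        {"type": "image", "image": "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg"},
        {"type": "text", "text": "Describe this image."},
    ],
}]

# Build the chat prompt and preprocess the vision inputs.
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text], images=image_inputs, videos=video_inputs,
    padding=True, return_tensors="pt"
).to(model.device)

generated_ids = model.generate(**inputs, max_new_tokens=128)
# Strip the prompt tokens before decoding the generated answer.
trimmed = [out[len(inp):] for inp, out in zip(inputs.input_ids, generated_ids)]
print(processor.batch_decode(trimmed, skip_special_tokens=True)[0])
```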