Commit

Merge pull request #39 from remichu-ai/transformer_multimodal

Transformer multimodal

remichu-ai authored Nov 19, 2024
2 parents 6541efa + 9cf1e26 commit 718fd78
Showing 10 changed files with 420 additions and 47 deletions.
79 changes: 54 additions & 25 deletions README.md
@@ -22,6 +22,50 @@ Do checkout [TabbyAPI](https://github.com/theroyallab/tabbyAPI) if you want a re

# Features

# NEW - Vision Model

From `gallama` version 0.0.7, there is experimental support for vision models.

Currently, as of v0.0.8, Pixtral is supported via Exllama (>=0.2.4) and the Qwen 2 VL series of models is supported via transformers.

Once Exllama rolls out support for Qwen 2 VL, running the model via transformers will be deprecated.
Neither exllamaV2 nor llama.cpp supports Qwen 2 VL yet; hence, this is achieved by running `transformers` with AWQ used for quantization.

1. Pixtral
Currently, the Pixtral Exl2 quantization on Hugging Face (https://huggingface.co/turboderp/pixtral-12b-exl2) has a typo in its config file that sets the context length to 1 million tokens.
The model supports up to 128k tokens of context, hence please set the context length accordingly when you load the model:

```shell
# sample
gallama download pixtral:5.0
gallama run -id "model_id=pixtral max_seq_len=32768"

```
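
Once Pixtral is loaded, requests follow the usual OpenAI vision message format. Below is a minimal sketch of an image request using `curl`; the host, port and path prefix (`http://localhost:8000/v1`) are assumptions, so adjust them to wherever your gallama server is actually listening.

```shell
# minimal sketch: OpenAI-style vision request against a loaded Pixtral model
# NOTE: host/port/path prefix are assumptions - adjust to your gallama server settings
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "pixtral",
        "messages": [{
          "role": "user",
          "content": [
            {"type": "text", "text": "What is in this image?"},
            {"type": "image_url", "image_url": {"url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"}}
          ]
        }],
        "max_tokens": 300
      }'
```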

2. Qwen 2 VL:
As of this release, the transformers build published on pip does not yet include the bug fix for Qwen 2 VL, hence you will need to install the latest code from GitHub (a sketch of the install command is shown below).
This is already handled in requirements.txt; however, getting the transformers dependency to work can be tricky.
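
For reference, a common way to get the latest transformers code is to install it straight from the GitHub repository, as sketched below (requirements.txt is meant to handle this for you already):

```shell
# sketch: install transformers from the GitHub main branch
# (normally already handled by requirements.txt)
pip install git+https://github.com/huggingface/transformers.git
```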

After installation, you can download the models with the following commands (choose a version that fits your VRAM):
```shell
# 2B model
gallama download qwen-2-VL-2B:4.0 --backend=transformers
gallama run qwen-2-VL-2B_transformers

# 7B model
gallama download qwen-2-VL-7B:4.0 --backend=transformers
gallama run qwen-2-VL-7B_transformers

# 72B model
gallama download qwen-2-VL-72B:4.0 --backend=transformers
gallama run qwen-2-VL-72B_transformers
```
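
For local images, the same endpoint accepts base64-encoded data URLs (the notebook in `examples/Examples_Notebook.ipynb` shows the equivalent with the OpenAI Python client). A rough shell sketch follows, assuming the server listens on `localhost:8000` and that the API model name matches the name used with `gallama run`:

```shell
# rough sketch: send a local image as a base64 data URL to a loaded Qwen 2 VL model
# assumptions: host/port (localhost:8000) and model name matching the `gallama run` id
IMG_B64=$(base64 -w 0 ./demo.jpeg)   # -w 0 disables line wrapping (GNU coreutils base64)
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d "{
        \"model\": \"qwen-2-VL-7B_transformers\",
        \"messages\": [{
          \"role\": \"user\",
          \"content\": [
            {\"type\": \"image_url\", \"image_url\": {\"url\": \"data:image/jpeg;base64,${IMG_B64}\"}},
            {\"type\": \"text\", \"text\": \"What is in this image?\"}
          ]
        }],
        \"max_tokens\": 300
      }"
```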

If you need a UI to run it, check out Gallama UI. It works with images, though the support is not perfect at the moment:
https://github.com/remichu-ai/gallamaUI.git
![alt_text](https://github.com/remichu-ai/gallamaUI/blob/main/doc/gen.gif)


## Integrated Model downloader

Ability to download exl2 model from Hugging Face via CLI for popular models.
@@ -80,9 +124,19 @@ gallama list available
| | llama_cpp | `3.0`, `4.0`, `5.0`, `6.0`, `8.0` |
| qwen-2.5-7B | exllama | `3.5`, `4.25`, `5.0`, `6.5`, `8.0` |
| | llama_cpp | `3.0`, `4.0`, `5.0`, `6.0`, `8.0` |
| qwen-2.5-Coder-32B | exllama | `2.2`, `3.0`, `3.5`, `4.25`, `5.0`, `6.5`, `8.0` |
| qwen-2.5-Coder-14B | exllama | `3.0`, `3.5`, `4.25`, `5.0`, `6.5`, `8.0` |
| qwen-2.5-Coder-7B | exllama | `3.5`, `4.25`, `5.0`, `6.5`, `8.0` |


**Vision Large Language Models**

| Model | Backend | Available Quantizations (bpw) |
|---------------|--------------|----------------------------------------------------------------------------------------|
| qwen-2-VL-2B | transformers | `4.0`, `16.0` |
| qwen-2-VL-7B | transformers | `4.0`, `16.0` |
| qwen-2-VL-72B | transformers | `4.0`, `16.0` |
| pixtral | exllama | `2.5`, `3.0`, `3.5`, `4.0`, `4.5`, `5.0`, `6.0`, `8.0` |


**Embedding Models:**
@@ -532,28 +586,3 @@ Customize the model launch using various parameters. Available parameters for th
8. Others
If you keep the gallama config folder in a location other than `~home/gallama`, you can set the environment variable `GALLAMA_HOME_PATH` when running.
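
For example, a sketch of pointing gallama at a custom config folder for a single run (the folder path below is purely illustrative):

```shell
# sketch: override the gallama config location via an environment variable
# (the folder path is illustrative only)
GALLAMA_HOME_PATH=/data/gallama_config gallama run -id "model_id=pixtral max_seq_len=32768"
```
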
# NEW - Vision Model

From `gallama` version 0.0.7, there is experimental support for vision models. Currently, only the Qwen 2 VL series of models is supported.
Currently, neither exllamaV2 nor llama.cpp supports vision models yet; hence, this is achieved by running `transformers` with AWQ used for quantization.
As of this release, the transformers build published on pip does not yet include the bug fix for Qwen 2 VL, hence you will need to install the latest code from GitHub.
This is already handled in requirements.txt.

After installation, you can download the models with the following commands (choose a version that fits your VRAM):
```shell
# 2B model
gallama download qwen-2-VL-2B:4.0 --backend=transformers
gallama run qwen-2-VL-2B_transformers

# 7B model
gallama download qwen-2-VL-7B:4.0 --backend=transformers
gallama run qwen-2-VL-7B_transformers

# 72B model
gallama download qwen-2-VL-72B:4.0 --backend=transformers
gallama run qwen-2-VL-72B_transformers
```

If you need a UI to run it, check out Gallama UI:
![alt_text](https://github.com/remichu-ai/gallamaUI/blob/main/doc/gen.gif)
108 changes: 108 additions & 0 deletions examples/Examples_Notebook.ipynb
@@ -919,6 +919,114 @@
"execution_count": null,
"source": "",
"id": "3d5f9621edfe4a58"
},
{
"metadata": {},
"cell_type": "markdown",
"source": [
"# Vision\n",
"\n",
"For model that support vision, it is compatible with OpenAI package"
],
"id": "ac1156f8bc03bb77"
},
{
"metadata": {},
"cell_type": "markdown",
"source": "## Using Image url",
"id": "6936b9d2338e5c0c"
},
{
"metadata": {},
"cell_type": "code",
"outputs": [],
"execution_count": null,
"source": [
"\n",
"response = client.chat.completions.create(\n",
" model=\"pixtral\",\n",
" messages=[\n",
" {\n",
" \"role\": \"user\",\n",
" \"content\": [\n",
" {\"type\": \"text\", \"text\": \"What’s in this image?\"},\n",
" {\n",
" \"type\": \"image_url\",\n",
" \"image_url\": {\n",
" \"url\": \"https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg\",\n",
" },\n",
" },\n",
" ],\n",
" }\n",
" ],\n",
" max_tokens=300,\n",
")\n",
"\n",
"print(response.choices[0])"
],
"id": "72f3f5abbc3a0750"
},
{
"metadata": {},
"cell_type": "markdown",
"source": "## Using Image Base64 format",
"id": "a752b3073fc9a52e"
},
{
"metadata": {},
"cell_type": "code",
"outputs": [],
"execution_count": null,
"source": [
"import base64\n",
"import requests\n",
"\n",
"# OpenAI API Key\n",
"api_key = \"YOUR_OPENAI_API_KEY\"\n",
"\n",
"\n",
"# Function to encode the image\n",
"def encode_image(image_path):\n",
" with open(image_path, \"rb\") as image_file:\n",
" return base64.b64encode(image_file.read()).decode('utf-8')\n",
"\n",
"\n",
"# Path to your image\n",
"image_path = \"/home/remichu/Downloads/demo.jpeg\" # replace path to your image on local machine here\n",
"\n",
"# Getting the base64 string\n",
"base64_image = encode_image(image_path)\n",
"\n",
"response = client.chat.completions.create(\n",
" model=\"gpt-4o-mini\",\n",
" messages=[\n",
" {\n",
" \"role\": \"user\",\n",
" \"content\": [\n",
" {\n",
" \"type\": \"image_url\",\n",
" \"image_url\": {\n",
" \"url\": f\"data:image/jpeg;base64,{base64_image}\",\n",
" },\n",
" },\n",
" {\"type\": \"text\", \"text\": \"What’s in this image?\"},\n",
" ],\n",
" }\n",
" ],\n",
" max_tokens=300,\n",
")\n",
"\n",
"print(response.choices[0])"
],
"id": "ba8097371b9167ce"
},
{
"metadata": {},
"cell_type": "code",
"outputs": [],
"execution_count": null,
"source": "",
"id": "60ac0a0353152781"
}
],
"metadata": {
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"

[project]
name = "gallama"
version = "0.0.7"
version = "0.0.8"
description = "An opinionated Llama Server engine with a focus on agentic tasks"
authors = [{name = "David", email = "trantrungduc91@example.com"}]
license = {text = "MIT"}
1 change: 1 addition & 0 deletions requirements.txt
@@ -6,6 +6,7 @@ accelerate
autoawq
optimum
pydantic
pillow
pytest
httpx
uvicorn