Commit

Merge pull request #39 from remichu-ai/transformer_multimodal

Transformer multimodal

remichu-ai authored Nov 19, 2024
2 parents 6541efa + 9cf1e26 commit 718fd78
Showing 10 changed files with 420 additions and 47 deletions.
79 changes: 54 additions & 25 deletions README.md
@@ -22,6 +22,50 @@ Do checkout [TabbyAPI](https://github.com/theroyallab/tabbyAPI) if you want a re

# Features

# NEW - Vision Model

From `gallama` version 0.0.7, there is experimental support for vision models.

Currently, as of v0.0.8, Pixtral is supported via Exllama (>=0.2.4) and the Qwen 2 VL series of models is supported via transformers.

Once Exllama rolls out support for Qwen 2 VL, running the model via transformers will be deprecated.
Neither exllamaV2 nor llama.cpp supports Qwen 2 VL yet; hence, this is achieved by running `transformers` with AWQ used for quantization.

1. Pixtral
Currently, the Pixtral Exl2 quantization on Hugging Face (https://huggingface.co/turboderp/pixtral-12b-exl2) has a typo in its config file that sets the context length to 1 million tokens.
The model supports up to 128k tokens of context, hence please set the context length accordingly when you load the model:

```shell
# sample
gallama download pixtral:5.0
gallama run -id "model_id=pixtral max_seq_len=32768"

```
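
Once Pixtral is loaded, requests follow the usual OpenAI vision message format. Below is a minimal sketch of an image request using `curl`; the host, port and path prefix (`http://localhost:8000/v1`) are assumptions, so adjust them to wherever your gallama server is actually listening.

```shell
# minimal sketch: OpenAI-style vision request against a loaded Pixtral model
# NOTE: host/port/path prefix are assumptions - adjust to your gallama server settings
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "pixtral",
        "messages": [{
          "role": "user",
          "content": [
            {"type": "text", "text": "What is in this image?"},
            {"type": "image_url", "image_url": {"url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"}}
          ]
        }],
        "max_tokens": 300
      }'
```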

2. Qwen 2 VL:
As of this release, the transformers build published on pip does not yet include the bug fix for Qwen 2 VL, hence you will need to install the latest code from GitHub (a sketch of the install command is shown below).
This is already handled in requirements.txt; however, getting the transformers dependency to work can be tricky.
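
For reference, a common way to get the latest transformers code is to install it straight from the GitHub repository, as sketched below (requirements.txt is meant to handle this for you already):

```shell
# sketch: install transformers from the GitHub main branch
# (normally already handled by requirements.txt)
pip install git+https://github.com/huggingface/transformers.git
```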

After installation, you can download the models with the following commands (choose a version that fits your VRAM):
```shell
# 2B model
gallama download qwen-2-VL-2B:4.0 --backend=transformers
gallama run qwen-2-VL-2B_transformers

# 7B model
gallama download qwen-2-VL-7B:4.0 --backend=transformers
gallama run qwen-2-VL-7B_transformers

# 72B model
gallama download qwen-2-VL-72B:4.0 --backend=transformers
gallama run qwen-2-VL-72B_transformers
```
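
For local images, the same endpoint accepts base64-encoded data URLs (the notebook in `examples/Examples_Notebook.ipynb` shows the equivalent with the OpenAI Python client). A rough shell sketch follows, assuming the server listens on `localhost:8000` and that the API model name matches the name used with `gallama run`:

```shell
# rough sketch: send a local image as a base64 data URL to a loaded Qwen 2 VL model
# assumptions: host/port (localhost:8000) and model name matching the `gallama run` id
IMG_B64=$(base64 -w 0 ./demo.jpeg)   # -w 0 disables line wrapping (GNU coreutils base64)
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d "{
        \"model\": \"qwen-2-VL-7B_transformers\",
        \"messages\": [{
          \"role\": \"user\",
          \"content\": [
            {\"type\": \"image_url\", \"image_url\": {\"url\": \"data:image/jpeg;base64,${IMG_B64}\"}},
            {\"type\": \"text\", \"text\": \"What is in this image?\"}
          ]
        }],
        \"max_tokens\": 300
      }"
```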

If you need a UI to run it, check out Gallama UI. It works with images, though the support is not perfect at the moment:
https://github.com/remichu-ai/gallamaUI.git
![alt_text](https://github.com/remichu-ai/gallamaUI/blob/main/doc/gen.gif)


## Integrated Model downloader

Ability to download exl2 model from Hugging Face via CLI for popular models.
@@ -80,9 +124,19 @@ gallama list available
| | llama_cpp | `3.0`, `4.0`, `5.0`, `6.0`, `8.0` |
| qwen-2.5-7B | exllama | `3.5`, `4.25`, `5.0`, `6.5`, `8.0` |
| | llama_cpp | `3.0`, `4.0`, `5.0`, `6.0`, `8.0` |
| qwen-2.5-Coder-32B | exllama | `2.2`, `3.0`, `3.5`, `4.25`, `5.0`, `6.5`, `8.0` |
| qwen-2.5-Coder-14B | exllama | `3.0`, `3.5`, `4.25`, `5.0`, `6.5`, `8.0` |
| qwen-2.5-Coder-7B | exllama | `3.5`, `4.25`, `5.0`, `6.5`, `8.0` |


**Vision Large Language Models**

| Model | Backend | Available Quantizations (bpw) |
|---------------|--------------|----------------------------------------------------------------------------------------|
| qwen-2-VL-2B | transformers | `4.0`, `16.0` |
| qwen-2-VL-7B | transformers | `4.0`, `16.0` |
| qwen-2-VL-72B | transformers | `4.0`, `16.0` |
| pixtral | exllama | `2.5`, `3.0`, `3.5`, `4.0`, `4.5`, `5.0`, `6.0`, `8.0` |


**Embedding Models:**
@@ -532,28 +586,3 @@ Customize the model launch using various parameters. Available parameters for th
8. Others
If you keep the gallama config folder in a location other than `~home/gallama`, you can set the environment variable `GALLAMA_HOME_PATH` when running.
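
For example, a sketch of pointing gallama at a custom config folder for a single run (the folder path below is purely illustrative):

```shell
# sketch: override the gallama config location via an environment variable
# (the folder path is illustrative only)
GALLAMA_HOME_PATH=/data/gallama_config gallama run -id "model_id=pixtral max_seq_len=32768"
```
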
# NEW - Vision Model

From `gallama` version 0.0.7, there is experimental support for vision models. Currently, only the Qwen 2 VL series of models is supported.
Currently, neither exllamaV2 nor llama.cpp supports vision models yet; hence, this is achieved by running `transformers` with AWQ used for quantization.
As of this release, the transformers build published on pip does not yet include the bug fix for Qwen 2 VL, hence you will need to install the latest code from GitHub.
This is already handled in requirements.txt.

After installation, you can download the models with the following commands (choose a version that fits your VRAM):
```shell
# 2B model
gallama download qwen-2-VL-2B:4.0 --backend=transformers
gallama run qwen-2-VL-2B_transformers

# 7B model
gallama download qwen-2-VL-7B:4.0 --backend=transformers
gallama run qwen-2-VL-7B_transformers

# 72B model
gallama download qwen-2-VL-72B:4.0 --backend=transformers
gallama run qwen-2-VL-72B_transformers
```

If you need a UI to run it, check out Gallama UI:
![alt_text](https://github.com/remichu-ai/gallamaUI/blob/main/doc/gen.gif)
108 changes: 108 additions & 0 deletions examples/Examples_Notebook.ipynb
@@ -919,6 +919,114 @@
"execution_count": null,
"source": "",
"id": "3d5f9621edfe4a58"
},
{
"metadata": {},
"cell_type": "markdown",
"source": [
"# Vision\n",
"\n",
"For model that support vision, it is compatible with OpenAI package"
],
"id": "ac1156f8bc03bb77"
},
{
"metadata": {},
"cell_type": "markdown",
"source": "## Using Image url",
"id": "6936b9d2338e5c0c"
},
{
"metadata": {},
"cell_type": "code",
"outputs": [],
"execution_count": null,
"source": [
"\n",
"response = client.chat.completions.create(\n",
" model=\"pixtral\",\n",
" messages=[\n",
" {\n",
" \"role\": \"user\",\n",
" \"content\": [\n",
" {\"type\": \"text\", \"text\": \"What’s in this image?\"},\n",
" {\n",
" \"type\": \"image_url\",\n",
" \"image_url\": {\n",
" \"url\": \"https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg\",\n",
" },\n",
" },\n",
" ],\n",
" }\n",
" ],\n",
" max_tokens=300,\n",
")\n",
"\n",
"print(response.choices[0])"
],
"id": "72f3f5abbc3a0750"
},
{
"metadata": {},
"cell_type": "markdown",
"source": "## Using Image Base64 format",
"id": "a752b3073fc9a52e"
},
{
"metadata": {},
"cell_type": "code",
"outputs": [],
"execution_count": null,
"source": [
"import base64\n",
"import requests\n",
"\n",
"# OpenAI API Key\n",
"api_key = \"YOUR_OPENAI_API_KEY\"\n",
"\n",
"\n",
"# Function to encode the image\n",
"def encode_image(image_path):\n",
" with open(image_path, \"rb\") as image_file:\n",
" return base64.b64encode(image_file.read()).decode('utf-8')\n",
"\n",
"\n",
"# Path to your image\n",
"image_path = \"/home/remichu/Downloads/demo.jpeg\" # replace path to your image on local machine here\n",
"\n",
"# Getting the base64 string\n",
"base64_image = encode_image(image_path)\n",
"\n",
"response = client.chat.completions.create(\n",
" model=\"gpt-4o-mini\",\n",
" messages=[\n",
" {\n",
" \"role\": \"user\",\n",
" \"content\": [\n",
" {\n",
" \"type\": \"image_url\",\n",
" \"image_url\": {\n",
" \"url\": f\"data:image/jpeg;base64,{base64_image}\",\n",
" },\n",
" },\n",
" {\"type\": \"text\", \"text\": \"What’s in this image?\"},\n",
" ],\n",
" }\n",
" ],\n",
" max_tokens=300,\n",
")\n",
"\n",
"print(response.choices[0])"
],
"id": "ba8097371b9167ce"
},
{
"metadata": {},
"cell_type": "code",
"outputs": [],
"execution_count": null,
"source": "",
"id": "60ac0a0353152781"
}
],
"metadata": {
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"

[project]
name = "gallama"
version = "0.0.7"
version = "0.0.8"
description = "An opinionated Llama Server engine with a focus on agentic tasks"
authors = [{name = "David", email = "trantrungduc91@example.com"}]
license = {text = "MIT"}
1 change: 1 addition & 0 deletions requirements.txt
@@ -6,6 +6,7 @@ accelerate
autoawq
optimum
pydantic
pillow
pytest
httpx
uvicorn