Commit dc04c69

LlamaIndex Multi_Modal_Llms Integration: Huggingface (#16133)

1 parent 530d88f
File tree: 17 files changed, 2035 additions and 0 deletions
Lines changed: 3 additions & 0 deletions
```python
poetry_requirements(
    name="poetry",
)
```
Lines changed: 17 additions & 0 deletions
```make
GIT_ROOT ?= $(shell git rev-parse --show-toplevel)

help: ## Show all Makefile targets.
	@grep -E '^[a-zA-Z_-]+:.*?## .*$$' $(MAKEFILE_LIST) | awk 'BEGIN {FS = ":.*?## "}; {printf "\033[33m%-30s\033[0m %s\n", $$1, $$2}'

format: ## Run code autoformatters (black).
	pre-commit install
	git ls-files | xargs pre-commit run black --files

lint: ## Run linters: pre-commit (black, ruff, codespell) and mypy.
	pre-commit install && git ls-files | xargs pre-commit run --show-diff-on-failure --files

test: ## Run tests via pytest.
	pytest tests

watch-docs: ## Build and watch documentation.
	sphinx-autobuild docs/ docs/_build/html --open-browser --watch $(GIT_ROOT)/llama_index/
```
Lines changed: 85 additions & 0 deletions
# LlamaIndex Multi_Modal_Llms Integration: Huggingface

This project integrates Hugging Face's multimodal language models into the LlamaIndex framework, enabling advanced multimodal capabilities for various AI applications.

## Features

- Seamless integration of Hugging Face multimodal models with LlamaIndex
- Support for multiple state-of-the-art vision-language models and their **fine-tuned variants**:
  - [Qwen2 Vision](https://huggingface.co/collections/Qwen/qwen2-vl-66cee7455501d7126940800d)
  - [Florence2](https://huggingface.co/collections/microsoft/florence-6669f44df0d87d9c3bfb76de)
  - [Phi-3.5 Vision](https://huggingface.co/collections/microsoft/phi-3-6626e15e9585a200d2d761e3)
  - [PaLI-Gemma](https://huggingface.co/collections/google/paligemma-release-6643a9ffbf57de2ae0448dda)
- Easy-to-use interface for multimodal tasks such as image captioning and visual question answering
- Configurable model parameters for fine-tuned performance

---

## Author of this integration: [GitHub](https://github.com/g-hano) | [LinkedIn](https://www.linkedin.com/in/chanyalcin/) | [Email](mailto:mcihan.yalcin@outlook.com)

## Installation

```bash
pip install llama-index-multi-modal-llms-huggingface
```

Make sure to set your Hugging Face API token as an environment variable:

```bash
export HF_TOKEN=your_huggingface_token_here
```
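
If you prefer to configure the token from within Python (for example, in a notebook), here is a minimal sketch; it sets the same `HF_TOKEN` variable as the shell export above, scoped to the current process only:

```python
import os

# Equivalent to `export HF_TOKEN=...`, but only for this Python process.
# Set it before initializing a model so the Hugging Face libraries pick it up.
os.environ["HF_TOKEN"] = "your_huggingface_token_here"
```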

## Usage

Here's a basic example of how to use the Hugging Face multimodal integration:

```python
from llama_index.core.schema import ImageDocument
from llama_index.multi_modal_llms.huggingface import HuggingFaceMultiModal

# Initialize the model from a Hugging Face model id
model = HuggingFaceMultiModal.from_model_name("Qwen/Qwen2-VL-2B-Instruct")

# Prepare your image and prompt
image_document = ImageDocument(image_path="path/to/your/image.jpg")
prompt = "Describe this image in detail."

# Generate a response
response = model.complete(prompt, image_documents=[image_document])

print(response.text)
```
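
Visual question answering uses the same `complete` call; only the prompt changes. A short sketch reusing `model` and `image_document` from the example above:

```python
# Ask a question about the image instead of requesting a caption.
vqa_prompt = "How many people are in the image, and what are they doing?"
vqa_response = model.complete(vqa_prompt, image_documents=[image_document])
print(vqa_response.text)
```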

You can also refer to the [example notebook](examples/huggingface_multimodal.ipynb) included with this package.

## Supported Models

1. Qwen2VisionMultiModal
2. Florence2MultiModal
3. Phi35VisionMultiModal
4. PaliGemmaMultiModal

Each model has unique capabilities and can be selected based on your specific use case.
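
As a sketch of how a concrete backend is picked, assuming `from_model_name` dispatches on the Hugging Face model id and returns one of the classes listed above (the dispatch behavior and the `microsoft/Florence-2-base` checkpoint are assumptions, not taken from this README):

```python
from llama_index.multi_modal_llms.huggingface import HuggingFaceMultiModal

# Assumption: from_model_name() returns the wrapper class that matches
# the checkpoint, e.g. Qwen2VisionMultiModal for a Qwen2-VL model.
qwen = HuggingFaceMultiModal.from_model_name("Qwen/Qwen2-VL-2B-Instruct")
florence = HuggingFaceMultiModal.from_model_name("microsoft/Florence-2-base")

print(type(qwen).__name__)      # expected: Qwen2VisionMultiModal
print(type(florence).__name__)  # expected: Florence2MultiModal
```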

## Configuration

You can configure various parameters when initializing a model:

```python
import torch

model = HuggingFaceMultiModal(
    model_name="Qwen/Qwen2-VL-2B-Instruct",
    device="cuda",  # or "cpu"
    torch_dtype=torch.float16,
    max_new_tokens=100,
    temperature=0.7,
)
```
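
For code that should run on machines with or without a GPU, a small sketch that picks the device and dtype at runtime, using only the constructor parameters shown above:

```python
import torch

from llama_index.multi_modal_llms.huggingface import HuggingFaceMultiModal

# Fall back to CPU (and full precision) when CUDA is unavailable.
device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.float32

model = HuggingFaceMultiModal(
    model_name="Qwen/Qwen2-VL-2B-Instruct",
    device=device,
    torch_dtype=dtype,
    max_new_tokens=100,
    temperature=0.7,
)
```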

## Limitations

- Async streaming is not supported for any of the models.
- Some models have specific requirements or limitations. Please refer to the individual model classes for details.
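
Synchronous streaming, by contrast, follows the standard LlamaIndex multi-modal interface. A minimal sketch, assuming this integration implements `stream_complete` with the usual signature (an assumption, not confirmed by this README; `model`, `prompt`, and `image_document` are reused from the usage example):

```python
# Each chunk is a CompletionResponse whose `.delta` holds the newly
# generated text; print it incrementally as it arrives.
for chunk in model.stream_complete(prompt, image_documents=[image_document]):
    print(chunk.delta, end="", flush=True)
```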

llama-index-integrations/multi_modal_llms/llama-index-multi-modal-llms-huggingface/examples/huggingface_multimodal.ipynb

Lines changed: 1296 additions & 0 deletions (large diff not rendered)
Lines changed: 1 addition & 0 deletions

```python
python_sources()
```
Lines changed: 3 additions & 0 deletions

```python
from llama_index.multi_modal_llms.huggingface.base import HuggingFaceMultiModal

__all__ = ["HuggingFaceMultiModal"]
```
