|
8 | 8 | "source": [ |
9 | 9 | "# 🖼️ Introduction to Multimodal Text Generation\n", |
10 | 10 | "\n", |
11 | | - "In this notebook, we introduce the experimental features we've developed so far for multimodal text generation in Haystack. The experiment is ongoing, so expect more in the future.\n", |
| 11 | + "In this notebook, we introduce the features that enable multimodal text generation in Haystack.\n", |
12 | 12 | "\n", |
13 | 13 | "- We introduced the `ImageContent` dataclass, which represents the image content of a user `ChatMessage`.\n", |
14 | 14 | "- We developed some image converter components.\n", |
|
75 | 75 | "source": [ |
76 | 76 | "## Introduction to `ImageContent`\n", |
77 | 77 | "\n", |
78 | | - "[`ImageContent`](https://github.com/deepset-ai/haystack-experimental/blob/main/haystack_experimental/dataclasses/image_content.py) is a new dataclass that stores the image content of a user `ChatMessage`.\n", |
| 78 | + "[`ImageContent`](https://github.com/deepset-ai/haystack/blob/main/haystack/dataclasses/image_content.py) is a new dataclass that stores the image content of a user `ChatMessage`.\n", |
79 | 79 | "\n", |
80 | 80 | "It has the following attributes:\n", |
81 | 81 | "- `base64_image`: A base64 string representing the image.\n", |
|
129 | 129 | }, |
130 | 130 | "outputs": [], |
131 | 131 | "source": [ |
132 | | - "from haystack_experimental.dataclasses import ImageContent, ChatMessage\n", |
133 | | - "from haystack_experimental.components.generators.chat import OpenAIChatGenerator\n", |
| 132 | + "from haystack.dataclasses import ImageContent, ChatMessage\n", |
| 133 | + "from haystack.components.generators.chat import OpenAIChatGenerator\n", |
134 | 134 | "import base64\n", |
135 | 135 | "\n", |
136 | 136 | "with open(\"capybara.jpg\", \"rb\") as fd:\n", |
|
364 | 364 | "## Image Converters for `ImageContent`\n", |
365 | 365 | "\n", |
366 | 366 | "To perform image conversion in multimodal pipelines, we also introduced two image converters:\n", |
367 | | - "- [`ImageFileToImageContent`](https://github.com/deepset-ai/haystack-experimental/blob/main/haystack_experimental/components/converters/image/file_to_image.py), which converts image files to `ImageContent` objects (similar to `from_file_path`).\n", |
368 | | - "- [`PDFToImageContent`](https://github.com/deepset-ai/haystack-experimental/blob/main/haystack_experimental/components/converters/image/pdf_to_image.py), which converts PDF files to `ImageContent` objects." |
| 367 | + "- [`ImageFileToImageContent`](https://github.com/deepset-ai/haystack/blob/main/haystack/components/converters/image/file_to_image.py), which converts image files to `ImageContent` objects (similar to `from_file_path`).\n", |
| 368 | + "- [`PDFToImageContent`](https://github.com/deepset-ai/haystack/blob/main/haystack/components/converters/image/pdf_to_image.py), which converts PDF files to `ImageContent` objects." |
369 | 369 | ] |
370 | 370 | }, |
371 | 371 | { |
|
376 | 376 | }, |
377 | 377 | "outputs": [], |
378 | 378 | "source": [ |
379 | | - "from haystack_experimental.components.converters.image import ImageFileToImageContent\n", |
| 379 | + "from haystack.components.converters.image import ImageFileToImageContent\n", |
380 | 380 | "\n", |
381 | 381 | "converter = ImageFileToImageContent(detail=\"low\", size=(300, 300))\n", |
382 | 382 | "result = converter.run(sources=[\"capybara.jpg\"])" |
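| 383 | + "\n",
| 384 | + "# Sketch (not in the original notebook): the converter returns a dict with an\n",
| 385 | + "# \"image_contents\" key; each item can be attached to a user ChatMessage\n",
| 386 | + "# (ImageContent and ChatMessage are imported in an earlier cell).\n",
| 387 | + "image_content = result[\"image_contents\"][0]\n",
| 388 | + "message = ChatMessage.from_user(content_parts=[\"Describe this image\", image_content])\n",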
|
477 | 477 | } |
478 | 478 | ], |
479 | 479 | "source": [ |
480 | | - "from haystack_experimental.components.converters.image import PDFToImageContent\n", |
| 480 | + "from haystack.components.converters.image import PDFToImageContent\n", |
481 | 481 | "\n", |
482 | 482 | "pdf_converter = PDFToImageContent()\n", |
483 | 483 | "paper_page_image = pdf_converter.run(sources=[\"flan_paper.pdf\"], page_range=\"9\")[\"image_contents\"][0]\n", |
|
615 | 615 | } |
616 | 616 | ], |
617 | 617 | "source": [ |
618 | | - "from haystack_experimental.components.builders import ChatPromptBuilder\n", |
| 618 | + "from haystack.components.builders import ChatPromptBuilder\n", |
619 | 619 | "\n", |
620 | 620 | "builder = ChatPromptBuilder(template, required_variables=\"*\")\n", |
621 | 621 | "\n", |
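| 622 | + "\n",
| 623 | + "# For reference, a minimal sketch of a multimodal chat template (the actual\n",
| 624 | + "# template variable is defined earlier in the notebook); ImageContent parts are\n",
| 625 | + "# embedded with the templatize_part filter inside a message block:\n",
| 626 | + "example_template = \"\"\"\n",
| 627 | + "{% message role=\"user\" %}\n",
| 628 | + "{{ query }}\n",
| 629 | + "{% for img in image_contents %}{{ img | templatize_part }}{% endfor %}\n",
| 630 | + "{% endmessage %}\n",
| 631 | + "\"\"\"\n",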
|
786 | 786 | "outputs": [], |
787 | 787 | "source": [ |
788 | 788 | "from haystack.components.retrievers.in_memory import InMemoryBM25Retriever\n", |
789 | | - "from haystack_experimental.components.generators.chat import OpenAIChatGenerator\n", |
790 | | - "from haystack_experimental.dataclasses import ImageContent, ChatMessage\n", |
| 789 | + "from haystack.components.generators.chat import OpenAIChatGenerator\n", |
| 790 | + "from haystack.dataclasses import ImageContent, ChatMessage\n", |
791 | 791 | "\n", |
792 | 792 | "retriever = InMemoryBM25Retriever(document_store=document_store)\n", |
793 | 793 | "llm = OpenAIChatGenerator(model=\"gpt-4o-mini\")\n", |
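| 794 | + "\n",
| 795 | + "# Minimal sketch (ours, with hypothetical values): retrieve documents whose meta\n",
| 796 | + "# stores an image path, convert the images, and query the multimodal LLM.\n",
| 797 | + "# Assumes ImageFileToImageContent was imported in an earlier cell.\n",
| 798 | + "docs = retriever.run(query=\"LoRA vs full fine-tuning\")[\"documents\"]\n",
| 799 | + "image_contents = ImageFileToImageContent().run(\n",
| 800 | + "    sources=[doc.meta[\"image_path\"] for doc in docs]\n",
| 801 | + ")[\"image_contents\"]\n",
| 802 | + "msg = ChatMessage.from_user(content_parts=[\"What does this figure show?\", *image_contents])\n",
| 803 | + "print(llm.run(messages=[msg])[\"replies\"][0].text)\n",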
|
893 | 893 | }, |
894 | 894 | { |
895 | 895 | "cell_type": "markdown", |
896 | | - "metadata": { |
897 | | - "id": "hgPfD4sD0uqW" |
898 | | - }, |
899 | | - "source": [ |
900 | | - "### Using a Pipeline\n", |
901 | | - "\n", |
902 | | - "The retrieval part of the application showed above can also be implemented using a Pipeline.\n", |
903 | | - "\n", |
904 | | - "As you can see, there are some aspects to improve in terms of developer experience and we will work in this direction." |
905 | | - ] |
906 | | - }, |
907 | | - { |
908 | | - "cell_type": "code", |
909 | | - "execution_count": null, |
910 | | - "metadata": { |
911 | | - "colab": { |
912 | | - "base_uri": "https://localhost:8080/" |
913 | | - }, |
914 | | - "id": "6cFG-YQq0uWg", |
915 | | - "outputId": "4ad910e9-0d14-4f4c-9e2c-85e3051083c3" |
916 | | - }, |
917 | | - "outputs": [ |
918 | | - { |
919 | | - "data": { |
920 | | - "text/plain": [ |
921 | | - "<haystack.core.pipeline.pipeline.Pipeline object at 0x7c607962a550>\n", |
922 | | - "🚅 Components\n", |
923 | | - " - retriever: InMemoryBM25Retriever\n", |
924 | | - " - output_adapter: OutputAdapter\n", |
925 | | - " - image_converter: ImageFileToImageContent\n", |
926 | | - " - prompt_builder: ChatPromptBuilder\n", |
927 | | - " - generator: OpenAIChatGenerator\n", |
928 | | - "🛤️ Connections\n", |
929 | | - " - retriever.documents -> output_adapter.documents (List[Document])\n", |
930 | | - " - output_adapter.output -> image_converter.sources (List[str])\n", |
931 | | - " - image_converter.image_contents -> prompt_builder.image_contents (List[ImageContent])\n", |
932 | | - " - prompt_builder.prompt -> generator.messages (List[ChatMessage])" |
933 | | - ] |
934 | | - }, |
935 | | - "execution_count": 48, |
936 | | - "metadata": {}, |
937 | | - "output_type": "execute_result" |
938 | | - } |
939 | | - ], |
940 | | - "source": [ |
941 | | - "from typing import List\n", |
942 | | - "\n", |
943 | | - "from haystack import Pipeline\n", |
944 | | - "from haystack.components.converters import OutputAdapter\n", |
945 | | - "from haystack.components.retrievers.in_memory import InMemoryBM25Retriever\n", |
946 | | - "\n", |
947 | | - "from haystack_experimental.components.builders import ChatPromptBuilder\n", |
948 | | - "from haystack_experimental.components.converters.image import ImageFileToImageContent\n", |
949 | | - "from haystack_experimental.components.generators.chat.openai import OpenAIChatGenerator\n", |
950 | | - "\n", |
951 | | - "\n", |
952 | | - "chat_template = \"\"\"\n", |
953 | | - "{% message role=\"user\" %}\n", |
954 | | - "{{query}}\n", |
955 | | - "{% for image_content in image_contents %}\n", |
956 | | - " {{image_content | templatize_part}}\n", |
957 | | - "{% endfor %}\n", |
958 | | - "{% endmessage %}\n", |
959 | | - "\"\"\"\n", |
960 | | - "\n", |
961 | | - "output_adapter_template = \"\"\"\n", |
962 | | - "{%- set paths = [] -%}\n", |
963 | | - "{% for document in documents %}\n", |
964 | | - " {%- set _ = paths.append(document.meta.image_path) -%}\n", |
965 | | - "{% endfor %}\n", |
966 | | - "{{paths}}\n", |
967 | | - "\"\"\"\n", |
968 | | - "\n", |
969 | | - "rag_pipeline = Pipeline()\n", |
970 | | - "\n", |
971 | | - "rag_pipeline.add_component(\"retriever\", InMemoryBM25Retriever(document_store=document_store, top_k=1))\n", |
972 | | - "rag_pipeline.add_component(\"output_adapter\", OutputAdapter(template=output_adapter_template, output_type=List[str]))\n", |
973 | | - "rag_pipeline.add_component(\"image_converter\", ImageFileToImageContent(detail=\"auto\"))\n", |
974 | | - "rag_pipeline.add_component(\"prompt_builder\", ChatPromptBuilder(template=chat_template, required_variables=\"*\"))\n", |
975 | | - "rag_pipeline.add_component(\"generator\", OpenAIChatGenerator(model=\"gpt-4o-mini\"))\n", |
976 | | - "\n", |
977 | | - "rag_pipeline.connect(\"retriever.documents\", \"output_adapter.documents\")\n", |
978 | | - "rag_pipeline.connect(\"output_adapter.output\", \"image_converter.sources\")\n", |
979 | | - "rag_pipeline.connect(\"image_converter.image_contents\", \"prompt_builder.image_contents\")\n", |
980 | | - "rag_pipeline.connect(\"prompt_builder.prompt\", \"generator.messages\")" |
981 | | - ] |
982 | | - }, |
983 | | - { |
984 | | - "cell_type": "code", |
985 | | - "execution_count": null, |
986 | | - "metadata": { |
987 | | - "colab": { |
988 | | - "base_uri": "https://localhost:8080/" |
989 | | - }, |
990 | | - "id": "uX-y-mC038c5", |
991 | | - "outputId": "b1b85e55-8dce-4254-e18f-df4f92c9aeeb" |
992 | | - }, |
993 | | - "outputs": [ |
994 | | - { |
995 | | - "name": "stdout", |
996 | | - "output_type": "stream", |
997 | | - "text": [ |
998 | | - "('The image illustrates how LoRA (Low-Rank Adaptation) systems learn \"intruder '\n", |
999 | | - " 'dimensions\"—singular vectors that differ from those in the pre-trained '\n", |
1000 | | - " 'weight matrix during fine-tuning. \\n'\n", |
1001 | | - " '\\n'\n", |
1002 | | - " '- **Panel (a)** shows the architecture of LoRA and full fine-tuning, '\n", |
1003 | | - " 'emphasizing the addition of learned parameters \\\\( B \\\\) and \\\\( A \\\\) in '\n", |
1004 | | - " 'LoRA.\\n'\n", |
1005 | | - " '- **Panel (b)** compares the cosine similarity of singular vectors from LoRA '\n", |
1006 | | - " \"and full fine-tuning, revealing that LoRA's learned vectors diverge more \"\n", |
1007 | | - " 'from pre-trained weights.\\n'\n", |
1008 | | - " '- **Panel (c)** depicts cosine similarity distributions, highlighting that '\n", |
1009 | | - " 'regular vectors stay consistent while intruder dimensions show significant '\n", |
1010 | | - " 'deviation.')\n" |
1011 | | - ] |
1012 | | - } |
1013 | | - ], |
| 896 | + "metadata": {}, |
1014 | 897 | "source": [ |
1015 | | - "query = \"What the image from the Lora vs Full Fine-tuning paper tries to show? Be short.\"\n", |
1016 | | - "\n", |
1017 | | - "response = rag_pipeline.run(data={\"query\": query})[\"generator\"][\"replies\"][0].text\n", |
1018 | | - "print(response)" |
| 898 | + "*We'll be releasing a notebook soon to show how to implement the logic above using a Pipeline.*" |
1019 | 899 | ] |
1020 | 900 | }, |
1021 | 901 | { |
|
1046 | 926 | "\n", |
1047 | 927 | "from haystack.tools import tool\n", |
1048 | 928 | "from haystack.components.agents import Agent\n", |
| 929 | + "from haystack.components.generators.chat import OpenAIChatGenerator\n", |
1049 | 930 | "\n", |
1050 | | - "from haystack_experimental.dataclasses import ChatMessage, ImageContent\n", |
1051 | | - "from haystack_experimental.components.generators.chat import OpenAIChatGenerator\n", |
| 931 | + "from haystack.dataclasses import ChatMessage, ImageContent\n", |
1052 | 932 | "import python_weather\n", |
1053 | 933 | "\n", |
1054 | 934 | "# only needed in Jupyter notebooks where there is an event loop running\n", |
|
1197 | 1077 | "source": [ |
1198 | 1078 | "## What's next?\n", |
1199 | 1079 | "\n", |
1200 | | - "You can follow the progress of the Multimodal experiment in this [GitHub issue](https://github.com/deepset-ai/haystack/issues/8976).\n", |
| 1080 | + "We will soon release a notebook showing how to build more advanced multimodal pipelines that handle a variety of file\n",
| 1081 | + "formats and use multimodal embedding models for retrieval.\n",
1201 | 1082 | "\n", |
1202 | | - "In the future, you can expect support for more LLM providers, improvements to multimodal indexing and retrieval pipelines, plus the exploration of other interesting directions.\n", |
| 1083 | + "We will also extend multimodal features to more model providers.\n", |
1203 | 1084 | "\n", |
1204 | 1085 | "(*Notebook by [Stefano Fiorucci](https://github.com/anakin87)*)" |
1205 | 1086 | ] |
|