This project is a demonstration of a Multi-Modal Retrieval System, where documents of various modalities (image, text, video, image+text) can be retrieved using natural-language text queries. It can be used for corporate intranet document lookup, cloud storage search, or even enhancing Google search by going beyond simple text matching.
| Component | Description | GitHub Repo |
|---|---|---|
| Event Core | Common code shared across all services | https://github.com/axwhyzee/multi-modal-retrieval-event-core |
| Gateway Service | API gateway; receives document uploads and text queries and routes them to the backend services | https://github.com/axwhyzee/multi-modal-retrieval-gateway-service |
| Storage Service | Remote object repository using AWS S3 buckets | https://github.com/axwhyzee/multi-modal-retrieval-storage-service |
| Embedding Service | On ElementStored events, embeds elements and indexes them in Pinecone; on queries, embeds the query text and reranks results | https://github.com/axwhyzee/multi-modal-retrieval-embedding-service |
| Preprocessor Service | On DocStored events, chunks documents, carries out text and image preprocessing on chunks, and generates thumbnails | https://github.com/axwhyzee/multi-modal-retrieval-preprocessor-service |
| Meta Service | Holds mappings of objects to their metadata | N/A (uses a Redis server) |
| Frontend | GUI | https://github.com/axwhyzee/multi-modal-retrieval-frontend |
This repo orchestrates the system on a single machine with containerized services. To run a distributed version, each service (each git repo) can run on its own box.
Use case: Customer support
Dataset:
- PDFs scraped from the Sony WH-1000XM4 Help Guide
- Videos from the Sony Europe YouTube channel
Demo video: `demo_sony_150_speed.mp4`
Use case: Internal technical documentation search
Dataset:
Demo video: `demo_pinecone_150_speed.mp4`
The hybrid architecture can be divided into the write path and the read path, which are event-driven and request-response based, respectively.
The write path is designed to be event-driven because processing bottlenecks like chunking and embedding can run asynchronously; all steps within the write path are idempotent; and eventual consistency is sufficient.
When a document is uploaded to the Gateway Service (the API gateway), the Gateway Service stores it in the Storage Service, which emits a DocStored event.
DocStored events are received by the Preprocessor Service, which extracts elements such as images, text, plots, and code blocks from the document. Assets like document thumbnails and element thumbnails are also generated. All these objects are stored in the Storage Service, and metadata is inserted into the Meta Service where applicable. When element objects are inserted into the Storage Service, ElementStored events are emitted.
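To make this flow concrete, here is a minimal, self-contained sketch of the write path's event handling, with in-memory dicts standing in for the Storage and Meta Services. All names (event fields, chunking logic, key scheme) are illustrative assumptions, not the actual repo API:

```python
# Self-contained sketch of the write path's event flow; every name here is
# an illustrative assumption, with in-memory dicts standing in for services.
from dataclasses import dataclass


@dataclass
class DocStored:
    doc_key: str


@dataclass
class ElementStored:
    element_key: str
    element_type: str  # e.g. "TEXT", "IMAGE", "CODE"


storage: dict[str, bytes] = {}  # stand-in for the S3-backed Storage Service
meta: dict[str, dict] = {}      # stand-in for the Redis-backed Meta Service
event_bus: list = []            # stand-in for the message broker


def handle_doc_stored(event: DocStored) -> None:
    doc = storage[event.doc_key]
    # Chunk the document into typed elements (real extraction is format-specific).
    chunks = [(f"{event.doc_key}/chunk-{i}", "TEXT", part)
              for i, part in enumerate(doc.split(b"\n\n"))]
    storage[f"{event.doc_key}/thumbnail"] = b"<thumbnail bytes>"
    for key, etype, data in chunks:
        storage[key] = data
        meta[key] = {"parent": event.doc_key, "type": etype}
        event_bus.append(ElementStored(key, etype))  # consumed by the Embedding Service


storage["user1/guide.pdf"] = b"page one\n\npage two"
handle_doc_stored(DocStored("user1/guide.pdf"))
print([e.element_key for e in event_bus])  # ['user1/guide.pdf/chunk-0', 'user1/guide.pdf/chunk-1']
```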
ElementStored events are received by the Embedding Service, which embeds the elements using the corresponding embedding models. The embeddings are indexed in the corresponding {ELEMENT_TYPE}/{USER} namespace in Pinecone, a vector database that supports multi-tenancy.
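For reference, the namespaced indexing step might look like the following sketch. The upsert call follows the official Pinecone Python client; the index name and embedding stub are assumptions:

```python
# A minimal sketch of namespaced indexing with the Pinecone Python client.
# The index name and the embedding stub are hypothetical.
from pinecone import Pinecone

pc = Pinecone(api_key="YOUR_PINECONE_API_KEY")
index = pc.Index("multi-modal-retrieval")  # hypothetical index name


def embed(element_type: str, data: bytes) -> list[float]:
    # Stand-in for the element-type-specific embedding model
    # (e.g. an image encoder for IMAGE, a text encoder for TEXT).
    return [0.0] * 512


def handle_element_stored(element_key: str, element_type: str, user: str, data: bytes) -> None:
    index.upsert(
        vectors=[{"id": element_key, "values": embed(element_type, data)}],
        namespace=f"{element_type}/{user}",  # one namespace per element type per user
    )
```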
The read path is synchronous because it must respond to the user as quickly as possible. Hence, it follows a simple, traditional request-response design.
The user sends a text query to the Gateway Service, which forwards the request to the Embedding Service.
For each element type, the Embedding Service embeds the text query using the corresponding embedding model. The text embedding is used to query the namespace corresponding to the element type and user, fetching the top-k elements most similar to the query. These top-k elements are reranked by element-specific rerankers, and only the top-n ranked elements are returned in the response.
Note: `top_k = top_n * MULTIPLIER`, where `MULTIPLIER` is an integer > 1
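Combining the note above with the query flow, a minimal sketch could look like this. The query call follows the official Pinecone Python client; the model stubs, index name, and `MULTIPLIER` value are assumptions:

```python
# Sketch of the per-element-type query flow: over-fetch top_k candidates,
# rerank them, and keep the top_n. Stubs and constants are illustrative.
from pinecone import Pinecone

MULTIPLIER = 4  # any integer > 1


def embed_text(element_type: str, query: str) -> list[float]:
    return [0.0] * 512  # stand-in for the type-specific text encoder


def rerank_score(query: str, match) -> float:
    return 0.0  # stand-in for the element-specific reranker


def search(query: str, user: str, element_type: str, top_n: int) -> list:
    index = Pinecone(api_key="YOUR_PINECONE_API_KEY").Index("multi-modal-retrieval")
    res = index.query(
        vector=embed_text(element_type, query),
        top_k=top_n * MULTIPLIER,            # over-fetch candidates ...
        namespace=f"{element_type}/{user}",
    )
    ranked = sorted(res.matches, key=lambda m: rerank_score(query, m), reverse=True)
    return ranked[:top_n]                    # ... then keep only the reranked top_n
```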
On receiving the results from the Embedding Service, the Gateway Service transforms the response and fetches the corresponding asset and element metadata from the Meta Service.
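Since the Meta Service is a plain Redis server (see the component table), this lookup might amount to something like the sketch below; the key scheme and hash fields are illustrative assumptions:

```python
# Sketch of the Gateway's metadata lookup against the Redis-backed Meta
# Service. The key scheme and hash fields are illustrative assumptions.
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)


def attach_meta(results: list[dict]) -> list[dict]:
    for result in results:
        # e.g. thumbnail key, parent document, page number
        result["meta"] = r.hgetall(result["element_key"])
    return results
```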
- Create a `.env` file with the following env vars:

  ```
  AWS_S3_BUCKET_ACCESS_KEY=...
  AWS_S3_BUCKET_NAME=...
  AWS_S3_BUCKET_REGION=...
  AWS_S3_BUCKET_SECRET_ACCESS_KEY=...
  EMBEDDING_SERVICE_API_URL=http://embedding_service_api-1:5000/ # use generated name of docker container
  ENV=DEV # use local file system instead of S3 for object storage
  PINECONE_API_KEY=...
  REACT_APP_API_URL=http://localhost:5001 # URL to Gateway Service, has port forwarding 5001:5000 by default (configure in `docker-compose.yml`)
  REACT_APP_USER=...
  REDIS_HOST=...
  REDIS_PASSWORD=...
  REDIS_PORT=...
  REDIS_USERNAME=...
  STORAGE_SERVICE_API_URL=http://storage_service_api-1:5000/ # use generated name of docker container
  ```
- Install Docker
- Increase the Docker memory limit to at least 12GB
- Run `source run.sh` to clone the services and build and/or start the Docker containers
- Insert dummy data by running `python insert.py` (see the sketch below)
- Go to http://localhost:3000 to access the web-based GUI
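For reference, inserting a document through the Gateway Service might look like the following client-side sketch; the endpoint path and form fields are assumptions, not the actual API:

```python
# Hypothetical illustration of uploading a document to the Gateway Service.
# The endpoint path and form fields are assumptions, not the actual API.
import requests

with open("manual.pdf", "rb") as f:
    resp = requests.post(
        "http://localhost:5001/upload",  # Gateway Service, per REACT_APP_API_URL
        files={"file": ("manual.pdf", f)},
        data={"user": "demo-user"},
    )
print(resp.status_code)
```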
To scale up a particular service like `embedding_service_event_consumer`, change the docker command in `run.sh` as shown:

```
docker-compose up -d --scale embedding_service_event_consumer=3
```
This system is designed such that documents of all modalities are not required to live in the same embedding space. This means that new modalities can be introduced as long as there exists a dual-modal text-<NEW MODAL> model. For instance, the audio modality can be introduced as long as there exists a suitable text-audio embedding model and reranker. This also means that custom document formats, including proprietary ones, can make use of this search system as well.
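As an illustration, supporting a new modality could be as simple as registering an (embedder, reranker) pair for the new element type; all names in this sketch are hypothetical:

```python
# Hypothetical registry mapping element types to (embedder, reranker) pairs.
from typing import Callable

Embedder = Callable[[bytes], list[float]]  # raw element -> embedding vector
Reranker = Callable[[str, bytes], float]   # (text query, raw element) -> score

MODALITIES: dict[str, tuple[Embedder, Reranker]] = {}


def register_modality(element_type: str, embedder: Embedder, reranker: Reranker) -> None:
    MODALITIES[element_type] = (embedder, reranker)


# Adding audio only requires a dual-modal text-audio embedding model and a reranker:
register_modality(
    "AUDIO",
    embedder=lambda raw: [0.0] * 512,  # e.g. a CLAP-style text-audio encoder
    reranker=lambda query, raw: 0.0,   # e.g. a cross-modal reranker
)
```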