Commit

update: readme
soumik12345 committed Sep 27, 2024
1 parent e328158 commit aa4d5c5
Showing 7 changed files with 2,223 additions and 1 deletion.
1 change: 1 addition & 0 deletions .gitignore
@@ -161,6 +161,7 @@ cython_debug/
# option (not recommended) you can uncomment the following to ignore the entire idea folder.
#.idea/

artifacts/
attachments/
index/
wandb/
18 changes: 17 additions & 1 deletion finance_multi_modal_rag/README.md
@@ -1,5 +1,9 @@
# Multi-modal RAG for Finance

A simple RAG (Retrieval-Augmented Generation) system for question-answering on [Tesla's](https://www.tesla.com/) financial filings gathered from the [SEC-EDGAR Database](https://www.sec.gov/edgar) maintained by the [US Securities and Exchange Commission](https://www.sec.gov/). This codebase is closely inspired by the materials from the free course [RAG++ : From POC to Production](https://www.wandb.courses/courses/rag-in-production).

The codebase demonstrates how to build multi-modal RAG pipelines for question-answering systems and other downstream tasks in the finance domain using [Llama3.2 Vision](https://huggingface.co/meta-llama/Llama-3.2-90B-Vision).

## Installation

Install dependencies using the following commands:
@@ -21,7 +25,19 @@ Finally, you need to get a Cohere API key (depending on which model you use).

## Usage

First, you need to fetch the 10-Q filings from [Edgar database](https://www.sec.gov/edgar) and generate image descriptions using [meta-llama/Llama-3.2-90B-Vision-Instruct](https://huggingface.co/meta-llama/Llama-3.2-90B-Vision-Instruct).
First, you need to create a corpus that acts as the source of truth for the RAG system. We do this by fetching the `10-Q` and `DEF 14A` filings from the [Edgar database](https://www.sec.gov/edgar). Besides collecting the text, we also fetch all the images associated with the filings. To make chunking and indexing easier, we generate a comprehensive description of each image and extract all of its text and tabular information using [meta-llama/Llama-3.2-90B-Vision-Instruct](https://huggingface.co/meta-llama/Llama-3.2-90B-Vision-Instruct).
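
As a rough illustration of the filing-fetching step, the sketch below lists Tesla's recent `10-Q` and `DEF 14A` filings using EDGAR's public submissions API. The CIK value, JSON fields, and URLs reflect the public SEC API, not this repository's own loader, which may work differently.

```python
# Minimal sketch: list Tesla's recent 10-Q and DEF 14A filings from EDGAR.
# This only illustrates the idea; the repository's loader may differ.
import requests

TESLA_CIK = "0001318605"  # zero-padded CIK for Tesla, Inc.
url = f"https://data.sec.gov/submissions/CIK{TESLA_CIK}.json"
# The SEC asks clients to send a descriptive User-Agent with contact details.
headers = {"User-Agent": "finance-multi-modal-rag example@example.com"}

recent = requests.get(url, headers=headers, timeout=30).json()["filings"]["recent"]
for form, accession, document in zip(
    recent["form"], recent["accessionNumber"], recent["primaryDocument"]
):
    if form in {"10-Q", "DEF 14A"}:
        # Filing documents live under the Archives path, keyed by accession number.
        print(
            form,
            f"https://www.sec.gov/Archives/edgar/data/{int(TESLA_CIK)}/"
            f"{accession.replace('-', '')}/{document}",
        )
```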

We generate the image descriptions using a 2-stage prompting strategy:

First, we ask Llama-3.2 to summarize the text content in the filing document and extract some important keywords and observations.

![](./assets/trace_1.png)

Next, we pass this summary and the set of keywords extracted from the text as context, along with the images, to Llama-3.2-90B-Vision-Instruct and ask it to generate a description of each image and extract all of its text and tabular data in markdown format. A condensed sketch of this two-stage flow follows the trace below.

![](./assets/trace_3.png)
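
A condensed sketch of what this two-stage flow can look like against an OpenAI-compatible endpoint serving the vision model is shown below. The prompts, endpoint URL, and placeholder inputs are illustrative assumptions, not the repository's exact implementation.

```python
# Illustrative two-stage prompting sketch; prompts, base_url, and placeholder
# inputs are assumptions for demonstration, not this repository's exact code.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")
MODEL = "meta-llama/Llama-3.2-90B-Vision-Instruct"

filing_text = "..."  # plain text extracted from a 10-Q or DEF 14A filing
image_url = "https://example.com/filing_chart.png"  # an image from the filing

# Stage 1: summarize the filing text and extract important keywords.
summary = client.chat.completions.create(
    model=MODEL,
    messages=[{
        "role": "user",
        "content": "Summarize this SEC filing and list its key terms "
        f"and observations:\n\n{filing_text}",
    }],
).choices[0].message.content

# Stage 2: describe the image using the stage-1 summary as grounding context.
description = client.chat.completions.create(
    model=MODEL,
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": (
                "Using this summary as context, describe the image and extract "
                f"any text or tables in markdown format:\n\n{summary}"
            )},
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    }],
).choices[0].message.content
print(description)
```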

You can use the following code to create your corpus by fetching filings from the [Edgar database](https://www.sec.gov/edgar) and generating the image descriptions:

```python
import weave
```
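
For downstream chunking and indexing, the collected records need to be published somewhere the retriever can address them. A rough sketch of doing this with Weave's dataset API is shown below; the record fields are illustrative assumptions, while the dataset name matches the `TSLA_sec_filings` dataset referenced in `test.py`.

```python
# Rough sketch: publish the collected filings as a Weave dataset so that they
# can be chunked and indexed later. The record fields are assumptions.
import weave

weave.init(project_name="finance_multi_modal_rag")

records = [
    {
        "form_type": "10-Q",
        "text": "...",  # filing text extracted from EDGAR
        "image_descriptions": ["..."],  # descriptions generated by Llama-3.2
    },
    # ... one record per filing
]

weave.publish(weave.Dataset(name="TSLA_sec_filings", rows=records))
```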
Binary file added finance_multi_modal_rag/assets/trace_1.png
Binary file added finance_multi_modal_rag/assets/trace_2.png
Binary file added finance_multi_modal_rag/assets/trace_3.png
24 changes: 24 additions & 0 deletions finance_multi_modal_rag/test.py
@@ -0,0 +1,24 @@
import weave
from dotenv import load_dotenv

from finance_multi_modal_rag.llm_wrapper import MultiModalPredictor
from finance_multi_modal_rag.response_generation import FinanceQABot
from finance_multi_modal_rag.retrieval import BGERetriever

load_dotenv()

# Initialize Weave tracking for the project
weave.init(project_name="finance_multi_modal_rag")

# Load the BGE retriever and its pre-built index from a W&B artifact
retriever = BGERetriever.from_wandb_artifact(
    artifact_address="geekyrakshit/finance_multi_modal_rag/tsla-index:latest",
    weave_chunked_dataset_address="TSLA_sec_filings_chunks:v1",
    model_name="BAAI/bge-small-en-v1.5",
)

# Question-answering bot that retrieves relevant chunks from the corpus and
# generates answers with Llama-3.2 Vision
finance_qa_bot = FinanceQABot(
    predictor=MultiModalPredictor(
        model_name="meta-llama/Llama-3.2-90B-Vision-Instruct-Turbo",
        base_url="http://195.242.25.198:8032/v1",
    ),
    retriever=retriever,
    weave_corpus_dataset_address="TSLA_sec_filings:v8",
)

finance_qa_bot.predict(query="what did elon say in the tweets that tesla reported?")
