Creating a system to answer questions based on PDF content involves integrating several technologies for text processing, vector storage, and language model responses. This tutorial outlines the setup, the code structure, and how RAG works with Weaviate as the vector store.
- Obtain Your API Keys: Log in to your Cerebras account, navigate to the “API Keys” section, and generate a new API key. Log in to your Weaviate account, create a new cluster, and take note of the API key and the cluster URL.
- Set the API Keys in the Sidebar: Once you have both the Cerebras and Weaviate API keys and the cluster URL, add them to the sidebar on the left, as sketched below.
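How the sidebar collects these values depends on the app, but a minimal sketch of the wiring looks like this (the widget labels are assumptions; the variable names are chosen to match the snippets further down):
import streamlit as st
# Collect credentials from the sidebar
with st.sidebar:
    CEREBRAS_API_KEY = st.text_input("Cerebras API Key", type="password")
    WEAVIATE_API_KEY = st.text_input("Weaviate API Key", type="password")
    WEAVIATE_URL = st.text_input("Weaviate Cluster URL")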
Let's make sure we have all of the requirements for this project installed!
pip install -r requirements.txt
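The exact contents of requirements.txt belong to the project itself, but judging from the imports used below, it likely includes packages along these lines (this list is an assumption, not the actual file):
streamlit
weaviate-client
langchain
langchain-community
langchain-weaviate
langchain-cerebras
sentence-transformers
pypdf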
Run the following command to start up the frontend:
streamlit run main.py
Our code has four major components to make this system work.
- Embeddings and Vector Storage: Embeddings are generated using LangChain's SentenceTransformerEmbeddings; the upload_vectors function uploads text chunks to Weaviate after converting them into vectors.
from langchain_community.embeddings.sentence_transformer import SentenceTransformerEmbeddings
# Create embeddings
with st.spinner(text="Loading embeddings..."):
    embeddings = SentenceTransformerEmbeddings(model_name="all-MiniLM-L6-v2")
from langchain_weaviate import WeaviateVectorStore
# Upload vectors to Weaviate
vector_store = WeaviateVectorStore(client=client, index_name="my_class", text_key="text", embedding=embeddings)
for i in range(len(texts)):
    t = texts[i]
    vector_store.add_texts([t.page_content])
    progress_bar.progress((i + 1) / len(texts), "Indexing PDF content... (this may take a bit) 🦙")
- Generating a Response: Using the user's prompt and the text chunks most similar to it, generate an answer.
# Perform similarity search
docs = st.session_state.docsearch.similarity_search(prompt)
# Load the question answering chain
llm = ChatCerebras(api_key=CEREBRAS_API_KEY, model="llama3.1-8b")
chain = load_qa_chain(llm, chain_type="stuff")
# Query the documents and get the answer
response = chain.run(input_documents=docs, question=prompt)
Retrieval-Augmented Generation (RAG) is a technique that combines the strengths of information retrieval and text generation models. Think about some exams you had to do earlier in your schooling: you couldn't answer all the questions without a background article provided by your teacher. That's what we're doing - providing background information to the LLM so it can answer questions about it.
We can create a Weaviate client using the user's API key and their cluster's URL.
# Create a Weaviate client
client = weaviate.connect_to_weaviate_cloud(
    cluster_url=WEAVIATE_URL,  # Replace with your Weaviate Cloud URL
    auth_credentials=weaviate.AuthApiKey(WEAVIATE_API_KEY),
)
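Before indexing anything, it can be worth confirming the cluster is reachable; the v4 Python client exposes a readiness check, and the connection should be closed when the app is done with it. A minimal sketch (the error handling here is an assumption, not part of the tutorial code):
# Optional: verify the connection before indexing
if not client.is_ready():
    st.error("Could not connect to Weaviate. Check your cluster URL and API key.")
    st.stop()
# ...and close the connection when you're finished
# client.close()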
- Text Chunking: The PDF content is split into smaller chunks using RecursiveCharacterTextSplitter.
from langchain.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
loader = PyPDFLoader(temp_filepath)
data = loader.load()
# Split the data into smaller documents
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
texts = text_splitter.split_documents(data)
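The temp_filepath above is wherever the uploaded PDF was saved to disk. One plausible way to produce it in a Streamlit app is with a file uploader and a temporary file (the uploader widget and temp-file handling are assumptions, not part of the original snippet):
import tempfile
import streamlit as st
uploaded_file = st.file_uploader("Upload a PDF", type="pdf")
if uploaded_file is not None:
    # Write the uploaded bytes to a temporary file so PyPDFLoader can read it
    with tempfile.NamedTemporaryFile(delete=False, suffix=".pdf") as tmp:
        tmp.write(uploaded_file.getvalue())
        temp_filepath = tmp.name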
- Vectorization: Each text chunk is converted into a vector representation using SentenceTransformerEmbeddings.
from langchain_community.embeddings.sentence_transformer import SentenceTransformerEmbeddings
# Create embeddings
with st.spinner(text="Loading embeddings..."):
    embeddings = SentenceTransformerEmbeddings(model_name="all-MiniLM-L6-v2")
- Weaviate: These vectors are then stored in Weaviate, which is an open source vector database. It's optimized for similarity search, which allows us to identify text chunks our LLM needs to answer the user's prompt.
from langchain_weaviate import WeaviateVectorStore
# Upload vectors to Weaviate
vector_store = WeaviateVectorStore(client=client, index_name="my_class", text_key="text", embedding=embeddings)
for i in range(len(texts)):
    t = texts[i]
    vector_store.add_texts([t.page_content])
    progress_bar.progress((i + 1) / len(texts), "Indexing PDF content... (this may take a bit) 🦙")
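Indexing chunk by chunk keeps the progress bar responsive, but add_texts also accepts a list, so the whole document could be indexed in one batched call if per-chunk progress isn't needed (an alternative sketch, not the tutorial's code):
# Alternative: index every chunk in a single call
vector_store.add_texts([t.page_content for t in texts])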
- Similarity Search: Weaviate performs a similarity search to retrieve the most relevant text chunks based on the user's query.
# Perform similarity search
docs = st.session_state.docsearch.similarity_search(prompt)
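similarity_search returns the best-matching chunks as LangChain Document objects, and the optional k argument controls how many are retrieved (the value of 4 below is just an illustration):
# Retrieve the four most similar chunks and peek at their contents
docs = st.session_state.docsearch.similarity_search(prompt, k=4)
for doc in docs:
    print(doc.page_content[:200])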
- Answer Generation: Based on the most relevant text chunks from the similarity search and the user's query, generate an answer with the Cerebras-hosted model.
from langchain.chains.question_answering import load_qa_chain
from langchain_cerebras import ChatCerebras
# Load question answering chain
llm = ChatCerebras(api_key=CEREBRAS_API_KEY, model="llama3.1-8b")
chain = load_qa_chain(llm, chain_type="stuff")
# Generate answer
response = chain.run(input_documents=docs, question=prompt)
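Finally, the answer can be rendered back to the user in Streamlit's chat interface; a minimal sketch of that last step (the chat layout is an assumption, not part of the snippets above):
# Display the exchange in the chat UI
with st.chat_message("user"):
    st.write(prompt)
with st.chat_message("assistant"):
    st.write(response)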