This repository expands on the original rag-tutorial-v2, adding features and improving the performance of the chatbot built with Langchain and a vector database. Below is a guide to setting up the environment and running the application.
Over a year ago, I attempted to create a chatbot using Langchain and a vector database. Although I could import files and parse them, I faced challenges getting useful information from the vector DB. Fast forward to today, and many repositories offer working implementations for chatting with documents.
I initially forked and tested a repo, Web-LLM-Assistant-Llama-cpp-working, but ran into issues importing larger files (greater than 200 KB). After troubleshooting, I found rag-tutorial-v2, which worked well for smaller documents and served as the base for my updates.
I expanded the project with additional features that I wanted to see. Here's a video demo of the app in action.
This project is set up to run on a Windows 10 machine. Follow the instructions below to recreate the environment on your machine.
First, create a directory to store your repositories and clone the project.
# Navigate to the root directory where your repos will be stored
cd \
mkdir gitrepos
cd gitrepos
# Clone the repository
git clone https://github.com/MartinTorres2099/rag-tutorial-v2-updated.git
The repository will be cloned to:
C:\gitrepos\rag-tutorial-v2-updated
Navigate to the project directory and create a Python virtual environment:
cd C:\gitrepos\rag-tutorial-v2-updated
python -m venv venv # Run only once to create your virtual environment
To activate the virtual environment on Windows:
venv\Scripts\activate.bat
Install the required dependencies by running:
pip install -r requirements.txt
Additionally, install Flask and Langchain:
pip install Flask
pip install langchain-community
Once you are done working, deactivate the virtual environment with:
venv\Scripts\deactivate.bat
Update get_embedding_function.py to run locally by uncommenting and adjusting the code:
from langchain_community.embeddings.ollama import OllamaEmbeddings
from langchain_community.embeddings.bedrock import BedrockEmbeddings

def get_embedding_function():
    # Uncomment and configure to use cloud-based embeddings instead of local Ollama
    # embeddings = BedrockEmbeddings(credentials_profile_name="default", region_name="us-east-1")
    embeddings = OllamaEmbeddings(model="nomic-embed-text")
    return embeddings
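For context, this embedding function is what the rest of the project hands to the vector store. As a rough illustration of how it gets used (the persist directory name and query below are assumptions; check populate_database.py and query_data.py for the actual values in this repo):

from langchain_community.vectorstores import Chroma
from get_embedding_function import get_embedding_function

# Open the existing Chroma database with the same embedding function used to build it
db = Chroma(persist_directory="chroma", embedding_function=get_embedding_function())

# Retrieve the chunks most similar to a question
results = db.similarity_search_with_score("What is covered in my documents?", k=5)
for doc, score in results:
    print(score, doc.page_content[:80])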
You can load different document types using Langchain's document loaders; see the Langchain documentation for the full list of loaders.
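For example, a loader for a folder of PDFs might look like the sketch below (the "data" directory name is an assumption; use whatever folder your documents live in, and note that this loader requires the pypdf package):

from langchain_community.document_loaders import PyPDFDirectoryLoader

def load_documents():
    # Load every PDF found in the data directory into Langchain Document objects
    loader = PyPDFDirectoryLoader("data")
    return loader.load()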
To install Ollama, follow the instructions on their official website.
Ensure that the Ollama application is running on your machine before starting the app. You can create a batch file to automate the process.
Create a batch file (start_app.bat) with the following content:
@echo off
echo This will launch the RAG application
timeout /t 2
echo Changing to rag directory
cd C:\gitrepos\rag-tutorial-v2-updated
timeout /t 2
echo Activating Python virtual environment
call venv\Scripts\activate.bat
timeout /t 2
python app.py
echo Waiting for the app to close...
timeout /t 2
echo Deactivating Python virtual environment
call venv\Scripts\deactivate.bat
timeout /t 2
echo Thank you for using the RAG application, goodbye for now!
timeout /t 2
exit
Execute the batch file to start the application. The app will run a local development server, accessible at:
http://127.0.0.1:5000/
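For reference, app.py is a standard Flask application that takes a question from the browser and passes it to the RAG query code. A heavily simplified sketch of that idea follows; the helper and template names here are illustrative assumptions, not the exact code in this repo:

from flask import Flask, request, render_template
from query_data import query_rag  # assumed helper that runs the RAG query and returns a string

app = Flask(__name__)

@app.route("/", methods=["GET", "POST"])
def index():
    answer = None
    if request.method == "POST":
        question = request.form.get("question", "")
        answer = query_rag(question)
    return render_template("index.html", answer=answer)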
You can also run the program manually without the web interface:
- Pull the Nomic embed model:
ollama pull nomic-embed-text
- Download and use a Mistral model:
ollama pull mistral-nemo # Pull the Mistral model
- Serve the model:
ollama serve # Only needed if the Ollama service is not already running
- Run the model:
ollama run mistral-nemo
- Exit the model:
/bye
- Add or update documents in the database:
python populate_database.py
- Test the RAG system with known data:
Run the following to check how well the LLM answers questions based on the vector DB:
pytest -s # Modify test_rag.py with known data
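If you need a starting point for test_rag.py, a minimal sketch could look like the following; it assumes a query_rag helper in query_data.py, and the question and expected text are placeholders you should replace with facts you know are in your own documents:

from query_data import query_rag

def test_known_fact():
    # Ask a question whose answer you already know from your source documents
    response = query_rag("What is the monthly cost listed in the pricing document?")
    # Placeholder expectation; substitute a string that should appear in the answer
    assert "1,000" in response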
To uninstall the virtual environment, first deactivate it:
venv\Scripts\deactivate.bat
Then, delete the virtual environment:
rmdir /s /q venv
While working with Flask, the app always ran with development mode on, no matter how I launched it or which variables I changed to disable debug mode. I installed Waitress and updated app.py to use it instead:
In the virtual environment, install Waitress:
pip install waitress
Update the app.py code:
from waitress import serve

if __name__ == "__main__":
    print("Starting Flask app with Waitress...")
    serve(app, host="0.0.0.0", port=5000)
Run the Flask app with Waitress:
python app.py
The code has been updated so that, after a question is answered, you can either ask another question with the currently loaded model or return to the main index.html and choose a different model. Previously, the app asked which model you wanted to use before every question.
The app can now be reached from other machines on the network by entering the IP address of the machine it is running on. I changed the port to 8080 and am using Waitress to serve the site. The next planned feature is deciding whether the app should search online when an answer cannot be found in the documents.
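In practice, that port change just means the serve() call in app.py binds to the new port, along these lines (port value taken from the description above):

serve(app, host="0.0.0.0", port=8080)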
Thank you for using this project! Feel free to contribute or make improvements.