Note - The app was written in a Python 3.12.10 virtual environment and supports the latest Python versions. However, older Python versions may be more stable than the newest ones. The setup below targets Windows; the commands may differ on macOS or Linux.
# If you downloaded the files, navigate to the project folder
# Then hold Shift and right-click inside the folder (Windows)
# Choose the "Open PowerShell window here" option and move on to creating the virtual environment
cd web_rag_scraper
# The path to a particular Python installation may vary
# I have installed more than one version of Python in the pyver directory
# Refer to the resources section for a YouTube video on installing multiple versions of Python
C:\Users\<username>\pyver\py3121\python -m venv ragenv
ragenv\Scripts\activate
pip install -r requirements.txt
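After activating `ragenv`, it can help to confirm that the interpreter really is the one inside the virtual environment before installing anything else. The snippet below is a minimal sketch of such a check; the function name `in_virtualenv` is a hypothetical helper, not part of this project.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a virtual environment."""
    # Inside a venv, sys.prefix points at the environment directory while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    print("Inside a virtual environment:", in_virtualenv())
```

Saved to a file and run with the activated environment's `python`, it should report the Python version used to create `ragenv` and `True` for the second line.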
cd models
docker cp gemma-3n-E4B-it-Q4_K_M.gguf your_container_name:/root/.ollama/models/
docker cp Modelfile your_container_name:/root/.ollama/models
docker exec -it your_container_name bash
ollama create gemma-3n -f /root/.ollama/models/Modelfile

## No file download is needed for getting the model
## The model can be directly pulled from Ollama's model list
## Don't use this command
docker cp gemma-3n-E4B-it-Q4_K_M.gguf your_container_name:/root/.ollama/models/
## Don't use this command
docker cp Modelfile your_container_name:/root/.ollama/models
## Start from here
## Enter the docker container's bash
docker exec -it your_container_name bash
## Pull the model of your choice
ollama pull mistral
## Change the model name in config.py to "mistral" and check out the results!

This section explains how to run the application, the folder structure involved, and the different features available once the app is running.
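For the last setup step, changing the model name in config.py, the sketch below shows one way a model name could flow from configuration into a request to Ollama. The names `MODEL_NAME`, `OLLAMA_URL`, `build_generate_request`, and `generate` are assumptions for illustration; the project's actual config.py and client code may look different. It also assumes Ollama's default port 11434 is published from the container to the host.

```python
import json
import urllib.request

# Hypothetical config values; the real config.py may use different names.
MODEL_NAME = "mistral"
# Assumption: Ollama's default port is published from the container.
OLLAMA_URL = "http://localhost:11434"


def build_generate_request(prompt: str, model: str = MODEL_NAME) -> urllib.request.Request:
    """Build (but do not send) a POST request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        f"{OLLAMA_URL}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, model: str = MODEL_NAME) -> str:
    """Send the request and return the generated text (needs a running Ollama server)."""
    with urllib.request.urlopen(build_generate_request(prompt, model)) as resp:
        return json.loads(resp.read())["response"]
```

Keeping the model name in a single constant means switching from `gemma-3n` to `mistral` is a one-line change, which is the workflow the setup comments describe.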
Project Overview
- models: This folder contains the LLM model and the Modelfile for the web scraper.
- main.py: The main entry point of the application. Running this file initializes the web scraper, content parser, text processor and text generator.
- logger: This folder keeps track of the steps that execute on the interface or application.
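The four stages that main.py initializes (web scraper, content parser, text processor, text generator) can be pictured with a small standard-library sketch. Everything here is hypothetical: the function names `scrape`, `parse`, `process`, and `build_prompt`, the fixed-size word chunking, and the prompt layout are assumptions used for illustration, not the project's actual code.

```python
from html.parser import HTMLParser
import urllib.request


class _TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.parts.append(data.strip())


def scrape(url: str) -> str:
    """Web scraper stage: download raw HTML (needs network access)."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def parse(html: str) -> str:
    """Content parser stage: reduce HTML to plain text."""
    extractor = _TextExtractor()
    extractor.feed(html)
    return " ".join(extractor.parts)


def process(text: str, chunk_words: int = 50) -> list[str]:
    """Text processor stage: split the text into fixed-size word chunks."""
    words = text.split()
    return [" ".join(words[i:i + chunk_words]) for i in range(0, len(words), chunk_words)]


def build_prompt(question: str, chunks: list[str]) -> str:
    """Text generator stage (input side): assemble the prompt handed to the LLM."""
    context = "\n".join(f"- {chunk}" for chunk in chunks)
    return f"Context:\n{context}\n\nQuestion: {question}"
```

A run would chain the stages, for example `build_prompt(question, process(parse(scrape(url))))`, and pass the resulting prompt to the model.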
Steps to run
python main.py
Features
- Citations - Provides links to the sources searched.
- Best sources - Ranks and selects the top websites that match the user's intent.
- LLM text generation - Uses a pretrained LLM to provide context-aware information.
- Docker containers - Uses an isolated environment, i.e. containers, to access information privately.
- Compatible with open source - Can use open-source LLMs to generate text.
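To make the "Best sources" and "Citations" features more concrete, here is a minimal sketch that ranks scraped pages by how many query terms they contain and then formats the winners as numbered citations. Plain term overlap is a simple stand-in chosen for illustration; the app's real ranking method is not described here and may differ. The names `tokenize`, `rank_sources`, and `format_citations` are hypothetical.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of alphanumeric terms."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def rank_sources(query: str, sources: dict[str, str], top_k: int = 3) -> list[str]:
    """Return up to top_k URLs ordered by how many query terms their text contains."""
    query_terms = tokenize(query)
    scored = [(len(query_terms & tokenize(text)), url) for url, text in sources.items()]
    # Highest score first; ties are broken by URL so the order is deterministic.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    # Pages sharing no term with the query are dropped.
    return [url for score, url in scored[:top_k] if score > 0]


def format_citations(urls: list[str]) -> str:
    """Render the selected URLs as a numbered citation list."""
    return "\n".join(f"[{i}] {url}" for i, url in enumerate(urls, start=1))
```

The `sources` argument maps each URL to the plain text extracted from that page, so the same dictionary can feed both the ranking and the citation list.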
Special thanks to Vital-98 for helping create this project.
Python Version Setup
WSL and Linux
- https://learn.microsoft.com/en-us/windows/wsl/install
- (Ubuntu - Microsoft Store) https://apps.microsoft.com/detail/9pdxgncfsczv
Docker Documentation
Hugging Face Hub
- (Our model gemma-3n-E4B-it-GGUF:Q4_K_M) https://huggingface.co/unsloth/gemma-3n-E4B-it-GGUF/tree/main

