📄 Lightweight Q&A chatbot for PDF, DOCX, and TXT files. Built with Streamlit, LangChain, HuggingFace, and ChromaDB. No large LLMs required.

HenryMorganDibie/docqa-chatbot

📄 DocQA Chatbot

An interactive Streamlit-powered chatbot that answers questions from uploaded documents using local language models and vector embeddings. It supports PDF, DOCX, and TXT files.

🚀 Features

  • 📁 Upload and parse PDF, DOCX, or TXT files.
  • ✂️ Split large documents into smaller, searchable chunks.
  • 🔍 Embed content using HuggingFace models and store it in a Chroma vector database.
  • 💬 Ask questions and get relevant answers along with source references.
  • 🧠 Powered by LangChain, HuggingFace, and Sentence-Transformers.
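
The chunking step listed above can be illustrated with a minimal sketch. This is not the project's actual splitter (the code in app/ingest.py is not shown here); the function name `split_into_chunks` and the default sizes are assumptions chosen for illustration, and a real pipeline would more likely use a LangChain text splitter.

```python
def split_into_chunks(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character windows that overlap slightly.

    Hypothetical helper: the sizes are illustrative defaults, not values
    taken from the project.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in the range [0, chunk_size)")

    step = chunk_size - overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks
```

The overlap keeps a sentence that straddles a chunk boundary visible in both neighbouring chunks, which tends to help retrieval.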

πŸ› οΈ Stack

  • Python 3.11+
  • Streamlit
  • LangChain
  • Sentence-Transformers
  • HuggingFace Embeddings
  • Chroma DB
  • PyPDF / docx2txt

🧪 Example Usage

Upload a .pdf, .docx, or .txt file using the uploader. Then ask a question like:

What are the key findings in the document?

Summarize the second section.

Who is the author or target audience?

The app will process the document, chunk it, embed it using all-MiniLM-L6-v2, store embeddings in Chroma, and return answers using a local question-answering chain.
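
The retrieval idea behind that pipeline can be sketched without any model downloads. The example below is an assumption-laden toy: `toy_embed` is a bag-of-words stand-in for all-MiniLM-L6-v2, and a plain Python list stands in for Chroma. It only shows how a question is matched to the most similar chunks, which is what lets the app report source references.

```python
import math
import re
from collections import Counter


def toy_embed(text: str) -> Counter:
    """Stand-in for a real embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(question: str, chunks: list[str], k: int = 2) -> list[tuple[int, float]]:
    """Return (chunk_index, score) pairs for the k chunks most similar to the question."""
    query_vector = toy_embed(question)
    scored = [(i, cosine(query_vector, toy_embed(chunk))) for i, chunk in enumerate(chunks)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


if __name__ == "__main__":
    chunks = [
        "The author is Jane Doe.",
        "Revenue grew in the second quarter.",
        "Cats sleep a lot.",
    ]
    for index, score in top_k("Who is the author?", chunks, k=2):
        print(index, round(score, 3), chunks[index])
```

In the real app the count vectors would be replaced by dense embeddings and the sorted list by a vector-store similarity search, but the ranking logic is the same in spirit.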

📌 Notes

  • The chatbot embeds text with a small model (all-MiniLM-L6-v2), so it runs quickly on a CPU without a GPU or large downloads. This suits low-resource environments and quick prototypes.
  • Once the models are cached locally, basic QA works offline with no internet connection.
  • To upgrade later, swap in a different embedding model or replace the chain logic with more advanced components.
  • If you see LangChain deprecation warnings, switch to the newer package imports (e.g., langchain-huggingface, langchain-community).
  • Keep your uploaded files in the data/uploads folder.
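
One way to handle the import deprecation warnings mentioned in these notes is to try the newer package first and fall back to older locations. The sketch below is a hypothetical helper (`resolve_first` is not part of LangChain or this project); the module paths listed are the ones where `HuggingFaceEmbeddings` has lived across LangChain releases, and which of them is installed depends on your environment.

```python
import importlib

# Candidate (module, attribute) pairs, newest location first.
EMBEDDING_CANDIDATES = [
    ("langchain_huggingface", "HuggingFaceEmbeddings"),
    ("langchain_community.embeddings", "HuggingFaceEmbeddings"),
    ("langchain.embeddings", "HuggingFaceEmbeddings"),
]


def resolve_first(candidates):
    """Return the first attribute that can be imported, or None if none is available."""
    for module_name, attribute in candidates:
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            continue  # package not installed; try the next location
        if hasattr(module, attribute):
            return getattr(module, attribute)
    return None


if __name__ == "__main__":
    embeddings_cls = resolve_first(EMBEDDING_CANDIDATES)
    if embeddings_cls is None:
        print("No HuggingFaceEmbeddings found; try: pip install langchain-huggingface")
    else:
        print("Using", embeddings_cls.__module__)
```

Once the newer package is installed, a direct import from it is simpler; the fallback is only useful while a codebase has to support several LangChain versions.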

📦 Setup Instructions

git clone https://github.com/HenryMorganDibie/docqa-chatbot.git
cd docqa-chatbot
python -m venv .venv
.venv\Scripts\activate
pip install -r requirements.txt
streamlit run main.py

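Then open http://localhost:8501 in your browser. The activation command above is for Windows; on macOS or Linux, run source .venv/bin/activate instead.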


📂 Project Structure

docqa-chatbot/
├── app/
│   ├── ingest.py
│   └── qa_engine.py
├── interface/
│   └── streamlit_app.py
├── main.py
├── data/
│   └── uploads/
├── requirements.txt
└── README.md
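
As an illustration of how the data/uploads folder in this layout might be used, here is a minimal sketch that validates a file's extension and writes it to that folder. The helper name `save_upload` and the `ALLOWED_EXTENSIONS` set are assumptions for this example; the project's actual upload handling in interface/streamlit_app.py is not shown here.

```python
from pathlib import Path

# Extensions the README says the app supports.
ALLOWED_EXTENSIONS = {".pdf", ".docx", ".txt"}


def save_upload(filename: str, data: bytes, upload_dir: Path = Path("data/uploads")) -> Path:
    """Write uploaded bytes into upload_dir and return the saved path.

    Hypothetical helper: rejects unsupported file types and strips any
    directory components from the supplied filename.
    """
    suffix = Path(filename).suffix.lower()
    if suffix not in ALLOWED_EXTENSIONS:
        raise ValueError(f"Unsupported file type: {suffix or '(none)'}")

    upload_dir.mkdir(parents=True, exist_ok=True)
    target = upload_dir / Path(filename).name  # keep only the base name
    target.write_bytes(data)
    return target
```

With Streamlit, the object returned by `st.file_uploader` exposes `.name` and `.getvalue()`, so a call would look like `save_upload(uploaded.name, uploaded.getvalue())`.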
