
CNN News RAG Assistant

CNN News RAG Assistant is a simple retrieval-augmented generation (RAG) desktop app built with Tkinter. It indexes cleaned CNN news articles, retrieves the most relevant chunks with FAISS, and asks an Ollama-served LLM to answer user questions with citations to the retrieved context.

Features

  • One-click GUI for submitting prompts and displaying answers.
  • Automatic FAISS index and chunk metadata creation when missing.
  • Shows the top retrieved article chunks used to craft each answer.
  • Configurable Ollama model name (defaults to llama3.2:3b).

Project structure

  • gui_app.py: Tkinter interface, initialization, and query handling.
  • rag_data_preparation.py: CSV ingestion, text chunking, embedding, and FAISS index creation.
  • query_engine.py: Vector search, prompt construction, and LLM call via Ollama.
  • requirements.txt: Python dependencies.
  • CNN_Articles_clean.csv: Cleaned CNN articles dataset (required for indexing).
  • articles_index.faiss / chunk_metadata.csv: Generated artifacts storing the vector index and chunk text.

Prerequisites

  • Python 3.10+
  • An Ollama installation with network access to http://localhost:11434.
  • The cleaned articles CSV named CNN_Articles_clean.csv in the project root. If you download it as CNN_Articels_clean.csv from Kaggle, rename it accordingly.

Installation

  1. (Recommended) Create and activate a virtual environment.

  2. Install dependencies:

    pip install -r requirements.txt

  3. Ensure Ollama is running and pull the default model:

    ollama pull llama3.2:3b

Usage

  1. Place CNN_Articles_clean.csv in the repository root.

  2. Launch the GUI:

    python gui_app.py

  3. If articles_index.faiss or chunk_metadata.csv is missing, the app automatically:

    • Splits articles into ~500-character chunks with 100-character overlap.
    • Embeds chunks using the all-MiniLM-L6-v2 SentenceTransformer model (CPU or CUDA when available).
    • Builds a FAISS L2 index and saves chunk metadata.

  4. Enter a prompt in the text box and press Submit (or hit Enter). The app will show the model response followed by the retrieved chunks.
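The fixed-size chunking in step 3 can be sketched roughly as follows. This is an illustrative standalone function, not the actual code in rag_data_preparation.py; the ~500-character size and 100-character overlap come from the description above.

```python
def chunk_text(text: str, size: int = 500, overlap: int = 100) -> list[str]:
    """Split text into ~size-character chunks, each sharing `overlap`
    characters with the previous chunk so context is not cut mid-thought."""
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # last chunk reached the end of the article
        start += size - overlap  # advance by 400 characters per chunk
    return chunks
```

Each chunk would then be embedded (e.g. with all-MiniLM-L6-v2) and added to the FAISS index, with the chunk text written to chunk_metadata.csv so it can be shown alongside answers.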

Configuration notes

  • To change the number of retrieved chunks, adjust the k argument in answer_query inside query_engine.py.
  • To use a different model, pass a model_name to answer_query or modify the default value.
  • The Ollama client endpoint can be customized in query_engine.py via Client(host=...).
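To illustrate how retrieved chunks feed into the LLM call, a minimal prompt builder might look like the sketch below. The function name build_prompt and the prompt wording are assumptions for illustration; see query_engine.py for the actual implementation.

```python
def build_prompt(question: str, chunks: list[str]) -> str:
    """Assemble retrieved chunks into a context-grounded prompt,
    numbering each chunk so the model can cite it in its answer."""
    context = "\n\n".join(f"[{i + 1}] {c}" for i, c in enumerate(chunks))
    return (
        "Answer the question using only the context below, "
        "citing chunk numbers like [1].\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

The resulting string would then be sent to the Ollama server, e.g. via something like `Client(host="http://localhost:11434").generate(model="llama3.2:3b", prompt=prompt)`.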

Troubleshooting

  • If initialization stalls, confirm that CNN_Articles_clean.csv exists in the project root and is readable.
  • Ensure Ollama is running locally and that the requested model is available.
  • Generating embeddings for the full dataset may take several minutes on CPU; a CUDA-capable GPU accelerates the process considerably.
