CNN News RAG Assistant is a simple retrieval-augmented generation (RAG) desktop app built with Tkinter. It indexes cleaned CNN news articles, retrieves the most relevant chunks with FAISS, and asks an Ollama-served LLM to answer user questions with citations to the retrieved context.
- One-click GUI for submitting prompts and displaying answers.
- Automatic FAISS index and chunk metadata creation when missing.
- Shows the top retrieved article chunks used to craft each answer.
- Configurable Ollama model name (defaults to `llama3.2:3b`).
The repository contains:

- `gui_app.py`: Tkinter interface, initialization, and query handling.
- `rag_data_preparation.py`: CSV ingestion, text chunking, embedding, and FAISS index creation.
- `query_engine.py`: Vector search, prompt construction, and LLM call via Ollama (sketched below).
- `requirements.txt`: Python dependencies.
- `CNN_Articles_clean.csv`: Cleaned CNN articles dataset (required for indexing).
- `articles_index.faiss` / `chunk_metadata.csv`: Generated artifacts storing the vector index and chunk text.
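For orientation, the retrieval-and-generation path in `query_engine.py` looks roughly like the sketch below. This is a minimal illustration, not the project's actual code: the prompt wording, the `chunk_text` metadata column, the default `k` value, and the loading code are assumptions; the file names, default model, and the `answer_query`/`k`/`model_name` names come from this README.

```python
# Minimal sketch of the query path. The prompt wording, the "chunk_text"
# column name, and the default k are assumptions, not the project's code.
import faiss
import pandas as pd
from ollama import Client
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")
index = faiss.read_index("articles_index.faiss")
metadata = pd.read_csv("chunk_metadata.csv")

def answer_query(query: str, k: int = 5, model_name: str = "llama3.2:3b") -> str:
    # Embed the question and retrieve the k nearest chunks from the FAISS index.
    query_vec = embedder.encode([query], convert_to_numpy=True).astype("float32")
    _, ids = index.search(query_vec, k)
    context = "\n\n".join(metadata.iloc[i]["chunk_text"] for i in ids[0])

    # Ask the Ollama-served model to answer from the retrieved context.
    prompt = (
        "Answer the question using only the context below, citing it where possible.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )
    client = Client(host="http://localhost:11434")
    response = client.chat(model=model_name, messages=[{"role": "user", "content": prompt}])
    return response["message"]["content"]
```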
Requirements:

- Python 3.10+
- An Ollama installation with network access to `http://localhost:11434`.
- The cleaned articles CSV, named `CNN_Articles_clean.csv`, in the project root. If you download it as `CNN_Articels_clean.csv` from Kaggle, rename it accordingly.
Setup:

- (Recommended) Create and activate a virtual environment.
- Install dependencies: `pip install -r requirements.txt`
- Ensure Ollama is running and pull the default model: `ollama pull llama3.2:3b`
- Place `CNN_Articles_clean.csv` in the repository root.
- Launch the GUI: `python gui_app.py`
- If `articles_index.faiss` or `chunk_metadata.csv` is missing, the app automatically (as sketched below):
  - Splits articles into ~500-character chunks with 100-character overlap.
  - Embeds chunks using the `all-MiniLM-L6-v2` SentenceTransformer model (on CPU, or CUDA when available).
  - Builds a FAISS L2 index and saves chunk metadata.
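Under the hood, that indexing pass corresponds roughly to the following sketch. The `text` column of the source CSV, the `chunk_text` helper, and the `chunk_text` metadata column are assumptions; the chunk size, overlap, embedding model, and output file names come from this README.

```python
# Rough sketch of the indexing pipeline. Column names ("text", "chunk_text")
# are assumptions; see rag_data_preparation.py for the real implementation.
import faiss
import pandas as pd
import torch
from sentence_transformers import SentenceTransformer

def chunk_text(text: str, size: int = 500, overlap: int = 100) -> list[str]:
    # Slide a 500-character window over the article, stepping 400 characters
    # so consecutive chunks share a 100-character overlap.
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]

articles = pd.read_csv("CNN_Articles_clean.csv")
chunks = [c for article in articles["text"].dropna() for c in chunk_text(article)]

# Embed on GPU when available, otherwise fall back to CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"
embedder = SentenceTransformer("all-MiniLM-L6-v2", device=device)
embeddings = embedder.encode(chunks, convert_to_numpy=True).astype("float32")

# Build an exact L2 index and persist both the vectors and the chunk text.
index = faiss.IndexFlatL2(embeddings.shape[1])
index.add(embeddings)
faiss.write_index(index, "articles_index.faiss")
pd.DataFrame({"chunk_text": chunks}).to_csv("chunk_metadata.csv", index=False)
```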
- Enter a prompt in the text box and press Submit (or hit Enter). The app will show the model response followed by the retrieved chunks.
Configuration:

- To change the number of retrieved chunks, adjust the `k` argument in `answer_query` inside `query_engine.py`.
- To use a different model, pass a `model_name` to `answer_query` or modify the default value (see the example after this list).
- The Ollama client endpoint can be customized in `query_engine.py` via `Client(host=...)`.
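For example, assuming `answer_query` keeps the signature implied above, a programmatic call with custom settings might look like:

```python
from query_engine import answer_query

# Retrieve 8 chunks and answer with a different (already pulled) Ollama model.
print(answer_query("What has CNN reported about SpaceX launches?",
                   k=8, model_name="llama3:8b"))
```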
Troubleshooting:

- If initialization stalls, confirm `CNN_Articles_clean.csv` exists and is readable.
- Ensure Ollama is running locally and that the requested model is available.
- Embedding the full article set may take several minutes on CPU; a CUDA-capable GPU accelerates the process.