Retrieval-Augmented Generation (RAG) for radio regulations (e.g., ITU rules and spectrum management).
Index regulation PDFs with FAISS, retrieve the most relevant passages, and generate grounded answers with an LLM.
- Overview
- Features
- Quick Start
- Project Structure
- Usage
- Experiments
- Hugging Face (ZeroGPU)
- Figures
- Troubleshooting
- License
Radio-RAG implements a practical RAG pipeline tailored for telecom/spectrum regulations:
- Ingest regulation PDFs and split them into chunks.
- Embed those chunks and build a FAISS index.
- Retrieve the most relevant passages for a user question.
- Generate grounded answers with an LLM using the retrieved context.
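The four steps can be illustrated end to end with a toy, standard-library-only sketch. It is not Radio-RAG's code. A bag-of-words counter stands in for the embedding model, and a brute-force cosine scan stands in for the FAISS index. The chunk texts are placeholders rather than regulation excerpts, and the names `embed`, `retrieve`, and `build_prompt` (plus the prompt wording) are hypothetical.

```python
# Toy sketch of ingest -> embed -> retrieve -> grounded prompt.
# Not Radio-RAG's implementation: word counts replace a real embedding model
# and an exhaustive scan replaces a FAISS index.
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    # Toy "embedding": lower-cased word counts (hyphenated terms kept whole).
    return Counter(re.findall(r"[a-z0-9-]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question: str, chunks: list[str], top_k: int = 2) -> list[str]:
    # Score every chunk against the query, like a flat (brute-force) index.
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:top_k]


def build_prompt(question: str, context: list[str]) -> str:
    # Grounding: ask the LLM to answer only from the retrieved passages.
    numbered = "\n".join(f"[{i + 1}] {c}" for i, c in enumerate(context))
    return (
        "Answer using only the context below. If the answer is not there, say so.\n\n"
        f"Context:\n{numbered}\n\nQuestion: {question}\nAnswer:"
    )


# Placeholder chunks (topics only, no real regulatory content).
chunks = [
    "Section A: power flux-density limits for EESS space stations at the GSO.",
    "Section B: conditions for amateur stations.",
    "Section C: protection criteria for GSO links.",
]
context = retrieve("power flux-density produced by an EESS space station", chunks, top_k=1)
print(build_prompt("What power flux-density limit applies to EESS space stations?", context))
```

In the real pipeline the final prompt would go to the configured LLM. Replacing the toy pieces with a real embedding model and a FAISS index changes retrieval quality, not the shape of the flow.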
- 🔎 PDF → chunks → FAISS: simple, configurable ingestion pipeline
- 🧠 Model-agnostic: choose your embedding and LLM backends
- ⚙️ Tunable retrieval: chunk size, overlap, index type, top-K
- 🧪 Experiment ready: compare vanilla LLM vs. RAG-augmented runs
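Chunk size and overlap behave like a sliding window. Each chunk starts `chunk_size - overlap` units after the previous one, so neighbouring chunks share `overlap` units. The sketch below counts characters. Whether Radio-RAG measures chunks in characters, words, or tokens is an assumption here, as is the helper name `chunk_text`.

```python
# Illustrative sliding-window chunker (hypothetical helper; unit = characters).
def chunk_text(text: str, chunk_size: int, overlap: int) -> list[str]:
    """Split text into fixed-size character windows that share `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


print(chunk_text("abcdefghij", chunk_size=4, overlap=1))  # ['abcd', 'defg', 'ghij']
```

A larger overlap lowers the risk of cutting a provision in half at a chunk boundary. The cost is more, partly redundant, vectors in the index.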
```bash
git clone https://github.com/Zakaria010/Radio-RAG.git
cd Radio-RAG

# (optional) create a virtual environment
python -m venv .venv
source .venv/bin/activate  # Windows: .venv\Scripts\activate

pip install -r requirements.txt
```

Create the `data/` folder (if it doesn't exist) and put your regulation PDFs inside:

```text
data/
├─ itu_radio_regulations.pdf
└─ your_other_regulation_book.pdf
```
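If the CLI reports no documents, it can save time to check that the PDFs are where it expects them. Below is a minimal sketch that assumes the default `./data` location. `list_pdfs` is a hypothetical helper and is not part of the repository.

```python
# Hypothetical pre-flight check; not part of Radio-RAG.
from pathlib import Path


def list_pdfs(folder: str = "./data") -> list[Path]:
    """Return the PDF files directly inside `folder`, sorted by name."""
    root = Path(folder)
    if not root.is_dir():
        raise FileNotFoundError(f"{root} does not exist; create it and add your PDFs")
    # Match the extension case-insensitively so "RR.PDF" is found too.
    return sorted(p for p in root.iterdir() if p.is_file() and p.suffix.lower() == ".pdf")
```

For example, `[p.name for p in list_pdfs("./data")]` should list the files shown in the tree above.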
```text
Radio-RAG/
├─ data/              # Put regulation PDFs here
├─ tests/             # Evaluation / experiment scripts
├─ utils/             # Helpers (parsing, chunking, indexing, retrieval)
├─ local_rag.py       # CLI entry-point
├─ requirements.txt
├─ LICENSE
└─ README.md
```
Run the built-in help to see the exact flags supported by your current version:

```bash
python local_rag.py --help
```

A) Ask a question (builds or reuses the index):

```bash
python local_rag.py \
  --pdf_folder ./data \
  --question "What is the maximum power flux-density at the GSO produced by any EESS space station?"
```

B) Retrieve context only, without generation (if your version supports it):

```bash
python local_rag.py \
  --pdf_folder ./data \
  --top_k 5 \
  --question "Define the protection criteria for GSO links." \
  --no_generate
```

Flags:

- `--pdf_folder` (str, default: `./data`): directory of PDFs
- `--chunk_size` (int): chunk length used for text splitting
- `--overlap` (int): overlap between adjacent chunks
- `--index_type` (str): FAISS index type (`flatl2`, `hnsw`, `ivfflat`, `ivfpq`, …)
- `--embed_model` (str): embedding model ID/name
- `--llm_model` (str): LLM ID/name
- `--top_k` (int): number of retrieved chunks
- `--question` (str): your query
- `--no_generate` (flag): return retrieved context without generation
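To compare settings such as several chunk sizes, it can be convenient to drive the CLI from a script. This sketch only builds argument lists from the flags listed above. It assumes nothing about their defaults or accepted values (check `--help`), and `build_args`, `run`, and the sweep values are illustrative.

```python
# Illustrative wrapper around the documented CLI flags (hypothetical helpers).
import subprocess
import sys


def build_args(question: str, **flags) -> list[str]:
    """Assemble an argv list for local_rag.py.

    Keyword names map to flags (chunk_size -> --chunk_size). True adds a bare
    switch (no_generate=True -> --no_generate); None or False omits the flag.
    """
    args = [sys.executable, "local_rag.py", "--question", question]
    for name, value in flags.items():
        if value is None or value is False:
            continue
        args.append(f"--{name}")
        if value is not True:
            args.append(str(value))
    return args


def run(args: list[str]) -> str:
    # check=True raises CalledProcessError if the CLI exits with a non-zero status.
    return subprocess.run(args, capture_output=True, text=True, check=True).stdout


if __name__ == "__main__":
    # Example sweep over chunk sizes (placeholder values, not recommendations).
    for size in (500, 1000):
        cmd = build_args("Define the protection criteria for GSO links.",
                         pdf_folder="./data", chunk_size=size, top_k=5)
        print(" ".join(cmd))
```

Passing one of the printed commands to `run` executes it. Changing `--chunk_size` alters the vectors, so make sure each configuration gets a freshly built index.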
If you change embedding/index parameters or models, rebuild the index to avoid stale vectors.
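One way to avoid stale vectors is to store a fingerprint of everything that determines them alongside the index, then rebuild whenever it changes. Below is a hedged sketch of that pattern. It is not how Radio-RAG decides when to rebuild, and the function names, the JSON sidecar file, and the chosen fields are assumptions.

```python
# Illustrative staleness check (hypothetical helpers and sidecar file format).
import hashlib
import json
from pathlib import Path


def index_fingerprint(pdf_folder: str, embed_model: str, chunk_size: int,
                      overlap: int, index_type: str) -> str:
    """Hash the settings and PDF contents that determine the stored vectors."""
    h = hashlib.sha256()
    settings = {"embed_model": embed_model, "chunk_size": chunk_size,
                "overlap": overlap, "index_type": index_type}
    h.update(json.dumps(settings, sort_keys=True).encode())
    for pdf in sorted(Path(pdf_folder).glob("*.pdf")):
        h.update(pdf.name.encode())
        h.update(pdf.read_bytes())  # any edited or replaced PDF changes the hash
    return h.hexdigest()


def needs_rebuild(meta_file: str, fingerprint: str) -> bool:
    """True if no fingerprint was saved yet or the saved one differs."""
    path = Path(meta_file)
    if not path.exists():
        return True
    return json.loads(path.read_text()).get("fingerprint") != fingerprint


def save_fingerprint(meta_file: str, fingerprint: str) -> None:
    Path(meta_file).write_text(json.dumps({"fingerprint": fingerprint}))
```

The LLM is deliberately left out of the fingerprint because it does not affect the stored vectors. For large collections, hashing file sizes and modification times is cheaper than hashing full contents. The trade-off is that it can miss edits that leave both unchanged.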
Evaluation utilities live in `tests/`. A typical pattern:

```bash
python tests/evaluate_rag.py \
  --pdf_folder ./data \
  --top_k 5
```

Some versions include a switch such as `--norag` to compare a vanilla LLM against RAG. Run:

```bash
python tests/evaluate_rag.py --help
```

to see the exact options available in your copy.
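A vanilla-vs-RAG comparison comes down to scoring both runs against the same reference answers. Below is a minimal sketch that assumes each run is saved as a list of answer strings aligned with the references. The normalized exact-match metric and the function names are illustrative, and the scripts in `tests/` may score answers differently.

```python
# Illustrative scoring of two runs against shared references (hypothetical metric).
import re
import string


def normalize(text: str) -> str:
    """Lower-case, drop punctuation, and collapse whitespace."""
    text = text.lower().translate(str.maketrans("", "", string.punctuation))
    return re.sub(r"\s+", " ", text).strip()


def exact_match_rate(predictions: list[str], references: list[str]) -> float:
    if len(predictions) != len(references):
        raise ValueError("predictions and references must be aligned")
    if not references:
        return 0.0
    hits = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return hits / len(references)


def compare_runs(vanilla: list[str], rag: list[str], references: list[str]) -> dict[str, float]:
    # Same references and metric for both runs, so the scores are comparable.
    return {
        "vanilla": exact_match_rate(vanilla, references),
        "rag": exact_match_rate(rag, references),
    }
```

Call `compare_runs(vanilla_answers, rag_answers, reference_answers)` to get one score per run. Free-form answers usually need a softer metric than exact match.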
Prefer a hosted demo? Try the app on Hugging Face Spaces (ZeroGPU spins up on demand):
▶️ Launch the Space: https://huggingface.co/spaces/zakinho00/RegRAGapp
Figure 5 — Vanilla vs. RAG (Qualitative)
- No/irrelevant answers → Confirm that the PDFs parse correctly; try a larger `--top_k`; adjust `--chunk_size`/`--overlap`.
- Index performance → Start with `flatl2` (baseline) or `hnsw` (fast). IVF variants can help at larger scale.
- Changed models/params → Rebuild the index to avoid stale vectors.
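For the first symptom, check whether retrieval surfaces the right passage at all before tuning the LLM. Below is a small hit-rate@k sketch. It assumes that, for a handful of test questions, you have the ranked chunk IDs returned by retrieval and the ID of the chunk that actually contains the answer (both hypothetical inputs).

```python
# Illustrative retrieval diagnostic (hypothetical inputs and helper names).
def hit_rate_at_k(ranked_ids: list[list[int]], gold_ids: list[int], k: int) -> float:
    """Fraction of questions whose gold chunk appears in the top-k retrieved IDs."""
    if len(ranked_ids) != len(gold_ids):
        raise ValueError("one ranking per gold ID is required")
    if not gold_ids:
        return 0.0
    hits = sum(gold in ranking[:k] for ranking, gold in zip(ranked_ids, gold_ids))
    return hits / len(gold_ids)


def hit_rate_curve(ranked_ids: list[list[int]], gold_ids: list[int], max_k: int) -> dict[int, float]:
    """Hit rate for each k from 1 to max_k."""
    return {k: hit_rate_at_k(ranked_ids, gold_ids, k) for k in range(1, max_k + 1)}
```

If the hit rate keeps rising with k, a larger `--top_k` should help. If it stays low even at large k, look upstream at PDF parsing, chunking, or the embedding model.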
Released under the MIT License. See LICENSE for details.

