Featuring smart title resolution, episode handling, and multilingual media knowledge bases.
Polyvox is a fully local, zero-cost trilingual chatbot that acts as a TV & movie information assistant.
It detects whether a query is in Sinhala (සිංහල), Tamil (தமிழ்), or English — even when mixed or transliterated — and replies in the same language.
The system is built completely from scratch using open-source, offline-friendly libraries.
It now features:
- 🎞️ Title and episode recognition across multilingual movie/TV databases
- 💬 Trilingual answers about plot, year, genre, language, and episode details
- ⚙️ Rule-based + fuzzy entity resolution (with transliteration and substring matching)
- 🧠 Intent detection for media, FAQ, and conversational topics
- 🌐 Offline support with a Gradio chat interface that feels like a streaming assistant
- 🔤 Advanced Language Detection — Script + token-based engine for Sinhala, Tamil, and English with transliteration heuristics for Singlish and Tanglish queries.
- 🎥 TV & Movie Intelligence — Detects titles, years, genres, and episode numbers across multilingual metadata.
- 🧩 Modular Core Architecture — Individual modules for detection, entity resolution, intent routing, and multilingual answering.
- 💬 Tri-lingual Handlers — Generates natural replies in Sinhala, Tamil, or English, depending on query language.
- 📚 Expanded Media Knowledge Base — Includes localized metadata for Chernobyl, The Queen’s Gambit, Good Omens, The Haunting of Hill House, Adolescence, and more — all with Sinhala / Tamil translations.
- 🧠 Intent & Entity Recognition — Identifies whether the user wants general info, episode details, or conversational responses.
- 🌐 Offline-Friendly Chat UI — Gradio interface styled like a streaming-chat assistant.
- 🔮 ML-Ready Design — Hooks for FastText or local LLMs for richer semantic Q&A in future versions.
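
The "rule-based + fuzzy entity resolution" feature listed above can be illustrated with a minimal sketch. It assumes a tiny hypothetical alias table and uses `difflib` from the standard library; the name `resolve_title_sketch` and the `ALIASES` data are illustrative and are not taken from the project's `entities.py`.

```python
import difflib
from typing import Optional

# Hypothetical alias table (canonical title -> known spellings).
# This is illustration data, not the project's real knowledge base.
ALIASES = {
    "Chernobyl": ["chernobyl", "චර්නොබිල්", "செர்னோபில்"],
    "Inception": ["inception", "ඉන්සෙප්ෂන්", "இன்செப்ஷன்"],
    "Good Omens": ["good omens", "ගුඩ් ඔමේන්ස්"],
}


def resolve_title_sketch(query: str, cutoff: float = 0.8) -> Optional[str]:
    """Resolve a title by substring match first, then by fuzzy word match."""
    text = query.casefold()

    # Pass 1: any known alias appearing verbatim inside the query.
    for title, aliases in ALIASES.items():
        if any(alias in text for alias in aliases):
            return title

    # Pass 2: compare each query word against the aliases to absorb typos.
    lookup = {alias: title for title, aliases in ALIASES.items() for alias in aliases}
    for token in text.split():
        close = difflib.get_close_matches(token, list(lookup), n=1, cutoff=cutoff)
        if close:
            return lookup[close[0]]
    return None


print(resolve_title_sketch("tell me about chernobil"))
```

The second pass only compares single words, so a misspelled multi-word title would need a sliding window over word pairs; that is left out to keep the sketch short.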
┌──────────────────────────────────────┐
│           Web UI (Gradio)            │
└─────────┬────────────────────────────┘
          │
┌─────────▼────────────────────────────┐
│ detection.py → primary language      │
├─ classify_token() / detect_language()│
└─────────┬────────────────────────────┘
          │
┌─────────▼────────────────────────────┐
│ routing.py → route_to_intent()       │
└─────────┬────────────────────────────┘
          │
┌─────────▼────────────────────────────┐
│ intents.py → detect_intent()         │
└─────────┬────────────────────────────┘
          │
┌─────────▼────────────────────────────┐
│ entities.py → resolve_title()        │
└─────────┬────────────────────────────┘
          │
┌─────────▼────────────────────────────┐
│ handlers.py → reply_[lang]()         │
└─────────┬────────────────────────────┘
          │
┌─────────▼────────────────────────────┐
│ answers.py → build_response()        │
└─────────┬────────────────────────────┘
          │
┌─────────▼────────────────────────────┐
│ app.py → return to UI                │
└──────────────────────────────────────┘
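
To make the flow in the diagram concrete, here is a rough sketch of how the stages might be chained. Every function below is a stand-in with a `_stub` suffix; the real signatures in `detection.py`, `intents.py`, `entities.py`, and `handlers.py` are not shown in this README, so treat the arguments and return values as assumptions.

```python
from typing import Callable, Dict, Optional


# Stand-in stages. The real modules may take and return different things.
def detect_language_stub(text: str) -> str:
    return "en"  # placeholder: pretend every query is English


def detect_intent_stub(text: str) -> str:
    return "media_info" if "about" in text.lower() else "chitchat"


def resolve_title_stub(text: str) -> Optional[str]:
    return "Chernobyl" if "chernobyl" in text.lower() else None


def reply_en_stub(intent: str, title: Optional[str]) -> str:
    if intent == "media_info" and title:
        return f"Here is what I know about {title}."
    return "Hello! Ask me about a movie or TV series."


# One handler per language code; only English is filled in here.
HANDLERS: Dict[str, Callable[[str, Optional[str]], str]] = {"en": reply_en_stub}


def answer_sketch(text: str) -> str:
    """Run the stages in the same order as the diagram."""
    lang = detect_language_stub(text)
    intent = detect_intent_stub(text)
    title = resolve_title_stub(text)
    handler = HANDLERS.get(lang, reply_en_stub)
    return handler(intent, title)


print(answer_sketch("Tell me about Chernobyl"))
```

Keeping handlers in a dictionary keyed by language code is one possible way to add Sinhala and Tamil replies without touching the pipeline function itself.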
Prerequisite: Python 3.9+
git clone https://github.com/yourusername/polyvox.git
cd polyvox
python -m venv .venv && source .venv/bin/activate # (Windows: .venv\Scripts\activate)
pip install -r requirements.txt
python app.py

Gradio will show a local URL. Open it and chat in Sinhala, Tamil, or English.
Optional CLI:
python scripts/run_cli.py

| Language | Examples |
|---|---|
| Sinhala | ඔබට ස්තුතියි / අද කෙනෙද? / දැන් වේලාව කීයද? |
| Tamil | வணக்கம் / இன்று தேதி என்ன? / இப்போது நேரம் என்ன? |
| English | what’s today’s date? / hello / thanks |
| Code-Mix | Hi ආයුබෝවන් / vanakkam good morning |
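
The script-based half of language detection can be sketched with Unicode block ranges alone. The example below is a simplified illustration, not the code in `core/detection.py`: the function names are hypothetical, and it does not attempt the transliteration heuristics, so a romanized Tamil query such as "vanakkam good morning" is reported as English.

```python
from collections import Counter


def script_of(ch: str) -> str:
    """Classify one character by its Unicode block."""
    code = ord(ch)
    if 0x0D80 <= code <= 0x0DFF:
        return "si"      # Sinhala block
    if 0x0B80 <= code <= 0x0BFF:
        return "ta"      # Tamil block
    if ch.isascii() and ch.isalpha():
        return "en"      # basic Latin letters
    return "other"       # digits, punctuation, whitespace, ...


def detect_script_language(text: str) -> str:
    """Return the language whose script contributes the most characters."""
    counts = Counter(script_of(ch) for ch in text)
    counts.pop("other", None)
    if not counts:
        return "unknown"
    return counts.most_common(1)[0][0]


for sample in ["ඔබට ස්තුතියි", "வணக்கம்", "hello", "Hi ආයුබෝවන්"]:
    print(sample, "->", detect_script_language(sample))
```

Counting characters rather than words means a single long Sinhala word outweighs a short English greeting, which matches the "primary language" idea for code-mixed input.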
The Polyvox Media Knowledge Base (movies_tv_kb.json) is a multilingual dataset of movies and TV series designed for trilingual (Sinhala • Tamil • English) media understanding.
It powers entity detection, metadata retrieval, and episode-aware answering for the chatbot.
- Movies (8 entries) and TV Series (6 entries)
- Each entry includes Sinhala, Tamil, and English localized fields:
  - Titles, genres, and language labels
  - Director(s), writer(s), and detailed cast lists
  - Runtime, release dates, studios, and production info
  - IMDb / Rotten Tomatoes / Metacritic ratings
  - Awards in all three languages
  - Short and long plot summaries (trilingual)
  - For TV: per-season and per-episode details (title, air date, summary)
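
As an illustration of how localized fields like these could be stored and read, here is a small sketch. The field names (`type`, `title`, `genre`) and the `localized` helper are assumptions made for the example; the actual layout of movies_tv_kb.json may differ.

```python
import json

# Hypothetical entry shape; the real movies_tv_kb.json field names may differ.
SAMPLE_KB = json.loads("""
[
  {
    "type": "tv",
    "year": 2019,
    "title": {"en": "Chernobyl", "si": "චර්නොබිල්", "ta": "செர்னோபில்"},
    "genre": {"en": "Drama / History"}
  }
]
""")


def localized(entry: dict, field: str, lang: str) -> str:
    """Return entry[field] in `lang`, falling back to English, then to ''."""
    values = entry.get(field, {})
    return values.get(lang) or values.get("en", "")


entry = SAMPLE_KB[0]
print(localized(entry, "title", "ta"))
print(localized(entry, "genre", "si"))  # no Sinhala genre here, so English is used
```

Falling back to English keeps replies usable when a translation is missing for a given field.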
| Type | Title (EN) | Localized Titles | Year | Platform | Genre |
|---|---|---|---|---|---|
| Movie | Inception | ඉන්සෙප්ෂන් / இன்செப்ஷன் | 2010 | — | Sci-Fi / Action |
| Movie | Avatar | අවටාර් / அவதார் | 2009 | — | Sci-Fi / Fantasy |
| Movie | Machan | මචන් / மச்சான் | 2008 | — | Comedy / Drama |
| Movie | Aloko Udapadi | ආලෝකෝ උදපාදි / அலோகோ உதபதி | 2017 | — | Historical / Epic |
| TV | Chernobyl | චර්නොබිල් / செர்னோபில் | 2019 | HBO | Drama / History |
| TV | The Queen’s Gambit | ද ක්වීන්ස් ගැම්බිට් / த க்வீன்ஸ் காம்பிட் | 2020 | Netflix | Drama |
| TV | Good Omens | ගුඩ් ඔමේන්ස් / கூட் ஓமேன்ஸ் | 2019–2023 | Prime Video | Fantasy / Comedy |
| TV | The Haunting of Hill House | ද හෝන්ටිං ඔෆ් හිල් හවුස් / தி ஹாண்டிங் ஆஃப் ஹில் ஹவுஸ் | 2018 | Netflix | Horror / Mystery |
| TV | Adolescence | — | 2025 | Netflix | Crime / Psychological |
- Trilingual metadata alignment for culturally accurate names and genre mappings.
- Episode-level granularity with multilingual summaries (for Chernobyl, Good Omens, The Queen’s Gambit, etc.).
- Regional balance — includes international hits (Inception, Interstellar, La La Land) and South Asian cinema (Machan, Aloko Udapadi, Baahubali).
- Flexible JSON schema — easy to expand with new titles or media types.
- Designed for offline, multilingual entity resolution in Polyvox’s media-aware dialogue system.
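
Episode-level answers need the season and episode numbers pulled out of the query before any lookup. The sketch below shows one way to do that with regular expressions for English-style references only; the function name and patterns are illustrative assumptions, not the project's implementation, and Sinhala or Tamil phrasings would need their own patterns.

```python
import re
from typing import Optional, Tuple

# "S01E03" or "s1 e3"
_SXXEYY = re.compile(r"\bs(\d{1,2})\s*e(\d{1,3})\b", re.IGNORECASE)
# "season 1 episode 3" or "season 2, episode 4"
_SEASON_EPISODE = re.compile(
    r"\bseason\s*(\d{1,2})\W+episode\s*(\d{1,3})\b", re.IGNORECASE
)
# "episode 5" or "ep. 5" with no season given
_EPISODE_ONLY = re.compile(r"\b(?:episode|ep)\.?\s*(\d{1,3})\b", re.IGNORECASE)


def parse_episode_ref(text: str) -> Optional[Tuple[Optional[int], int]]:
    """Return (season, episode); season is None when the query omits it."""
    for pattern in (_SXXEYY, _SEASON_EPISODE):
        match = pattern.search(text)
        if match:
            return int(match.group(1)), int(match.group(2))
    match = _EPISODE_ONLY.search(text)
    if match:
        return None, int(match.group(1))
    return None


print(parse_episode_ref("Chernobyl S01E03 summary"))
print(parse_episode_ref("what happens in episode 5?"))
```

Returning `None` for the season lets the caller decide whether to default to season 1, which is a reasonable choice for single-season series such as Chernobyl.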
| Goal | Where to Edit |
|---|---|
| Add new FAQs | data/faq_[lang].json |
| Add new TV titles | data/tv_series/ |
| Modify heuristics | core/detection.py / core/translit.py |
| Add new intents | core/intents.py |
| Add new responses | core/handlers.py |
| Adjust formatting/timezone | core/utils.py |
Default timezone: Asia/Colombo (UTC+05:30), controlled in core/utils.py.
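
For reference, a timezone-aware "current time" for date and time questions could look like the sketch below. It uses the standard-library `zoneinfo` module (available from Python 3.9) and falls back to a fixed +05:30 offset when no timezone database is installed; the helper names are hypothetical and are not the contents of core/utils.py.

```python
from datetime import datetime, timedelta, timezone, tzinfo
from typing import Optional
from zoneinfo import ZoneInfo, ZoneInfoNotFoundError


def colombo_tz() -> tzinfo:
    """Return Asia/Colombo, or a fixed +05:30 offset if tz data is missing."""
    try:
        return ZoneInfo("Asia/Colombo")
    except ZoneInfoNotFoundError:
        # Fallback for systems without a tz database (e.g. some Windows setups).
        return timezone(timedelta(hours=5, minutes=30), "+0530")


def now_in_colombo(now_utc: Optional[datetime] = None) -> datetime:
    """Convert a UTC timestamp (default: the current time) to Colombo time."""
    if now_utc is None:
        now_utc = datetime.now(timezone.utc)
    return now_utc.astimezone(colombo_tz())


print(now_in_colombo().strftime("%Y-%m-%d %H:%M"))
```

Accepting an optional timestamp makes the helper easy to test with a fixed time instead of the system clock.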
pip install pyinstaller
pyinstaller --onefile app.py

Creates a fully offline single-file binary.
- Designed for offline, local trilingual interaction.
- Transliteration (Singlish / Tanglish) handled via heuristic matching.
- Safe demo dataset — culturally neutral and small for portability.
- Easily extendable toward ML or vector-retrieval backends.
Solo Project - Polyvox Core
Developed by Senyaka