
Polyvox is a local, zero-cost trilingual chatbot for TV and movie info. It detects Sinhala, Tamil, or English, even in mixed or transliterated text, and replies in the same language.


d-senyaka/polyvox


🎬 Polyvox - Multilingual TV & Movie Chatbot (Sinhala • Tamil • English)


Featuring smart title resolution, episode handling, and multilingual media knowledge bases.


🧭 Overview

Polyvox is a fully local, zero-cost trilingual chatbot that acts as a TV & movie information assistant.
It detects whether a query is in Sinhala (සිංහල), Tamil (தமிழ்), or English, even when the text is code-mixed or transliterated, and replies in the same language.

The system is built completely from scratch using open-source, offline-friendly libraries.
It now features:

  • 🎞️ Title and episode recognition across multilingual movie/TV databases
  • 💬 Trilingual answers about plot, year, genre, language, and episode details
  • ⚙️ Rule-based + fuzzy entity resolution (with transliteration and substring matching)
  • 🧠 Intent detection for media, FAQ, and conversational topics
  • 🌐 Offline support with a Gradio chat interface that feels like a streaming assistant
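The rule-based + fuzzy entity resolution step can be sketched as below, assuming a flat alias table that maps every English, Sinhala, Tamil, and transliterated surface form to one canonical title. The alias table, `normalize()`, and the 0.75 cutoff are illustrative assumptions rather than the project's actual `resolve_title()`; the fuzzy step uses the standard library's `difflib.get_close_matches`.

```python
import difflib
import unicodedata
from typing import Optional

# Hypothetical alias table (not Polyvox's real data): surface form -> canonical title.
RAW_ALIASES = {
    "Chernobyl": "Chernobyl",
    "චර්නොබිල්": "Chernobyl",
    "செர்னோபில்": "Chernobyl",
    "Inception": "Inception",
    "ඉන්සෙප්ෂන්": "Inception",
    "இன்செப்ஷன்": "Inception",
    "Good Omens": "Good Omens",
    "The Queen's Gambit": "The Queen's Gambit",
    "Queens Gambit": "The Queen's Gambit",
}


def normalize(text: str) -> str:
    """NFC-normalize, lower-case, drop apostrophes, and collapse whitespace."""
    text = unicodedata.normalize("NFC", text).lower()
    text = text.replace("'", "").replace("\u2019", "")
    return " ".join(text.split())


# Normalize keys once so stored aliases and incoming queries share one form.
ALIASES = {normalize(k): v for k, v in RAW_ALIASES.items()}


def resolve_title(query: str, cutoff: float = 0.75) -> Optional[str]:
    q = normalize(query)
    # 1) Rule: the whole query is a known alias.
    if q in ALIASES:
        return ALIASES[q]
    # 2) Rule: a known alias appears inside the query; prefer the longest one.
    hits = [alias for alias in ALIASES if alias in q]
    if hits:
        return ALIASES[max(hits, key=len)]
    # 3) Fuzzy fallback for misspellings, using difflib's similarity ratio.
    close = difflib.get_close_matches(q, list(ALIASES), n=1, cutoff=cutoff)
    return ALIASES[close[0]] if close else None
```

In this sketch the fuzzy fallback compares the whole query, so it catches a lone typo such as "chernobil" but not one buried in a long sentence; comparing word n-grams would be a natural next step.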

⚙️ Core Features (v4)

  • 🔤 Advanced Language Detection: a script- and token-based engine for Sinhala, Tamil, and English, with transliteration heuristics for Singlish and Tanglish queries.
  • 🎥 TV & Movie Intelligence: detects titles, years, genres, and episode numbers across multilingual metadata.
  • 🧩 Modular Core Architecture: separate modules for language detection, entity resolution, intent routing, and multilingual answering.
  • 💬 Trilingual Handlers: generates natural replies in Sinhala, Tamil, or English, matching the language of the query.
  • 📚 Expanded Media Knowledge Base: localized metadata for Chernobyl, The Queen’s Gambit, Good Omens, The Haunting of Hill House, Adolescence, and more, with Sinhala and Tamil translations.
  • 🧠 Intent & Entity Recognition: identifies whether the user wants general info, episode details, or a conversational response.
  • 🌐 Offline-Friendly Chat UI: a Gradio interface styled like a streaming-chat assistant.
  • 🔮 ML-Ready Design: hooks for FastText or local LLMs to enable richer semantic Q&A in future versions.
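A script-plus-token detector along these lines can be sketched as follows. It relies on the Unicode blocks for Sinhala (U+0D80 to U+0DFF) and Tamil (U+0B80 to U+0BFF); the romanized cue words and the rule that any Sinhala or Tamil evidence outweighs English tokens are assumptions for illustration, not the contents of `core/detection.py`.

```python
import re
from collections import Counter

SINHALA = re.compile(r"[\u0D80-\u0DFF]")
TAMIL = re.compile(r"[\u0B80-\u0BFF]")
LATIN = re.compile(r"[A-Za-z]")

# Hypothetical romanized cue words for Singlish / Tanglish (tiny samples).
SINGLISH_CUES = {"ayubowan", "kohomada", "mokakda", "karanna", "thiyenawa"}
TANGLISH_CUES = {"vanakkam", "enna", "epdi", "irukku", "sollunga"}


def classify_token(token: str) -> str:
    """Label one whitespace-separated token as 'si', 'ta', 'en', or 'other'."""
    if SINHALA.search(token):
        return "si"
    if TAMIL.search(token):
        return "ta"
    word = token.lower().strip(".,!?")
    if word in SINGLISH_CUES:
        return "si"
    if word in TANGLISH_CUES:
        return "ta"
    return "en" if LATIN.search(token) else "other"


def detect_language(text: str) -> str:
    """Return the primary language; any Sinhala/Tamil evidence beats English."""
    votes = Counter(classify_token(tok) for tok in text.split())
    local = {lang: votes[lang] for lang in ("si", "ta") if votes[lang]}
    if local:
        # Ties go to the first key in insertion order ('si').
        return max(local, key=local.get)
    return "en"
```

Favouring local-language evidence is one way to keep a code-mixed greeting like "Hi ආයුබෝවන්" from being answered in English; a fuller detector would likely weigh token counts and confidence scores instead.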

🏗️ Architecture

┌──────────────────────────┐
│        Web UI (Gradio)   │
└──────────────┬───────────┘
               │
     ┌─────────▼────────────────────────────┐
     │ detection.py → primary language      │
     │ classify_token() / detect_language() │
     └─────────┬────────────────────────────┘
               │
     ┌─────────▼────────────────────────────┐
     │ routing.py → route_to_intent()       │
     └─────────┬────────────────────────────┘
               │
     ┌─────────▼────────────────────────────┐
     │ intents.py → detect_intent()         │
     └─────────┬────────────────────────────┘
               │
     ┌─────────▼────────────────────────────┐
     │ entities.py → resolve_title()        │
     └─────────┬────────────────────────────┘
               │
     ┌─────────▼────────────────────────────┐
     │ handlers.py → reply_[lang]()         │
     └─────────┬────────────────────────────┘
               │
     ┌─────────▼────────────────────────────┐
     │ answers.py → build_response()        │
     └─────────┬────────────────────────────┘
               │
     ┌─────────▼────────────────────────────┐
     │ app.py → return to UI                │
     └──────────────────────────────────────┘
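The `detect_intent()` stage in the diagram could be approximated with keyword cues plus regular expressions for season/episode references, roughly as below. The intent names, cue lists, and return values are hypothetical; the real `intents.py` may use different labels or scoring.

```python
import re
from typing import Optional, Tuple

# Hypothetical trilingual keyword cues; checked in insertion order.
INTENT_CUES = {
    "year": ["year", "released", "when", "කවදා", "වර්ෂය", "ஆண்டு", "எப்போது"],
    "genre": ["genre", "වර්ගය", "வகை"],
    "plot": ["plot", "story", "summary", "කතාව", "கதை"],
}

SXXEYY = re.compile(r"\bs(\d{1,2})\s*e(\d{1,3})\b", re.IGNORECASE)
SEASON = re.compile(r"\bseason\s*(\d{1,2})\b", re.IGNORECASE)
EPISODE = re.compile(r"\b(?:episode|ep)\s*(\d{1,3})\b", re.IGNORECASE)


def parse_episode(text: str) -> Tuple[Optional[int], Optional[int]]:
    """Extract (season, episode) from forms like 'S01E03' or 'season 2 episode 4'."""
    m = SXXEYY.search(text)
    if m:
        return int(m.group(1)), int(m.group(2))
    season = SEASON.search(text)
    episode = EPISODE.search(text)
    return (
        int(season.group(1)) if season else None,
        int(episode.group(1)) if episode else None,
    )


def detect_intent(text: str) -> str:
    """Return 'episode_info', a keyword-matched intent, or 'general_info'."""
    _, episode = parse_episode(text)
    if episode is not None:
        return "episode_info"
    lowered = text.lower()
    for intent, cues in INTENT_CUES.items():
        if any(cue in lowered for cue in cues):
            return intent
    return "general_info"
```

Checking for an episode number first means a query such as "Chernobyl S01E03 summary" is routed to episode details rather than to the general plot intent.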

🚀 Quickstart

Prerequisite: Python 3.9+

git clone https://github.com/d-senyaka/polyvox.git
cd polyvox
python -m venv .venv && source .venv/bin/activate  # (Windows: .venv\Scripts\activate)
pip install -r requirements.txt
python app.py

Gradio prints a local URL; open it in your browser and chat in Sinhala, Tamil, or English.

Optional CLI:

python scripts/run_cli.py

🧪 Sample Queries

| Language | Examples |
| --- | --- |
| Sinhala | ඔබට ස්තුතියි / අද කෙනෙද? / දැන් වේලාව කීයද? |
| Tamil | வணக்கம் / இன்று தேதி என்ன? / இப்போது நேரம் என்ன? |
| English | what’s today’s date? / hello / thanks |
| Code-mixed | Hi ආයුබෝවන් / vanakkam good morning |

📚 Knowledge Base

The Polyvox Media Knowledge Base (movies_tv_kb.json) is a multilingual dataset of movies and TV series designed for trilingual (Sinhala • Tamil • English) media understanding.
It powers entity detection, metadata retrieval, and episode-aware answering for the chatbot.


🧠 Structure Overview

  • Movies (8 entries) and TV Series (6 entries)
  • Each entry includes Sinhala, Tamil, and English localized fields:
    • Titles, genres, and language labels
    • Director(s), writer(s), and detailed cast lists
    • Runtime, release dates, studios, and production info
    • IMDb / Rotten Tomatoes / Metacritic ratings
    • Awards in all three languages
    • Short and long plot summaries (trilingual)
    • For TV: per-season and per-episode details (title, air date, summary)
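To illustrate how per-season and per-episode details could be queried, here is a sketch over an in-memory entry. The field names (`id`, `title`, `seasons`, `episodes`, `air_date`) are assumptions about the shape of `movies_tv_kb.json`, not its documented schema, and the sample holds only the first Chernobyl episode.

```python
import json
from typing import Optional

# Hypothetical entry shape; movies_tv_kb.json may name or nest fields differently.
SAMPLE_KB = json.loads("""
{
  "tv_series": [
    {
      "id": "chernobyl",
      "title": {"en": "Chernobyl", "si": "චර්නොබිල්", "ta": "செர்னோபில்"},
      "year": 2019,
      "seasons": [
        {"season": 1,
         "episodes": [{"episode": 1, "title": {"en": "1:23:45"}, "air_date": "2019-05-06"}]}
      ]
    }
  ]
}
""")


def localized(field: dict, lang: str) -> str:
    """Pick the requested language, falling back to English when it is missing."""
    return field.get(lang) or field["en"]


def find_episode(kb: dict, series_id: str, season: int, episode: int) -> Optional[dict]:
    """Walk series -> season -> episode and return the matching episode dict."""
    for show in kb.get("tv_series", []):
        if show["id"] != series_id:
            continue
        for s in show.get("seasons", []):
            if s["season"] != season:
                continue
            for ep in s.get("episodes", []):
                if ep["episode"] == episode:
                    return ep
    return None
```

The English fallback in `localized()` keeps answers usable when a Sinhala or Tamil episode title has not been added yet.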

🎥 Sample Coverage

| Type | Title (EN) | Localized Titles (Sinhala / Tamil) | Year | Platform | Genre |
| --- | --- | --- | --- | --- | --- |
| Movie | Inception | ඉන්සෙප්ෂන් / இன்செப்ஷன் | 2010 | | Sci-Fi / Action |
| Movie | Avatar | අවටාර් / அவதார் | 2009 | | Sci-Fi / Fantasy |
| Movie | Machan | මචන් / மச்சான் | 2008 | | Comedy / Drama |
| Movie | Aloko Udapadi | ඇලෝකෝ උදපාදි / அலோகோ உதபதி | 2017 | | Historical / Epic |
| TV | Chernobyl | චර්නොබිල් / செர்னோபில் | 2019 | HBO | Drama / History |
| TV | The Queen’s Gambit | ද ක්වීන්ස් ගැම්බිට් / த க்வீன்ஸ் காம்பிட் | 2020 | Netflix | Drama |
| TV | Good Omens | ගුඩ් ඔමේන්ස් / கூட் ஓமேன்ஸ் | 2019–2023 | Prime Video | Fantasy / Comedy |
| TV | The Haunting of Hill House | ද හෝන්ටිං ඔෆ් හිල් හවුස් / தி ஹாண்டிங் ஆஃப் ஹில் ஹவுஸ் | 2018 | Netflix | Horror / Mystery |
| TV | Adolescence | | 2025 | Netflix | Crime / Psychological |

🌍 Key Highlights

  • Trilingual metadata alignment for culturally accurate names and genre mappings.
  • Episode-level granularity with multilingual summaries (for Chernobyl, Good Omens, The Queen’s Gambit, etc.).
  • Regional balance: includes international hits (Inception, Interstellar, La La Land) and South Asian cinema (Machan, Aloko Udapadi, Baahubali).
  • Flexible JSON schema: easy to extend with new titles or media types.
  • Designed for offline, multilingual entity resolution in Polyvox’s media-aware dialogue system.
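Because new titles are added by hand, a small check that every localized field carries all three languages can catch gaps before they surface as half-translated answers. `LOCALIZED_FIELDS` and the dict-per-field layout are assumptions about the schema, not the project's own validator.

```python
from typing import List

LANGS = ("en", "si", "ta")
# Hypothetical list of fields expected to be localized in every entry.
LOCALIZED_FIELDS = ("title", "genres", "plot_short")


def missing_translations(entry: dict) -> List[str]:
    """List 'field' (absent) or 'field.lang' (empty) paths for one KB entry."""
    problems = []
    for field in LOCALIZED_FIELDS:
        values = entry.get(field)
        if not isinstance(values, dict):
            problems.append(field)
            continue
        for lang in LANGS:
            if not values.get(lang):
                problems.append(f"{field}.{lang}")
    return problems
```

Running this over every entry before committing a KB change gives a quick list of what still needs translating.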

🧩 Extend & Customize

| Goal | Where to Edit |
| --- | --- |
| Add new FAQs | data/faq_[lang].json |
| Add new TV titles | data/tv_series/ |
| Modify heuristics | core/detection.py / core/translit.py |
| Add new intents | core/intents.py |
| Add new responses | core/handlers.py |
| Adjust formatting/timezone | core/utils.py |
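For the `core/handlers.py` row, one way to keep the three languages in step is a template table keyed by intent and language, with an English fallback. The template wording, intent keys, and `build_reply()` are hypothetical; the project's `reply_[lang]()` handlers may be structured differently.

```python
# Hypothetical reply templates keyed by intent, then language code.
TEMPLATES = {
    "year": {
        "en": "{title} was released in {year}.",
        "si": "{title} {year} වසරේ නිකුත් විය.",
        "ta": "{title} {year} ஆம் ஆண்டு வெளியானது.",
    },
}

# Used when no template exists for the requested intent.
FALLBACK = {
    "en": "Sorry, I don't have that information.",
    "si": "සමාවන්න, මට එම තොරතුරු නැත.",
    "ta": "மன்னிக்கவும், அந்தத் தகவல் என்னிடம் இல்லை.",
}


def build_reply(intent: str, lang: str, **fields: object) -> str:
    """Fill the template for (intent, lang); unknown languages fall back to English."""
    lang = lang if lang in FALLBACK else "en"
    templates = TEMPLATES.get(intent)
    if templates is None:
        return FALLBACK[lang]
    return templates.get(lang, templates["en"]).format(**fields)
```

Adding a new response then means adding one intent key with three strings, which makes a missing translation easy to spot in review.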

🕰 Timezone

Default: Asia/Colombo (UTC+05:30), controlled in core/utils.py.
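Sri Lanka keeps UTC+05:30 all year with no daylight-saving shift, so a fixed offset is enough to answer date and time queries such as "දැන් වේලාව කීයද?". The helper below is a sketch, not the contents of `core/utils.py`; on Python 3.9+ `zoneinfo.ZoneInfo("Asia/Colombo")` is the named-zone alternative, though on Windows it may need the `tzdata` package.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Fixed UTC+05:30 offset standing in for Asia/Colombo (no DST in Sri Lanka).
COLOMBO = timezone(timedelta(hours=5, minutes=30), "UTC+05:30")


def now_colombo(now: Optional[datetime] = None) -> datetime:
    """Return the current time in Colombo, or convert an aware datetime to it."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now.astimezone(COLOMBO)
```

For example, midnight UTC on 1 January 2025 becomes 05:30 the same day in Colombo.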


📦 Packaging

pip install pyinstaller
pyinstaller --onefile app.py

This creates a single-file executable that runs fully offline.
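A one-file build can only answer offline if its data files travel inside it. PyInstaller's `--add-data` option bundles them (for example `pyinstaller --onefile --add-data "data:data" app.py`; older PyInstaller releases on Windows expect `;` instead of `:` as the separator), and at runtime the one-file bootloader unpacks them into a temporary folder exposed as `sys._MEIPASS`. The helper below is a common pattern for locating such files, assuming the data lives under `data/`; it is not taken from the Polyvox code.

```python
import os
import sys


def resource_path(relative: str) -> str:
    """Resolve a bundled data file in both source and PyInstaller one-file runs.

    In a one-file build PyInstaller sets sys._MEIPASS to the temporary folder
    it extracted into; from source the attribute is absent, so fall back to the
    current working directory (assumes the app is launched from the repo root).
    """
    base = getattr(sys, "_MEIPASS", os.path.abspath("."))
    return os.path.join(base, relative)
```

For instance, `resource_path(os.path.join("data", "movies_tv_kb.json"))`, where placing the knowledge base under `data/` is an assumption.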


⚠️ Notes

  • Designed for offline, local trilingual interaction.
  • Transliteration (Singlish / Tanglish) handled via heuristic matching.
  • Safe demo dataset — culturally neutral and small for portability.
  • Easily extendable toward ML or vector-retrieval backends.

🧑‍💻 Maintainers

Solo Project - Polyvox Core
Developed by Senyaka


🌟 Star this repo if you support local-language AI innovation!
