🎙️ AURA: AI-Powered Voice Assistant & Automation Platform

✨ Features • 📐 Architecture • ⚙️ How It Works • 🛠️ Tech Stack • 🚀 Installation • 💡 Usage • 📂 Structure • 💬 Commands


AURA is a production-grade, local desktop AI voice assistant and automation platform built entirely in Python. It enables hands-free control of your Windows PC through natural language voice commands, with zero manual interaction required after startup. ⚡

The system features a biometric face-authentication gateway 🔐, a dual-layer NLU engine 🧠 (deterministic rules + LLM fallback), cloud-accelerated speech-to-text ☁️ via Groq Whisper, and a native Windows SAPI5 TTS engine 🗣️, all tied together through a custom-built event-driven architecture using a publish/subscribe event bus and a validated state machine.

💡 Built for Scale: Designed as a modular, production-ready system with clearly separated concerns (audio pipeline, authentication, natural language understanding, action dispatching, and GUI), each operating independently via events.


🌟 Key Features

🔒 Biometric Face Authentication Gate

  • 🛡️ On launch, AURA starts silently in the LOCKED state: the microphone is active, but the UI is hidden.
  • 👁️ When the wake phrase is detected, the GUI surfaces and OpenCV's YuNet face detector (ONNX, ~200 KB) scans the camera.
  • 🧬 A 128-dimensional SFace embedding is extracted from the detected face and compared via cosine similarity against stored enrollment embeddings.
  • 🎯 Thresholds: a face detection is accepted only at a confidence score ≥ 0.75; the recognition similarity must separately pass a tuned threshold.
  • 👥 Supports multi-user enrollment stored in local SQLite (aura.db). Includes automatic migration from legacy databases.
  • 🛠️ Dev bypass via --bypass-auth flag for rapid iteration.
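
The recognition step described above can be sketched in plain Python. This is a minimal illustration rather than AURA's actual code: `best_match`, the `enrolled` dictionary, and the 0.5 similarity threshold are assumptions (the README only says the threshold is tuned), and short 3-d vectors stand in for the 128-d SFace embeddings.

```python
import math
from typing import Optional


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector matches nothing
    return dot / (norm_a * norm_b)


def best_match(
    probe: list[float],
    enrolled: dict[str, list[list[float]]],
    threshold: float = 0.5,  # assumption: the real tuned value is not documented
) -> Optional[str]:
    """Return the enrolled user whose embedding is most similar to the probe,
    or None if no similarity reaches the threshold."""
    best_user, best_score = None, threshold
    for user, embeddings in enrolled.items():
        for embedding in embeddings:
            score = cosine_similarity(probe, embedding)
            if score >= best_score:
                best_user, best_score = user, score
    return best_user


enrolled = {"alice": [[1.0, 0.0, 0.0]], "bob": [[0.0, 1.0, 0.0]]}
print(best_match([0.9, 0.1, 0.0], enrolled))  # closest to alice's embedding
print(best_match([0.0, 0.0, 1.0], enrolled))  # orthogonal to both, so rejected
```

Storing several embeddings per user, as in the multi-angle enrollment flow, lets the best of them decide the match.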

🎙️ Always-On Wake Word Detection

  • 🎧 A single MicStream thread captures 16 kHz mono PCM audio in 30 ms frames (480 samples) continuously.
  • 🌊 WebRTC VAD (webrtcvad, aggressiveness level 2) processes each frame to detect speech onset and offset.
  • 🔄 A pre-speech ring buffer (5 frames = 150 ms) is prepended to every captured utterance to avoid clipping the first syllable.
  • 🚀 A complete utterance is sent to Groq Whisper (whisper-large-v3-turbo) for transcription, then matched against the configured wake phrase ("take control").
  • ✨ No separate wake-word model is needed: Whisper handles both wake detection and command transcription.
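
The frame arithmetic and the pre-speech ring buffer above can be sketched as follows. This is an illustrative simplification, not the project's `VadManager`: the `UtteranceCollector` class is hypothetical, 16-bit samples are an assumption, and the utterance ends on the first non-speech frame, whereas a real VAD wrapper would usually tolerate short pauses. The `is_speech` flag would come from the VAD's per-frame decision.

```python
from collections import deque

SAMPLE_RATE = 16_000     # Hz, from the README
FRAME_MS = 30            # frame duration, from the README
BYTES_PER_SAMPLE = 2     # assumption: 16-bit PCM
PRE_SPEECH_FRAMES = 5    # 5 x 30 ms = 150 ms of pre-roll

FRAME_SAMPLES = SAMPLE_RATE * FRAME_MS // 1000    # 480 samples per frame
FRAME_BYTES = FRAME_SAMPLES * BYTES_PER_SAMPLE    # 960 bytes per frame


class UtteranceCollector:
    """Collects frames into utterances, prepending recent pre-speech audio."""

    def __init__(self) -> None:
        self._pre_roll = deque(maxlen=PRE_SPEECH_FRAMES)  # ring buffer
        self._utterance = []
        self._in_speech = False

    def feed(self, frame: bytes, is_speech: bool):
        """Return the finished utterance at speech offset, otherwise None."""
        if is_speech:
            if not self._in_speech:
                # Speech onset: start the utterance with the buffered pre-roll.
                self._in_speech = True
                self._utterance = list(self._pre_roll)
            self._utterance.append(frame)
            return None
        if self._in_speech:
            # Speech offset: emit everything collected so far.
            self._in_speech = False
            self._pre_roll.clear()
            return b"".join(self._utterance)
        self._pre_roll.append(frame)  # silence: keep only the newest frames
        return None
```

Because `deque(maxlen=5)` silently drops its oldest entry, the buffer always holds at most the last 150 ms of audio before speech begins.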

🧠 Dual-Layer Universal Intent Engine

The NLU pipeline uses a two-stage classification approach for reliability + intelligence:

Stage 1: Fast Rule Matcher (Deterministic) ⚡

  • FastRuleMatcher scans normalized text against curated keyword/synonym dictionaries (synonym_map.json, app_mappings.json, site_mappings.json).
  • Returns intent with confidence = 1.0 instantly (no API call, no latency).
  • Handles the majority of everyday commands (open/close apps, search, time, weather, media, screenshot, system control).

Stage 2: LLM Intent Brain (Groq LLaMA Fallback) 🤖

  • If Stage 1 returns no match, text is sent to llama-3.1-8b-instant via Groq API with a structured system prompt.
  • Returns a JSON object: {intent, action, slots, confidence, needs_clarification, requires_confirmation}.
  • Temperature is set to 0.0 for deterministic structured outputs (response_format: json_object).

Post-Processing Pipeline ⚙️:

  • Context Resolution: pronouns like "it", "this", "that" are resolved against the ContextMemory (last entity mentioned).
  • Canonical Cross-Check: if open_app targets a known website, intent is automatically pivoted to open_website.
  • Confidence Guard: low-confidence results are escalated to conversation fallback.
  • Single-Intent Processing: only the highest-confidence intent per utterance is executed (prevents accidental chained actions).
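
A minimal sketch of the two-stage flow with a confidence guard might look like the following. The `RULES` table, the `classify` function, and the 0.6 cutoff are illustrative assumptions, and the LLM stage is passed in as a plain callable so that no API call is made here.

```python
from typing import Callable, Optional

# Hypothetical keyword table; the real matcher loads its dictionaries from JSON.
RULES = {
    "open": "app_control",
    "close": "app_control",
    "weather": "weather",
    "screenshot": "screenshot",
}


def fast_rule_match(text: str) -> Optional[dict]:
    """Stage 1: deterministic keyword lookup, confidence 1.0 on a hit."""
    for word in text.lower().split():
        if word in RULES:
            return {"intent": RULES[word], "confidence": 1.0, "source": "rules"}
    return None


def classify(
    text: str,
    llm_fallback: Callable[[str], dict],
    min_confidence: float = 0.6,  # assumption: the real cutoff is not documented
) -> dict:
    """Try the rule matcher first and call the LLM only when it finds nothing."""
    result = fast_rule_match(text)
    if result is None:
        result = dict(llm_fallback(text), source="llm")  # Stage 2
    if result["confidence"] < min_confidence:
        # Confidence guard: hand uncertain results to free-form conversation.
        return {
            "intent": "conversation",
            "confidence": result["confidence"],
            "source": result["source"],
        }
    return result


def fake_llm(text: str) -> dict:
    """Stand-in for the LLM stage, returning a fixed low-confidence guess."""
    return {"intent": "email", "confidence": 0.3}


print(classify("open chrome", fake_llm)["source"])        # handled by stage 1
print(classify("hmm, not sure", fake_llm)["intent"])      # guarded fallback
```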

🗣️ Multi-Threaded Native TTS Engine

  • 🔊 TTSThread runs as a dedicated QThread with an internal queue.Queue for thread-safe speech requests.
  • 💻 Uses Windows SAPI5 (SAPI.SpVoice) via comtypes, with no model downloads and no latency overhead.
  • ⚡ Speaks asynchronously (flag 1, SVSFlagsAsync) while the worker loop polls WaitUntilDone with a 100 ms timeout.
  • 🛑 Immediate interrupt capability: stop_speaking() calls Speak("", 2) (SVSFPurgeBeforeSpeak flag) to halt mid-sentence.
  • 🧑 Automatically selects a male voice (searches for "david", "mark", "james", "george" in installed voices).
  • 🧩 COM is initialized per-thread (pythoncom.CoInitialize()) and cleaned up on shutdown.
  • 🔇 Emits speech_started and speech_ended Qt signals; the main window mutes the microphone during speech to eliminate feedback loops.
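
The queue-plus-worker pattern above can be illustrated without Qt or SAPI. In this sketch a plain `threading.Thread` stands in for the QThread, the injected `speak_fn` stands in for the SAPI5 call, and the two callbacks stand in for the speech_started and speech_ended signals; the `SpeechWorker` name and the `None` shutdown sentinel are assumptions.

```python
import queue
import threading


class SpeechWorker(threading.Thread):
    """Serializes speech requests from any thread onto one worker thread."""

    def __init__(self, speak_fn, on_started=None, on_ended=None):
        super().__init__(daemon=True)
        self._requests = queue.Queue()
        self._speak_fn = speak_fn
        self._on_started = on_started or (lambda: None)
        self._on_ended = on_ended or (lambda: None)

    def speak(self, text: str) -> None:
        """Thread-safe: callers only enqueue, they never touch the engine."""
        self._requests.put(text)

    def shutdown(self) -> None:
        self._requests.put(None)  # sentinel that ends the worker loop

    def run(self) -> None:
        while True:
            text = self._requests.get()
            if text is None:
                break
            self._on_started()      # e.g. mute the microphone
            try:
                self._speak_fn(text)
            finally:
                self._on_ended()    # e.g. unmute the microphone


events = []
worker = SpeechWorker(
    speak_fn=lambda text: events.append(("say", text)),
    on_started=lambda: events.append("started"),
    on_ended=lambda: events.append("ended"),
)
worker.start()
worker.speak("Opening Chrome")
worker.shutdown()
worker.join(timeout=5)
print(events)
```

Wrapping the speak call in `try/finally` means the microphone is unmuted even if the engine raises.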

📊 Event-Driven State Machine

AURA uses a thread-safe validated state machine with 7 states:

🔒 LOCKED → 💤 IDLE → 👂 LISTENING → 🤔 THINKING → ⚙️ EXECUTING → 🗣️ SPEAKING → 👂 LISTENING
  • ✅ All transitions are validated against an allowed-transitions table; illegal transitions are logged and blocked.
  • 🌐 Every state change publishes a state.changed event on the EventBus, which updates the GUI, status bar, and orb visualizer in real time.
  • 🔒 The StateMachine is a thread-safe singleton using a threading.Lock.
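
A validated, lock-protected state machine of this kind can be sketched as below. Only the six states named in the flow above are modeled (the README does not name the seventh), and the `ALLOWED` table is an assumption inferred from that flow and from the timeout back to IDLE, not the project's actual table.

```python
import threading
from enum import Enum


class State(Enum):
    LOCKED = "locked"
    IDLE = "idle"
    LISTENING = "listening"
    THINKING = "thinking"
    EXECUTING = "executing"
    SPEAKING = "speaking"


# Assumed allowed-transitions table, derived from the documented flow.
ALLOWED = {
    State.LOCKED: {State.IDLE},
    State.IDLE: {State.LISTENING, State.LOCKED},
    State.LISTENING: {State.THINKING, State.IDLE},
    State.THINKING: {State.EXECUTING, State.LISTENING},
    State.EXECUTING: {State.SPEAKING},
    State.SPEAKING: {State.LISTENING},
}


class StateMachine:
    def __init__(self, on_change=None) -> None:
        self._lock = threading.Lock()
        self._state = State.LOCKED
        self._on_change = on_change  # stands in for publishing state.changed

    @property
    def state(self) -> State:
        with self._lock:
            return self._state

    def transition(self, new_state: State) -> bool:
        """Apply the transition if the table allows it; otherwise block it."""
        with self._lock:
            if new_state not in ALLOWED[self._state]:
                return False
            old_state, self._state = self._state, new_state
        if self._on_change is not None:
            self._on_change(old_state, new_state)  # called outside the lock
        return True


machine = StateMachine(on_change=lambda old, new: print(old.name, "->", new.name))
machine.transition(State.SPEAKING)  # blocked: not reachable from LOCKED
machine.transition(State.IDLE)      # allowed
```

The callback runs after the lock is released so that a subscriber calling back into the machine cannot deadlock it.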

📡 Thread-Safe Publish/Subscribe Event Bus

  • 🚌 EventBus is a singleton pub/sub bus backed by a PySide6 QObject with a Signal(str, object).
  • 🔄 publish() is callable from any thread: it emits a Qt signal, ensuring callbacks always run on the Qt main thread (thread-safe UI updates).
  • 📌 20+ named event types: auth.success, wake.detected, intent.classified, tts.start, state.changed, system.shutdown, etc.
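
The idea of publishing from any thread while delivering on one thread can be shown without Qt. In this sketch a `queue.Queue` replaces the Qt signal, and `process_pending()` is a hypothetical method the main thread would call from its loop; in AURA itself the Qt event loop does that hand-off.

```python
import queue
from collections import defaultdict


class EventBus:
    """Pub/sub bus: publish from any thread, deliver on the calling thread
    of process_pending() (the main thread in a GUI application)."""

    def __init__(self) -> None:
        self._subscribers = defaultdict(list)
        self._pending = queue.Queue()

    def subscribe(self, event: str, callback) -> None:
        self._subscribers[event].append(callback)

    def publish(self, event: str, payload=None) -> None:
        # Safe from worker threads: only enqueues, never runs callbacks.
        self._pending.put((event, payload))

    def process_pending(self) -> int:
        """Deliver all queued events; return the number of callbacks run."""
        delivered = 0
        while True:
            try:
                event, payload = self._pending.get_nowait()
            except queue.Empty:
                return delivered
            for callback in list(self._subscribers[event]):
                callback(payload)
                delivered += 1


bus = EventBus()
bus.subscribe("state.changed", lambda state: print("new state:", state))
bus.publish("state.changed", "LISTENING")
bus.process_pending()
```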

🛡️ Safety & Confirmation System

  • ⚠️ Dangerous commands (shutdown, restart, lock, format, delete) are tagged requires_confirmation = True by both the rule matcher and LLM.
  • 🛑 A visual confirmation dialog (PySide6) appears simultaneously with a verbal prompt.
  • ✅ User can confirm via voice ("yes", "confirm", "proceed") or cancel ("no", "cancel", "stop") within a 6-second timeout window.
  • ⏳ ConfirmationService holds the pending ParsedCommand and resolves it on voice/UI response.
  • 🔐 SessionGuard enforces role-based access control: certain actions require fresh biometric re-authentication.
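
The confirmation window can be sketched as a small object that holds the pending command and interprets the next reply. The word lists and the 6-second timeout come from the description above; the `PendingConfirmation` class, its return values, and the rule that a cancel word wins over a confirm word are assumptions made for this example.

```python
import time

CONFIRM_WORDS = {"yes", "confirm", "proceed"}
CANCEL_WORDS = {"no", "cancel", "stop"}
TIMEOUT_S = 6.0


class PendingConfirmation:
    """Holds one dangerous command until the user confirms, cancels, or
    the timeout window closes."""

    def __init__(self, command: str, now: float = None) -> None:
        self.command = command
        start = time.monotonic() if now is None else now
        self._deadline = start + TIMEOUT_S

    def resolve(self, reply: str, now: float = None) -> str:
        """Return 'confirmed', 'cancelled', 'expired', or 'pending'."""
        current = time.monotonic() if now is None else now
        if current > self._deadline:
            return "expired"
        cleaned = reply.lower().replace(",", " ").replace(".", " ")
        words = set(cleaned.split())
        if words & CANCEL_WORDS:   # assumption: cancelling is the safer default
            return "cancelled"
        if words & CONFIRM_WORDS:
            return "confirmed"
        return "pending"


pending = PendingConfirmation("shutdown", now=0.0)
print(pending.resolve("Yes, proceed.", now=2.0))
print(pending.resolve("yes", now=7.5))
```

The optional `now` argument exists only so the timeout can be exercised without waiting.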

🌐 Action Dispatch System

All commands are dispatched through a central ActionDispatcher that routes ParsedCommand objects to registered handlers via ActionRegistry:

| Intent | Actions |
| --- | --- |
| 🖥️ app_control | Open/close any desktop application via subprocess + psutil |
| 🌐 browser_control | Default browser launch, Google/YouTube/website search |
| ⚙️ system_control | Shutdown, restart, lock, sleep, volume, brightness |
| 🌤️ weather | Real-time weather via Open-Meteo API (geocoding + WMO codes) |
| ⏰ time | Local time/date with formatted spoken response |
| 💬 whatsapp | Open WhatsApp Web chats, compose message drafts |
| 📧 email | Gmail compose URL with pre-filled subject/body |
| 📸 screenshot | Capture screen via Pillow, save to data/screenshots/ |
| 🎵 media_control | Play/pause/next/prev via pyautogui media keys |
| 📅 reminders | Schedule reminders/alarms, persist to SQLite, poll every 30s |
| 🤖 conversation | Free-form chat via Groq llama-3.1-8b-instant with 10-turn history |

💾 Conversational Memory & Persistent Storage

  • 🗄️ All user turns and AURA responses are logged to SQLite (aura.db) via SQLModel ORM.
  • 📜 Memory window shows full conversation history, sortable and browsable.
  • ⏰ Scheduled reminders and alarms are stored in the reminder table and polled every 30 seconds via a QTimer.
  • 🔗 Context memory resolves pronouns across turns ("open spotify" → "close it").
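
Pronoun resolution against the last mentioned entity can be sketched in a few lines. This simplified `ContextMemory` is an assumption for illustration: it keeps a single entity and substitutes it for "it", "this", or "that" as whole words.

```python
import re
from typing import Optional

PRONOUNS = re.compile(r"\b(it|this|that)\b", re.IGNORECASE)


class ContextMemory:
    """Remembers the most recent entity and substitutes it for pronouns."""

    def __init__(self) -> None:
        self.last_entity: Optional[str] = None

    def remember(self, entity: str) -> None:
        self.last_entity = entity

    def resolve(self, text: str) -> str:
        if self.last_entity is None:
            return text  # nothing to resolve against yet
        # A function replacement avoids backslash escapes in entity names.
        return PRONOUNS.sub(lambda _match: self.last_entity, text)


memory = ContextMemory()
memory.remember("spotify")        # after "open spotify"
print(memory.resolve("close it"))
```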

📐 System Architecture

AURA uses an event-driven, pipe-and-filter architecture with strict separation of concerns. Each subsystem communicates exclusively through the EventBus or Qt signals, with no direct cross-module calls at runtime.

graph TD
    User([🎤 User Voice]) -->|16kHz PCM| Mic[MicStream\naudio.mic_stream]
    Mic -->|30ms frames| Queue[Audio Queue\nmaxsize=300]
    Queue -->|frames| VAD[VadManager\nWebRTC VAD]
    VAD -->|utterance bytes| Router{State Router\nmain_window}

    subgraph Audio Pipeline
        Router -->|IDLE/LOCKED| Wake[WakeDetector\naudio.wake_listener]
        Router -->|LISTENING| CMD[Command Processor\nmain_window]
    end

    Wake -->|Groq Whisper| WakeCheck{Wake Phrase\nMatch?}
    WakeCheck -->|Yes - LOCKED| Auth[Face Auth\nauth.face_auth]
    WakeCheck -->|Yes - IDLE| Listen[State: LISTENING]
    Auth -->|Pass| Console[Console UI]

    CMD -->|Groq Whisper| STT[WhisperSTT\nspeech.whisper_stt]
    STT -->|transcript| Validator[TranscriptValidator]
    Validator -->|valid| Engine[IntentEngine\nbrain.core.intent_engine]

    subgraph NLU Pipeline
        Engine -->|normalize| Normalizer[CommandNormalizer]
        Normalizer -->|clean text| Fast[FastRuleMatcher\nStage 1]
        Fast -->|no match| LLM[LLMIntentBrain\nGroq LLaMA 3.1]
        Fast -->|match| Slots[SlotExtractor]
        LLM -->|JSON| Slots
        Slots -->|enriched cmd| Guard[ConfidenceGuard]
        Guard -->|ParsedCommand| Context[ContextMemory]
    end

    Context -->|resolved cmd| Policy[SessionGuard\nservices.session_guard]
    Policy -->|allowed| Confirm{Requires\nConfirmation?}
    Confirm -->|Yes| ConfirmSvc[ConfirmationService]
    Confirm -->|No| Dispatch[ActionDispatcher\nservices.action_dispatcher]
    ConfirmSvc -->|resolved| Dispatch

    Dispatch -->|result| TTS[TTSThread\nSAPI5 SpVoice]
    Dispatch -->|result| Memory[MemoryManager\nSQLite aura.db]
    TTS -->|speech_started| MicMute[Mic Muted\nduring speech]
    TTS -->|speech_ended| MicUnmute[Mic Unmuted\nresume listening]

⚙️ How It Works, Step by Step

1. 🚀 Startup (Silent Background Mode)

app.py → MainWindow.__init__() → _start_pipeline()
  • TTS thread starts and warms up SAPI5 COM object.
  • MicStream begins capturing 30 ms PCM frames into a bounded queue (maxsize=300).
  • A dedicated vad-consumer daemon thread pulls frames from the queue and feeds VadManager.
  • App window stays hidden: system tray / background only.

2. 👂 Wake Word Detection

VadManager detects speech → _on_utterance_captured(audio) → _check_wake(audio) [new Thread]
  • WakeDetector sends the audio buffer to Groq Whisper (whisper-large-v3-turbo).
  • PCM bytes are wrapped into a WAV container in-memory (io.BytesIO + wave) before upload.
  • Transcript is checked against the configured wake phrase (default: "take control").
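
The in-memory WAV wrapping and the wake-phrase check can be sketched with the standard library alone. The function names are hypothetical, 16-bit mono samples are an assumption, and the upload to the transcription API is left out.

```python
import io
import re
import wave


def pcm_to_wav(pcm: bytes, sample_rate: int = 16_000) -> bytes:
    """Wrap raw mono PCM bytes in a WAV container without touching disk."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)       # mono, as captured by the microphone stream
        wav.setsampwidth(2)       # assumption: 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buffer.getvalue()


def contains_wake_phrase(transcript: str, phrase: str = "take control") -> bool:
    """Case- and punctuation-insensitive whole-word match for the phrase."""
    normalized = re.sub(r"[^a-z0-9 ]+", " ", transcript.lower())
    normalized = " ".join(normalized.split())
    return f" {phrase} " in f" {normalized} "


wav_bytes = pcm_to_wav(b"\x00\x00" * 480)    # one silent 30 ms frame
print(wav_bytes[:4], len(wav_bytes))
print(contains_wake_phrase("Take control!"))
```

Padding both strings with spaces keeps a transcript such as "undertake controls" from matching by accident.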

3. 🛡️ Face Authentication

Wake detected in LOCKED → QMetaObject.invokeMethod(_start_face_auth) → AuthWindow
  • Window surfaces, camera activates via OpenCV.
  • Per-frame: YuNet.detect() → SFace.alignCrop() → SFace.feature() → cosine_similarity().
  • Successful match emits auth_success signal → _on_auth_proceed(username).

4. 🧠 Command Processing

State: LISTENING → utterance captured → _process_command(audio) [new Thread]
State → THINKING → EXECUTING → SPEAKING
  1. Audio → WhisperSTT.transcribe() → raw transcript
  2. TranscriptValidator rejects noise/short/repeated text
  3. IntentEngine.process() → normalize → fast match OR LLM → extract slots → resolve context
  4. SessionGuard.verify_access() → check permissions
  5. ConfirmationService → if dangerous, pause and ask
  6. ActionDispatcher.dispatch() → find handler in ActionRegistry → execute
  7. Response text → TTSThread.speak() + _signals.aura_response.emit() (GUI transcript)

5. 🗣️ Speech Output & Loop

TTS: speech_started → mic muted → speech_ended → mic unmuted → State: LISTENING
  • After every response, the active window timer (300s) resets.
  • If the timer expires with no further commands, state returns to IDLE.
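
The active-window timeout can be modeled with a monotonic clock. The `ActiveWindowTimer` class is a hypothetical stand-in for whatever timer the GUI uses; only the 300-second window comes from the text above.

```python
import time

ACTIVE_WINDOW_S = 300.0  # seconds of inactivity before returning to IDLE


class ActiveWindowTimer:
    """Tracks the time since the last response and reports expiry."""

    def __init__(self, now: float = None) -> None:
        self._last_activity = time.monotonic() if now is None else now

    def touch(self, now: float = None) -> None:
        """Reset the window; call this after every response."""
        self._last_activity = time.monotonic() if now is None else now

    def expired(self, now: float = None) -> bool:
        current = time.monotonic() if now is None else now
        return current - self._last_activity >= ACTIVE_WINDOW_S


timer = ActiveWindowTimer(now=0.0)
timer.touch(now=250.0)             # a response at t = 250 s restarts the window
print(timer.expired(now=500.0))    # 250 s since the last response
print(timer.expired(now=550.0))    # 300 s since the last response
```

`time.monotonic()` is used instead of wall-clock time so that system clock changes cannot shorten or extend the window.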

🛠️ Technology Stack

| Layer | Component | Version / Details |
| --- | --- | --- |
| Language | 🐍 Python | 3.12+ |
| GUI Framework | 🖼️ PySide6 (Qt for Python) | Dark Fusion theme, QStackedWidget, custom Orb visualizer |
| STT Engine | 🎙️ Groq Whisper API | whisper-large-v3-turbo, cloud-accelerated transcription |
| LLM / NLU | 🧠 Groq LLaMA | llama-3.1-8b-instant, structured JSON intent classification |
| TTS Engine | 🔊 Windows SAPI5 via comtypes | SAPI.SpVoice, async + interruptible, male voice selection |
| Face Detection | 👁️ OpenCV YuNet ONNX | face_detection_yunet_2023mar.onnx (~200 KB) |
| Face Recognition | 🧬 OpenCV SFace ONNX | face_recognition_sface_2021dec.onnx (~37 MB), 128-d embeddings |
| VAD | 🌊 WebRTC VAD (webrtcvad) | 30 ms frames @ 16 kHz, aggressiveness level 2 |
| Audio Capture | 🎤 PyAudio | 16 kHz, mono, 480-sample chunks |
| Storage / ORM | 🗄️ SQLite + SQLModel | Local aura.db: conversations, users, reminders |
| Weather API | 🌤️ Open-Meteo (free, no key) | Geocoding + WMO weather codes |
| HTTP Client | 🌐 httpx | Async-capable, used for weather API |
| Logging | 📝 Loguru | Rotating file logs + Qt signal bridge for GUI display |
| COM Interop | 🔌 comtypes + pythoncom | Windows SAPI5 SpVoice per-thread COM initialization |

🚀 Installation

📋 Prerequisites

  1. 🐍 Python 3.12+ installed on Windows.
  2. 🔑 A valid Groq API Key, free at console.groq.com.
  3. 🎤 A working microphone and 📷 webcam (webcam only required for face authentication).

🛠️ Step-by-Step Setup

1. Clone the Repository:

git clone https://github.com/Omcodesk/AURA-AI-Voice-Assistant.git
cd AURA-AI-Voice-Assistant

2. Create Virtual Environment:

python -m venv .venv
.venv\Scripts\Activate.ps1

3. Install Dependencies:

pip install -r requirements.txt

4. Configure API Key:

Create a .env file in the root directory:

GROQ_API_KEY=gsk_your_groq_api_key_here

5. First Run (auto-initializes database and downloads face models):

python app.py --bypass-auth

💡 On first run, YuNet and SFace ONNX models (~37 MB total) are downloaded automatically from the OpenCV model zoo.


💡 Usage Instructions

🔒 Normal Mode (with Face Authentication)

python app.py
  • AURA starts silently in the background.
  • Say "Take Control"; the window surfaces and the camera activates for face verification.
  • After successful authentication, say any command.

🛠️ Developer Mode (Skip Authentication)

python app.py --bypass-auth
  • Skips biometric verification entirely.
  • Launches directly into the console UI, logged in as Omm.
  • Ideal for development and testing.

👤 Enrolling a New User

  • Click the ⚙️ Settings tab → 👥 Enroll New User.
  • Follow the on-screen prompts to capture your face from multiple angles.
  • Embeddings are stored locally in aura.db and are never uploaded anywhere.

📂 Project Structure

AURA/
│
├── actions/                  # 🛠️ All action handlers, registered in ActionRegistry
│   ├── app_control.py        # Open/close apps via subprocess + psutil
│   ├── browser_control.py    # Browser launch + search routing (Google, YouTube, sites)
│   ├── conversation.py       # LLM chat (Groq LLaMA, 10-turn history)
│   ├── media_control.py      # Media keys (play/pause/next/prev) via pyautogui
│   ├── reminders.py          # Schedule and store reminders/alarms to SQLite
│   ├── screenshot_service.py # Screen capture via Pillow
│   ├── system_control.py     # OS-level: shutdown/restart/lock/sleep/volume/brightness
│   ├── time_service.py       # Formatted local time/date responses
│   ├── weather_service.py    # Open-Meteo API (geocoding + weather codes)
│   └── whatsapp.py           # WhatsApp Web URL automation
│
├── audio/                    # 🎤 Audio pipeline: microphone → VAD → utterance
│   ├── mic_stream.py         # Threaded PyAudio capture into bounded queue
│   ├── vad_manager.py        # WebRTC VAD: 30ms frames, ring-buffer, pre-pad
│   └── wake_listener.py      # Wake phrase checker using WhisperSTT
│
├── auth/                     # 🛡️ Biometric security subsystem
│   ├── enroll_manager.py     # Multi-frame face enrollment + embedding storage
│   ├── face_auth.py          # YuNet detection + SFace 128-d embedding + cosine similarity
│   ├── liveness.py           # Anti-spoofing checks (blink / motion detection)
│   └── user_registry.py      # SQLite user store with aura.db / jarvis.db migration
│
├── brain/                    # 🧠 NLU: intent classification and slot extraction
│   ├── core/
│   │   ├── command_normalizer.py   # Stopword removal, synonym expansion
│   │   ├── confidence_guard.py     # Low-confidence escalation logic
│   │   ├── context_memory.py       # Pronoun resolution (it/this/that)
│   │   ├── fast_rule_matcher.py    # Stage 1: keyword/synonym pattern matching
│   │   ├── intent_engine.py        # Full NLU pipeline orchestrator
│   │   ├── llm_intent_brain.py     # Stage 2: Groq LLaMA JSON classification
│   │   ├── slot_extractor.py       # Entity extraction (app, site, location, time, query)
│   │   └── time_parser.py          # NLP time/date parsing for reminders
│   ├── intent_router.py      # Maps engine output → ParsedCommand objects
│   └── memory_manager.py     # SQLModel ORM: conversations, reminders, aura.db
│
├── config/                   # ⚙️ Configuration files
│   ├── settings.yaml         # All tunable parameters (VAD, STT, TTS, LLM, session)
│   ├── app_mappings.json     # App name → executable mappings
│   ├── site_mappings.json    # Site name → URL mappings
│   └── synonym_map.json      # Natural language synonym dictionary
│
├── core/                     # 🏗️ Core infrastructure, no business logic
│   ├── action_registry.py    # Handler registration table (intent+action → function)
│   ├── command_parser.py     # Raw intent + slots → ParsedCommand dataclass
│   ├── config_loader.py      # YAML config + .env loader (singleton)
│   ├── event_bus.py          # Thread-safe pub/sub bus via Qt signals
│   ├── logger.py             # Loguru setup (rotating files + Qt bridge)
│   ├── policy_engine.py      # Safety blocklist (blocks "bye", "thanks", etc.)
│   ├── result_types.py       # ParsedCommand + ExecutionResult dataclasses
│   ├── session_manager.py    # Session lifecycle (auth, touch, auto-lock)
│   └── state_machine.py      # Validated 7-state FSM with thread-safe transitions
│
├── gui/                      # 🖼️ PySide6 user interface
│   ├── admin_window.py       # Settings panel
│   ├── auth_window.py        # Face authentication + enrollment screen
│   ├── confirmation_dialog.py # Voice-triggered visual confirm dialog
│   ├── console_window.py     # Main voice console (orb + transcript + state label)
│   ├── enroll_dialog.py      # New user enrollment dialog
│   ├── main_window.py        # Root window, wires all subsystems together
│   ├── memory_window.py      # Conversation history browser
│   ├── theme.qss             # Dark cyberpunk Qt stylesheet
│   └── widgets/
│       ├── activity_card.py  # "Processing..." activity display
│       ├── orb_widget.py     # Animated orb that reflects system state
│       ├── status_bar.py     # Session countdown + mic status
│       └── transcript_panel.py # Scrollable user/AURA conversation cards
│
├── services/                 # 🚀 Application-layer services
│   ├── action_dispatcher.py  # Routes ParsedCommand to registered handler
│   ├── confirmation_service.py # Manages pending confirmation state
│   └── session_guard.py      # Access control: requires re-auth for sensitive actions
│
├── speech/                   # 🗣️ Speech I/O
│   ├── response_formatter.py # Cleans LLM output for speech (strips markdown etc.)
│   ├── transcript_validator.py # Rejects noise/too-short/hallucinated transcripts
│   ├── tts_engine.py         # TTSThread: SAPI5 SpVoice, async, interruptible
│   └── whisper_stt.py        # Groq Whisper: PCM → WAV → API → transcript
│
├── models/face/              # 🧬 ONNX face models (auto-downloaded on first run)
│   ├── face_detection_yunet_2023mar.onnx   # ~200 KB
│   └── face_recognition_sface_2021dec.onnx # ~37 MB
│
├── tests/                    # 🧪 Tests
│   └── test_universal_brain.py  # Unit tests for intent classification pipeline
│
├── .env.example              # 📝 Template: copy to .env and fill in your API key
├── app.py                    # 🎯 Main entry point
├── requirements.txt          # 📦 All Python dependencies
└── aura_start.bat            # 🏃‍♂️ One-click Windows launcher

💬 Example Voice Commands

| Category | Example Command |
| --- | --- |
| 🎙️ Wake | "Take Control" |
| 🚀 App Launch | "Open Chrome" / "Launch Notepad" / "Open VS Code" |
| 🛑 App Close | "Close Spotify" / "Close Chrome" |
| 🔍 Web Search | "Search for Python tutorials on Google" |
| 📺 YouTube | "Search for lo-fi music on YouTube" |
| 🌐 Website | "Open GitHub" / "Open web.whatsapp.com" |
| ⏰ Time | "What time is it?" / "What's today's date?" |
| 🌤️ Weather | "Weather in Delhi" / "How's the weather?" |
| ⚙️ System | "Shutdown the PC" / "Restart" / "Lock the computer" |
| 🔊 Volume | "Increase volume" / "Mute" |
| 🎵 Media | "Play" / "Pause" / "Next track" |
| 📸 Screenshot | "Take a screenshot" / "Capture screen" |
| 💬 WhatsApp | "Send a WhatsApp message to John" |
| 📧 Email | "Draft an email to boss" |
| 🔔 Reminder | "Remind me to drink water at 6 PM" |
| ⏱️ Alarm | "Set an alarm for 7 AM" |
| 🤖 Conversation | "What is machine learning?" / "Tell me a joke" |

🔮 Future Enhancements

  • 📴 Offline STT: Local Whisper.cpp integration for 100% air-gapped operation.
  • 🎯 Custom Wake Phrase Training: Real-time acoustic model fine-tuning.
  • 🧩 Plugin System: Drop-in action handlers via a plugin directory.
  • 👁️ Vision Integration: Screen-reading using a vision-language model.
  • 📱 Android Companion App: Remote monitoring and command via mobile.

🤝 Contributing

Contributions are welcome! 🎉

  1. Fork the Project.
  2. Create your Feature Branch (git checkout -b feature/AmazingFeature).
  3. Commit your Changes (git commit -m 'Add some AmazingFeature').
  4. Push to the Branch (git push origin feature/AmazingFeature).
  5. Open a Pull Request.

📄 License

Distributed under the MIT License. See LICENSE for more information.


Built with ❤️ by Omcodesk
