Merged
Expanded implementation details:
1. insanely fast whisper for STT, with Google Speech Recognition as a fallback.
2. WebRTCVAD for voice activity detection: the mic streams to the backend only while voice activity is detected.
3. Sesame CSM for TTS: incorporates SOTA TTS for lifelike speech.
4. Integrated with the existing chat backend, so voice mode works with the existing agent and memory pipelines.
Next steps: move the VAD logic to the Electron frontend, set up the Electron app with a proper frontend for voice chats, convert the sandboxed script into a module that can be imported into the main app server, and add a WebSocket to the main app server for voice streaming from Electron.
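The VAD gating in item 2 can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `gate_frames`, `hangover_frames`, and `preroll_frames` are hypothetical names, and the speech detector is passed in as a plain callable so the sketch runs without extra dependencies. With the `webrtcvad` package, the callable could wrap `webrtcvad.Vad(mode).is_speech(frame, sample_rate)`, which expects 10, 20, or 30 ms frames of 16-bit mono PCM.

```python
from collections import deque
from typing import Callable, Iterable, Iterator

SAMPLE_RATE = 16000  # Hz; one of the rates WebRTC VAD accepts
FRAME_MS = 30        # WebRTC VAD accepts 10, 20, or 30 ms frames
FRAME_BYTES = SAMPLE_RATE * FRAME_MS // 1000 * 2  # 16-bit mono PCM


def gate_frames(
    frames: Iterable[bytes],
    is_speech: Callable[[bytes], bool],
    hangover_frames: int = 10,
    preroll_frames: int = 3,
) -> Iterator[bytes]:
    """Yield only the frames that should be streamed to the backend.

    The gate opens on the first speech frame (flushing a short pre-roll so
    the start of the word is not clipped) and stays open for
    `hangover_frames` non-speech frames after the last speech frame.
    """
    preroll: deque = deque(maxlen=preroll_frames)
    silence_run = hangover_frames + 1  # start with the gate closed

    for frame in frames:
        if is_speech(frame):
            if silence_run > hangover_frames:
                # Gate was closed: send the buffered pre-roll first.
                yield from preroll
                preroll.clear()
            silence_run = 0
            yield frame
        else:
            silence_run += 1
            if silence_run <= hangover_frames:
                yield frame  # still inside the hangover window
            else:
                preroll.append(frame)  # gate closed: only buffer locally
```

The hangover window keeps short pauses inside a sentence from cutting the stream, at the cost of sending a few silent frames after each utterance.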
…scription test script
…ice conversations
…lama 3.2 3b + orpheus TTS with FastRTC
FINALLY COMPLETED
Completed voice mode integration with the existing backend: user messages are saved to the chatdb, actions are added to the task queue, and memory functions work as expected. Everything runs at low latency on an RTX A5000. Transcription uses faster-whisper base on CPU for now; it can be moved to GPU with a larger model for better accuracy. TTS uses a 4-bit quant of Orpheus. RTC functionality is provided by FastRTC.
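The CPU-now, GPU-later transcription setup described above might look something like the sketch below. It is an assumption-laden illustration rather than the PR's code: `WhisperConfig`, `pick_whisper_config`, and `transcribe` are hypothetical helpers, and the GPU model size (`small`) and the compute types are placeholder choices. Only the `base` model on CPU comes from the commit message. The `faster_whisper` import is deferred so the configuration logic can be used without the package installed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WhisperConfig:
    model_size: str
    device: str
    compute_type: str


def pick_whisper_config(use_gpu: bool) -> WhisperConfig:
    """Choose a faster-whisper setup for the available hardware."""
    if use_gpu:
        # Assumption: "small" stands in for "a larger model"; adjust to VRAM.
        return WhisperConfig(model_size="small", device="cuda", compute_type="float16")
    # From the commit message: base model on CPU (int8 is an assumed compute type).
    return WhisperConfig(model_size="base", device="cpu", compute_type="int8")


def transcribe(audio_path: str, cfg: WhisperConfig) -> str:
    """Transcribe one audio file and return the joined segment text."""
    from faster_whisper import WhisperModel  # lazy import: optional dependency

    model = WhisperModel(cfg.model_size, device=cfg.device, compute_type=cfg.compute_type)
    segments, _info = model.transcribe(audio_path)
    return " ".join(segment.text.strip() for segment in segments)
```

In a long-running server the model would normally be loaded once and reused across requests instead of being constructed on every call as it is here.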
…tion prompt, resolved minor backend issues
…moved unnecessary fallback in index, updated requirements (freeze versions)
…nd fixing timestamp logic
…r/Sentient into fix/optimization
…r/Sentient into fix/optimization
…r/Sentient into fix/optimization
…r/Sentient into fix/optimization
…r/Sentient into fix/optimization
…ry fixes for retention days logic
…r/Sentient into fix/optimization
Co-authored-by: Abhijeet Suryawanshi <108229267+abhijeetsuryawanshi12@users.noreply.github.com>
…r/Sentient into fix/optimization
…r/Sentient into fix/optimization
🚀 Summary
Major changes in the app.
✅ Related Issues
Closes #49, #28, #25, #22, #20, #6
🔍 Changes Made
📸 Screenshots
🔄 Additional Context
The app is still very much a work in progress. Full cloud migration is underway, and the app currently does not support more than one user per server. Self-hosting now also has higher hardware demands, specifically the VRAM required to run Whisper and Orpheus for voice mode.