Note:
This repository now uses pygame for audio playback, replacing the previous playsound implementation for better compatibility and performance.
This Python module provides a voice-activated, voice-to-voice interface for interacting with Large Language Models (LLMs) via Ollama and Google Text-to-Speech (gTTS). Designed as an addon, it can be easily integrated into other Python projects. The module allows users to:
- Choose a specific LLM model at runtime (default is llama3.2:1b).
- Customize the system prompt to alter the AI assistant's behavior.
- Provide initial messages to Ollama for more complex interactions.
- Features
- Requirements
- Installation
- Usage
- Customization
- Code Overview
- Known Issues
- License
- Acknowledgments
- Enhanced Modular Design: Easily integrated as an addon into existing Python projects with optional dependencies loaded only when needed.
- Custom LLM Selection: Users can specify which LLM model to use at runtime.
- Custom System Prompt: Ability to change the AI assistant's behavior via a custom system prompt.
- Initial Messages Support: Accepts an initial list of messages to send to Ollama for context.
- Custom Activation Phrases: Define your own activation phrases to initiate interaction (default: "Hello Llama" or "Hey Llama").
- Voice Query and Response: Converts spoken queries to text, sends them to the LLM, and reads the AI's response aloud.
- Multiple TTS Options: Choose between Google Text-to-Speech (gTTS) or Piper for text-to-speech conversion.
- Pygame Audio Playback: Reliable audio playback using the pygame library.
- Ollama Integration: Connects with Ollama to generate AI responses from the specified LLM model.
- Python 3.12
- gTTS: For text-to-speech conversion
- pygame: For audio playback
- SpeechRecognition: For handling voice input (only needed if using voice recognition)
- pyaudio: For microphone input (required by SpeechRecognition)
- Ollama API: Installed and configured on your machine (only needed if using Ollama for LLM queries)
- Piper: Optional for higher quality text-to-speech
- argparse: For command-line argument parsing (Python standard library)
- json: For handling messages in JSON format (Python standard library)
Install the required libraries using pip:
pip install gtts pygame SpeechRecognition pyaudio
For optional components:
# For Ollama LLM integration and Piper text-to-speech
pip install ollama piper-tts
Note: Installing `pyaudio` may require additional setup, especially on non-Windows systems.
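For example, on Debian/Ubuntu the PortAudio headers are typically needed first (the commands below assume an apt-based system; on macOS, `brew install portaudio` plays the same role):

# Debian/Ubuntu
sudo apt-get install portaudio19-dev
pip install pyaudio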
Run the script and specify optional arguments:
python voice_llm_addon.py --model <model_name> --system_prompt "<your_system_prompt>" --messages_file <path_to_messages_json> --activation_phrases "hello llama" "hey there llama"
- `--model`: (Optional) Specify the LLM model to use. Defaults to llama3.2:1b.
- `--system_prompt`: (Optional) Provide a custom system prompt to change the assistant's behavior.
- `--messages_file`: (Optional) Path to a JSON file containing initial messages to send to Ollama.
- `--activation_phrases`: (Optional) Space-separated list of phrases that will activate the assistant. Defaults to "hello llama" "hey llama".
- Include the Module: Add `voice_llm_addon.py` to your project directory.
- Import the Class:

```python
from voice_llm_addon import VoiceLLMAddon
```

- Initialize and Run:

```python
addon = VoiceLLMAddon(
    model='your_model_name',              # Optional
    system_prompt='Your system prompt',   # Optional
    initial_messages=[                    # Optional
        {'role': 'system', 'content': 'System message'},
        {'role': 'user', 'content': 'User message'},
    ],
    activation_phrases=["hello llama", "hey assistant"]  # Optional
)
addon.run()
```
- Change Activation Phrases: Provide custom activation phrases using the `activation_phrases` parameter when initializing `VoiceLLMAddon`, or via the `--activation_phrases` command-line argument (see the sketch after this list).
- Choose TTS Engine: Use the optional `piper` parameter in the `speak` method to toggle between gTTS (the default) and Piper for text-to-speech.
- Select Voice Model: When using Piper, specify a different voice model file with the `piper_model` parameter in the `speak` method.
- Adjust AI Personality: Provide a custom `system_prompt` when initializing `VoiceLLMAddon`, or via the `--system_prompt` command-line argument.
- Provide Initial Messages: Supply an initial list of messages via the `initial_messages` parameter, or by specifying a JSON file with `--messages_file`.
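Putting several of these options together, here is a minimal sketch (the model name, system prompt, and phrases are placeholder values):

```python
from voice_llm_addon import VoiceLLMAddon

addon = VoiceLLMAddon(
    model='llama3.2:1b',  # any Ollama model you have pulled
    system_prompt='You are a terse assistant. Answer in one sentence.',
    activation_phrases=['hey assistant', 'hello assistant'],
)
addon.run()  # listens for either phrase, then answers queries aloud
```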
The `VoiceLLMAddon` class encapsulates all functionality required for voice interaction with the LLM.
```python
def __init__(self, model='llama3.2:1b', system_prompt=None,
             initial_messages=None, activation_phrases=None):
    self.model = model
    self.system_prompt = system_prompt
    self.initial_messages = initial_messages or []
    self.recognizer = None  # Speech recognizer will be initialized in run()
    self.activation_phrases = activation_phrases or ["hello llama", "hey llama"]
    self.activated = False  # Indicates if the activation phrase was detected
```
- model: Specifies the LLM model to use.
- system_prompt: Custom system prompt to alter the assistant's behavior.
- initial_messages: List of initial messages to send to Ollama.
- recognizer: The speech recognizer, initialized in the `run()` method.
- activation_phrases: List of phrases that will trigger the assistant to start listening for a query.
- activated: A flag to track whether the activation phrase has been detected.
`speak(text, piper=False, piper_model="en_US-lessac-medium.onnx")`

Converts text to speech using either gTTS (the default) or Piper (if `piper=True`), and plays it using pygame.

- Parameters:
  - `text` (str): The text to convert to speech.
  - `piper` (bool, optional): If True, uses Piper for text-to-speech conversion instead of gTTS. Defaults to False.
  - `piper_model` (str, optional): The Piper voice model file to use. Defaults to `"en_US-lessac-medium.onnx"`.
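For example, assuming an initialized `VoiceLLMAddon` instance named `addon`, and that the Piper voice model file has already been downloaded next to the script (Piper models are not bundled):

```python
# gTTS (default engine, requires an internet connection)
addon.speak("Hello there!")

# Piper, selecting the voice model file explicitly
addon.speak("Hello there!", piper=True, piper_model="en_US-lessac-medium.onnx")
```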
`query_ollama(prompt)`

Sends a text prompt to the Ollama LLM and returns the response.

- Parameters:
  - `prompt` (str): The user's query to send to the LLM.
- Returns:
  - `response` (str): The LLM's response.
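For instance, assuming Ollama is running locally and the default model has been pulled:

```python
addon = VoiceLLMAddon()
reply = addon.query_ollama("In one sentence, what is a large language model?")
print(reply)
```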
`run()`

Starts the voice interaction loop. Listens for the activation phrase and processes user queries.
- Ollama Model Compatibility: Ensure the specified LLM model is installed and compatible with Ollama.
- Microphone Access: The script requires access to your system's microphone. Ensure that permissions are granted.
- Piper Models: If using Piper, you need to download the appropriate voice model files and ensure they're in the correct path.
- Optional Dependencies: If you encounter import errors, ensure that you have installed the necessary dependencies for the features you're using (e.g., ollama, piper, speech_recognition).
This project is licensed under the MIT License.
This project utilizes:
- gTTS for text-to-speech conversion.
- Piper for high-quality text-to-speech conversion (optional).
- Pygame for audio playback.
- Ollama for LLM interactions.
- SpeechRecognition for voice input handling.
The full source of `voice_llm_addon.py`:

```python
import os
import time
import argparse
import json
from gtts import gTTS
import wave
import pygame
# ollama and speech_recognition are imported inside query_ollama() and run()
# respectively, so you don't need to install them if you don't use those features.
# import ollama
# import speech_recognition as sr
# Piper is imported inside the speak() method for the same reason.
# import piper
class VoiceLLMAddon:
    def __init__(self, model='llama3.2:1b', system_prompt=None, initial_messages=None, activation_phrases=None):
self.model = model
self.system_prompt = system_prompt or (
'You are a funny and self-deprecating AI assistant. '
'Each of your replies will be a maximum of 2 sentences long. '
'Short responses are crucial to your success as an AI assistant. '
'Long replies take a long time to process and ruin the user experience. '
'Just use any useful data and respond in short responses as an intelligent AI assistant.'
)
self.initial_messages = initial_messages or []
self.recognizer = None # Speech recognizer will be initialized in run()
self.activation_phrases = activation_phrases or ["hello llama", "hey llama"]
self.activated = False # Flag to indicate if activation phrase was detected
def speak(self, text, piper=False, piper_model="en_US-lessac-medium.onnx"):
"""Convert text to speech, play it, and remove the temporary file."""
try:
            filename = "audio.wav"
            # Use an absolute path next to this script so file creation and
            # playback work regardless of the current working directory
            audio_file = os.path.join(os.path.dirname(__file__), filename)
            # Delete the old audio file if it exists
            if os.path.exists(audio_file):
                os.remove(audio_file)
                time.sleep(0.5)
            # Create the audio file
            if piper:
                from piper import PiperVoice
                voice = PiperVoice.load(piper_model)
                with wave.open(audio_file, "wb") as wav_file:
                    voice.synthesize_wav(text, wav_file)
            else:
                # gTTS writes MP3 data even though the filename ends in .wav
                tts = gTTS(text=text, lang='en')
                tts.save(audio_file)
# Prepping Pygame to play audio
time.sleep(0.5) # Wait a moment to ensure file is written
print("Speaking...")
pygame.init()
pygame.mixer.quit() # Clean up any existing mixer
pygame.mixer.init(frequency=44100) # Initialize with standard frequency (44.1kHz)
if not pygame.mixer.get_init():
raise Exception("Failed to initialize pygame.mixer")
# Load and play the sound
            sound = pygame.mixer.Sound(audio_file)
print("Audio device initialized and sound loaded")
sound.play()
# Wait for the sound to finish playing
while pygame.mixer.get_busy():
pygame.time.Clock().tick(10)
print("Done Speaking")
except Exception as e:
print("Exception while speaking:", str(e))
    def query_ollama(self, prompt):
        """Query the Ollama AI assistant with the given prompt and return the response."""
        import ollama
try:
# Construct messages for the chat
messages = []
            if not self.initial_messages:
messages.append({'role': 'system', 'content': self.system_prompt})
else:
messages = self.initial_messages.copy()
messages.append({'role': 'user', 'content': prompt})
# query ollama
ollama_response = ollama.chat(
model=self.model,
messages=messages,
stream=False
)
# Extract and return the response content
response = str(ollama_response['message']['content'])
return response
except Exception as e:
print("Exception while querying Ollama:", str(e))
return "I'm sorry, I couldn't process your request."
def run(self):
"""Main loop to listen for activation phrases and respond to user queries."""
        # Initialize and calibrate speech recognition
import speech_recognition as sr
self.recognizer = sr.Recognizer()
with sr.Microphone() as source:
print("Calibrating microphone for ambient noise...")
self.recognizer.adjust_for_ambient_noise(source)
while True:
try:
# Check for activation phrase
if not self.activated:
print("Listening for activation phrase...")
audio = self.recognizer.listen(source)
print("Processing activation phrase...")
text = self.recognizer.recognize_google(audio).lower()
print(f"Detected text: {text}")
                        if text not in self.activation_phrases:  # the full spoken utterance must exactly match one of the phrases
print("Activation phrase not detected.")
continue
print("Activation phrase detected.")
self.activated = True
self.speak("Hello!")
                    # Once the activation phrase is heard, listen for the user's query
print("Listening for your request...")
audio = self.recognizer.listen(source)
print("Processing your request...")
text = self.recognizer.recognize_google(audio)
print(f"You said: {text}")
response = self.query_ollama(text)
self.speak(response)
self.activated = False
except sr.UnknownValueError:
print("Could not understand the audio.")
except sr.RequestError as e:
print(f"Speech Recognition service error: {e}")
except Exception as e:
print(f"An unexpected error occurred: {e}")
time.sleep(1)
def main():
parser = argparse.ArgumentParser(description='Voice-Activated LLM Addon')
parser.add_argument('--model', type=str, default='llama3.2:1b',
help='Specify the LLM model to use (default: llama3.2:1b)')
parser.add_argument('--system_prompt', type=str, default=None,
help='Custom system prompt for the AI assistant')
parser.add_argument('--messages_file', type=str, default=None,
help='Path to a JSON file containing initial messages')
parser.add_argument('--activation_phrases', type=str, nargs='+', default=["hello llama", "hey llama"],
help='List of phrases to activate the assistant (space-separated)')
args = parser.parse_args()
# Parsing initial messages from file if provided
initial_messages = []
if args.messages_file:
try:
with open(args.messages_file, 'r') as f:
initial_messages = json.load(f)
print(f"Loaded initial messages from {args.messages_file}")
except Exception as e:
print(f"Failed to load messages from {args.messages_file}: {e}")
print('activation phrases:', args.activation_phrases)
addon = VoiceLLMAddon(
model=args.model,
system_prompt=args.system_prompt,
initial_messages=initial_messages,
activation_phrases=args.activation_phrases
)
addon.run()
if __name__ == "__main__":
    main()
```
You can change the AI assistant's behavior by providing a custom system prompt.
Via the command line:

python voice_llm_addon.py --system_prompt "You are a helpful assistant that provides detailed answers."

Or when initializing the class:

```python
addon = VoiceLLMAddon(system_prompt="You are a helpful assistant that provides detailed answers.")
```
You can supply initial messages to give context to the AI assistant.
- Create a JSON file (e.g., `messages.json`) with your messages:

```json
[
    {"role": "system", "content": "You are an AI assistant specialized in astronomy."},
    {"role": "user", "content": "Tell me about black holes."},
    {"role": "assistant", "content": "Black holes are regions in space where gravity is so strong that nothing can escape."}
]
```

- Run the script with the `--messages_file` argument:

python voice_llm_addon.py --messages_file messages.json
Or provide the messages directly when initializing the class:

```python
initial_messages = [
    {'role': 'system', 'content': 'You are an AI assistant specialized in astronomy.'},
    {'role': 'user', 'content': 'Tell me about black holes.'},
    {'role': 'assistant', 'content': 'Black holes are regions in space where gravity is so strong that nothing can escape.'}
]
addon = VoiceLLMAddon(initial_messages=initial_messages)
```