VoiceReactify is a React-based application that integrates audio processing with Text-to-Speech (TTS) functionalities. This project allows users to record audio, transcribe it into text using AI models, and optionally convert the text back into speech. It also supports voice assignment based on Microsoft's Speech API.
- Audio Recording: Capture audio directly from the browser.
- Real-time Transcription: Convert audio to text using advanced AI models.
- Text-to-Speech: Convert transcribed text back to speech using TTS.
- React Integration: Built using the React framework for a seamless user experience.
- Voice Assignment: Use Microsoft's API to customize voice characteristics.
Before you begin, ensure you have met the following requirements:
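- Node.js: Install from the official Node.js website.
- pnpm: Not bundled with Node.js (npm is); install it separately, for example with npm install -g pnpm.
- Python: Required for backend processing. Install from the official Python website.
- Docker (Optional): For containerization. Install from the official Docker website.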
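The requirements above can be checked with a short script before starting the setup. This is a minimal sketch, not part of the project: the command names in the two lists are assumptions and may differ on your platform (for example, python3 instead of python).

```python
import shutil

# Command names are assumptions; adjust them for your platform
# (for example, "python3" instead of "python" on some systems).
REQUIRED_TOOLS = ["node", "pnpm", "python"]
OPTIONAL_TOOLS = ["docker"]


def find_missing(tools):
    """Return the entries of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = find_missing(REQUIRED_TOOLS)
    if missing:
        print("Missing required tools:", ", ".join(missing))
    else:
        print("All required tools were found on PATH.")
    for tool in find_missing(OPTIONAL_TOOLS):
        print("Optional tool not found:", tool)
```

The script only confirms that each command is on PATH; it does not check versions.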
Follow these steps to set up the project locally:
- Clone the Repository
    git clone https://github.com/yourusername/VoiceReactify.git
    cd VoiceReactify
- Install Frontend Dependencies
    cd frontend
    pnpm install
- Install Backend Dependencies
    cd ../backend
    pip install -r requirements.txt
- Backend Environment Variables
  Create an env file in the backend directory:
    AZURE_SPEECH_KEY = 'xxx'
    AZURE_SPEECH_REGION = 'xxx'
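To illustrate how the two settings might be read and validated, here is a minimal sketch using only the standard library. How the backend actually loads the file is not shown in this README (it may use a library such as python-dotenv), so the helper names load_env_file and require_speech_settings are hypothetical.

```python
import os


def load_env_file(path):
    """Parse simple KEY = 'value' lines into a dict, skipping blanks and comments."""
    values = {}
    with open(path, encoding="utf-8") as f:
        for raw_line in f:
            line = raw_line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            # Drop surrounding whitespace and optional single or double quotes.
            values[key.strip()] = value.strip().strip("'\"")
    return values


def require_speech_settings(values):
    """Return (key, region); raise if either is missing or still the 'xxx' placeholder."""
    settings = []
    for name in ("AZURE_SPEECH_KEY", "AZURE_SPEECH_REGION"):
        # Fall back to the process environment when the file lacks the entry.
        value = values.get(name) or os.environ.get(name, "")
        if not value or value == "xxx":
            raise RuntimeError(f"{name} is not set; edit the env file in backend/")
        settings.append(value)
    return tuple(settings)
```

Failing early on the xxx placeholder makes a forgotten key show up at startup rather than on the first speech request.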
If you prefer using Docker:
- Modify Terms of Service Agreement
  Due to licensing requirements, you need to modify a backend file to automatically agree to the terms of service when running in Docker. See the Important Notice section below.
- Build Docker Images
    docker-compose build
- Run Docker Containers
    docker-compose up
- Start the Backend Server
  From the repository root:
    make backend
  Or start it directly:
    cd backend
    python app.py
- Start the Frontend Server
  Open a new terminal window. From the repository root:
    make frontend
  Or start it directly:
    cd frontend
    pnpm run dev
- Access the Application
  Navigate to http://localhost:5173 in your web browser.
- Record and Transcribe Audio
  - Click on the Record button to start recording.
  - Click Stop to finish recording.
  - The audio will be transcribed into text automatically.
- Convert Text to Speech
  - Choose a voice using the Voice Assignment feature.
  - Click the Play icon to hear the synthesized speech.
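As an illustration of what choosing a voice can mean at the API level, Microsoft's Speech service accepts SSML in which a voice element names the voice to use. The sketch below builds such a document with the standard library. Whether VoiceReactify sends SSML or sets the voice on the SDK configuration is not stated here, so build_ssml is a hypothetical helper and en-US-JennyNeural is only an example voice name.

```python
from xml.sax.saxutils import escape, quoteattr


def build_ssml(text, voice_name, lang="en-US"):
    """Wrap `text` in an SSML document that selects `voice_name`."""
    return (
        '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" '
        f"xml:lang={quoteattr(lang)}>"
        # escape() protects characters such as & and < inside the spoken text.
        f"<voice name={quoteattr(voice_name)}>{escape(text)}</voice>"
        "</speak>"
    )


if __name__ == "__main__":
    print(build_ssml("Hello from VoiceReactify", "en-US-JennyNeural"))
```

Escaping the transcribed text matters because transcripts can contain characters that would otherwise produce invalid XML.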
Your computer also needs enough memory to load the AI models that this app runs.
When running the application in Docker, you need to modify the backend code to automatically agree to the terms of service for certain AI models. This is necessary for downloading and using specific models within a containerized environment.
Modify the ask_tos Method
In the backend code, locate the ask_tos method and replace it with the following:
    @staticmethod
    def ask_tos(model_full_path):
        """Automatically agree to the terms of service."""
        # Assumes the enclosing module already imports os.
        tos_path = os.path.join(model_full_path, "tos_agreed.txt")
        print(" > Automatically agreeing to the terms of service:")
        print(' | > "I have purchased a commercial license from Coqui: licensing@coqui.ai"')
        print(' | > "Otherwise, I agree to the terms of the non-commercial CPML: https://coqui.ai/cpml"')
        # Automatically agree to the license agreement
        with open(tos_path, "w", encoding="utf-8") as f:
            f.write("I have read, understood and agreed to the Terms and Conditions.")
        return True
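One way to sanity-check the replacement before building the Docker image is to call it on a temporary directory and confirm that the marker file appears. This is a sketch under assumptions: PatchedManager is a hypothetical stand-in for whichever backend class holds ask_tos, and the print calls are omitted for brevity.

```python
import os
import tempfile


class PatchedManager:  # hypothetical stand-in for the backend class
    @staticmethod
    def ask_tos(model_full_path):
        """Automatically agree to the terms of service."""
        tos_path = os.path.join(model_full_path, "tos_agreed.txt")
        with open(tos_path, "w", encoding="utf-8") as f:
            f.write("I have read, understood and agreed to the Terms and Conditions.")
        return True


def check_tos_marker():
    """Run the patched method on a temporary directory and check for the marker file."""
    with tempfile.TemporaryDirectory() as model_dir:
        agreed = PatchedManager.ask_tos(model_dir)
        marker = os.path.join(model_dir, "tos_agreed.txt")
        return agreed and os.path.isfile(marker)


if __name__ == "__main__":
    print("Marker file written:", check_tos_marker())
```

Because the method no longer prompts for input, it can run in a container that has no interactive terminal.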
Contributions are welcome! Please follow these steps:
- Fork the Repository
- Create a Feature Branch
    git checkout -b feature/YourFeature
- Commit Your Changes
    git commit -m 'Add YourFeature'
- Push to the Branch
    git push origin feature/YourFeature
- Open a Pull Request
This project is licensed under the MIT License. See the LICENSE file for details.