Talking AI

Talking AI is a simple Node.js application that lets you upload an MP3 file, transcribe the speech to text with OpenAI's Whisper API, generate an answer with OpenAI GPT, and convert that answer back into speech for playback. The app has a basic front end and demonstrates a clear chain of AI-based interactions: starting from voice, moving through natural language understanding, and returning to voice.


Purpose

The aim of this project is to demonstrate how voice-driven AI pipelines can be built with modern tools such as Whisper and GPT. It helps developers explore how human-like interaction can be achieved with audio input and output. The pipeline can be extended to use cases such as voice assistants, learning tools, and customer support bots, especially for Turkish or multilingual audiences.


Features

  • Upload MP3 audio files
  • Transcribe speech to Turkish text using OpenAI Whisper
  • Generate a response based on the transcribed text using OpenAI GPT
  • Convert the response back to voice using OpenAI Text-to-Speech
  • Simple UI to trigger each step manually
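
The features above form a three-step chain. The following is a minimal sketch of that chain, not the repository's actual code: `runVoicePipeline`, `stubSteps`, and the step names `transcribe`, `generateAnswer`, and `synthesize` are hypothetical, and the steps are injected so the flow can be tried without calling the OpenAI APIs.

```javascript
// Hypothetical sketch: none of these names come from the repository.
// Each step is passed in, so real API calls can be swapped for stubs.
async function runVoicePipeline(mp3Path, steps) {
  const { transcribe, generateAnswer, synthesize } = steps;

  // 1. Speech to text (Whisper in the real app).
  const transcript = await transcribe(mp3Path);
  if (!transcript || !transcript.trim()) {
    throw new Error("Transcription was empty; nothing to answer.");
  }

  // 2. Text generation (GPT in the real app).
  const answer = await generateAnswer(transcript);

  // 3. Text to speech; assumed to resolve to the path of the generated audio.
  const audioPath = await synthesize(answer);

  return { transcript, answer, audioPath };
}

// Stub steps standing in for the real API calls.
const stubSteps = {
  transcribe: async (filePath) => `transcript of ${filePath}`,
  generateAnswer: async (text) => `answer to: ${text}`,
  synthesize: async () => "responses/answer.mp3",
};
```

In the app itself the UI triggers each step manually, so a real implementation would more likely expose each step as its own route than run them in a single call.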

Installation

  1. Clone the repository:

    git clone https://github.com/rastmob/talking-ai.git
    cd talking-ai

  2. Install dependencies:

    npm install

  3. Create a .env file in the root directory and add your OpenAI API key:

    OPENAI_API_KEY=your-api-key-here

  4. Start the server:

    npm start

  5. Open your browser and go to http://localhost:3000
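
A common failure after these steps is starting the server with a missing key or with the placeholder left in place. The sketch below shows one way to fail fast at startup. It assumes the .env file has already been loaded into `process.env` (for example with the dotenv package; how the repository does this is not shown here), and `requireOpenAIKey` is a hypothetical helper name.

```javascript
// Hypothetical helper, not part of the repository.
// Returns the trimmed key, or throws if it is missing or still the placeholder.
function requireOpenAIKey(env = process.env) {
  const key = (env.OPENAI_API_KEY || "").trim();
  if (!key || key === "your-api-key-here") {
    throw new Error(
      "OPENAI_API_KEY is missing or still set to the placeholder; check your .env file."
    );
  }
  return key;
}
```

Calling it once before the server starts listening surfaces a configuration problem immediately, rather than on the first upload.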


Tech Stack

  • Node.js + Express
  • OpenAI Whisper API (Speech to Text)
  • OpenAI GPT API (Text Generation)
  • OpenAI TTS API (Text to Speech)
  • HTML, CSS, JavaScript (Client Side)
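
To make the three OpenAI calls in the stack concrete, here is a rough sketch of the request each one involves. The endpoints and field names follow OpenAI's public REST API. The model names (`whisper-1`, `gpt-4o-mini`, `tts-1`), the `alloy` voice, and all function names are assumptions for illustration, since the README does not say which models or voices the project uses. The functions only build request descriptions and send nothing.

```javascript
const OPENAI_BASE_URL = "https://api.openai.com/v1";

// Every request carries the API key as a bearer token.
function authHeaders(apiKey) {
  return { Authorization: `Bearer ${apiKey}` };
}

// Speech to text. The fields are sent as multipart/form-data together
// with the MP3 in a "file" part; "tr" is the ISO-639-1 code for Turkish.
function transcriptionRequest(language = "tr") {
  return {
    method: "POST",
    url: `${OPENAI_BASE_URL}/audio/transcriptions`,
    fields: { model: "whisper-1", language },
  };
}

// Text generation. The model name is an assumption, not taken from the repo.
function chatRequest(transcript, model = "gpt-4o-mini") {
  return {
    method: "POST",
    url: `${OPENAI_BASE_URL}/chat/completions`,
    json: { model, messages: [{ role: "user", content: transcript }] },
  };
}

// Text to speech. The response body is the audio itself (MP3 by default).
function speechRequest(text, voice = "alloy") {
  return {
    method: "POST",
    url: `${OPENAI_BASE_URL}/audio/speech`,
    json: { model: "tts-1", input: text, voice },
  };
}
```

In practice the official `openai` Node.js package wraps these endpoints; the plain descriptors are used here only to show what travels over the wire.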

Project Structure

talking-ai/
├── public/              # Frontend HTML & JS
├── responses/           # Generated audio files
├── uploads/             # Uploaded MP3s
├── server.js            # Express backend
├── .env                 # Environment variables (not committed)
└── package.json
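
Since uploads/ and responses/ hold runtime files, they may not exist on a fresh checkout. The sketch below shows one way the server could create them at startup and name generated audio files. The helper names and the `answer-<timestamp>.mp3` naming scheme are hypothetical; only the directory names come from the layout above.

```javascript
const fs = require("fs");
const path = require("path");

// Hypothetical helper: creates uploads/ and responses/ under rootDir.
// With { recursive: true }, mkdirSync does not fail if a directory exists.
function ensureStorageDirs(rootDir) {
  const dirs = ["uploads", "responses"].map((name) => path.join(rootDir, name));
  for (const dir of dirs) {
    fs.mkdirSync(dir, { recursive: true });
  }
  return dirs;
}

// Hypothetical naming scheme: one MP3 per generated answer, keyed by timestamp.
function responseAudioPath(rootDir, timestamp = Date.now()) {
  return path.join(rootDir, "responses", `answer-${timestamp}.mp3`);
}
```

A timestamp keeps successive answers from overwriting each other, although two requests in the same millisecond would still collide; a random suffix would avoid that.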

Developed by

This project is developed and maintained by Rast Mobile, an innovative software company that specializes in mobile development, AI integrations, and custom web platforms.

License

This project is licensed under the MIT License.
