The Podcast to Blog AI App is a full-stack application that turns podcast episodes into blog posts with multimedia elements. It transcribes and summarizes an episode, generates audio from the summary, and creates visual content based on the episode. The app uses Hugging Face (HF) APIs, the Eleven Labs API, and several other models to process the podcast content. It is built with Next.js and relies on the LangChain framework for data handling and API integration.
- Podcast Search and Retrieval: Search for podcasts using the Podcast Index API.
- Speech-to-Text Transcription: Transcribe podcast episodes using the Hugging Face Wav2Vec2 model.
- Text Summarization: Summarize transcriptions using the Hugging Face BART model.
- Text-to-Speech Conversion: Convert summarized text into speech using Eleven Labs API.
- Interactive Q&A: Let users ask questions about the transcribed text through a chat interface.
- Real-Time Translation: Translate summarized text into French using the Helsinki-NLP model.
- Image Generation: Create images based on podcast content using text-to-image models.
- Multi-Modal Output: Display the generated title, image, audio, and translated text on the final result page.
- Next.js: React framework for server-rendered and statically generated applications.
- React: JavaScript library for building user interfaces.
- LangChain: Framework for developing applications powered by language models.
- Hugging Face Inference: API for model inference, allowing easy integration of Hugging Face models.
- Eleven Labs API: API for advanced text-to-speech capabilities.
- Axios: Promise-based HTTP client for making API requests.
- Tailwind CSS: Utility-first CSS framework for rapid UI development.
- React Toastify: Library for adding notifications to React applications.
- Node.js installed on your local machine.
- Hugging Face account with API key.
- Podcast Index account with API key and API secret.
- Eleven Labs API key for text-to-speech functionality.
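
The Podcast Index API needs both the key and the secret because every request is signed with them. The following is a minimal sketch of how the request headers can be built. The helper name `podcastIndexHeaders` and the default User-Agent string are illustrative assumptions, not code from this project.

```typescript
import { createHash } from "node:crypto";

// Hypothetical helper: builds the headers the Podcast Index API expects.
function podcastIndexHeaders(
  apiKey: string,
  apiSecret: string,
  nowMs: number = Date.now(),
  userAgent: string = "podcast-to-blog-ai-app/0.1", // placeholder value
): Record<string, string> {
  // The API expects the current Unix time in whole seconds.
  const authDate = Math.floor(nowMs / 1000).toString();
  // Authorization is the SHA-1 hex digest of key + secret + timestamp.
  const authorization = createHash("sha1")
    .update(apiKey + apiSecret + authDate)
    .digest("hex");
  return {
    "User-Agent": userAgent,
    "X-Auth-Key": apiKey,
    "X-Auth-Date": authDate,
    Authorization: authorization,
  };
}
```

These headers would typically be sent with a search request such as `https://api.podcastindex.org/api/1.0/search/byterm?q=EdTech+Shorts`, using the values of `NEXT_PODCAST_INDEX_API_KEY` and `NEXT_PODCAST_INDEX_API_KEY_SECRET`. Keep this code on the server so the secret never reaches the browser.
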
- Clone the repository:

      git clone https://github.com/your-username/podcast-to-blog-ai-app.git
      cd podcast-to-blog-ai-app

- Install dependencies:

      npm install

- Set up environment variables. Create a .env file in the root directory and add the following:

      NEXT_PUBLIC_BASE_URL=your_base_url
      HF_ACCESS_TOKEN=your_huggingface_api_key
      NEXT_PODCAST_INDEX_API_KEY=your_podcast_index_api_key
      NEXT_PODCAST_INDEX_API_KEY_SECRET=your_podcast_index_api_secret
      ELEVENLABS_API_KEY=your_elevenlabs_api_key

- Start the development server:

      npm run dev
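
A missing key usually shows up only when the first API call fails, so a small startup check can save some debugging. This is a minimal sketch: `missingEnvVars` is a hypothetical helper, and only the variable names come from the .env example above.

```typescript
// Variable names taken from the .env example in this README.
const REQUIRED_ENV_VARS = [
  "NEXT_PUBLIC_BASE_URL",
  "HF_ACCESS_TOKEN",
  "NEXT_PODCAST_INDEX_API_KEY",
  "NEXT_PODCAST_INDEX_API_KEY_SECRET",
  "ELEVENLABS_API_KEY",
] as const;

// Hypothetical helper: returns the names of required variables
// that are unset, empty, or whitespace-only.
function missingEnvVars(env: Record<string, string | undefined>): string[] {
  return REQUIRED_ENV_VARS.filter((name) => !env[name]?.trim());
}

// Possible usage in server-side code:
//   const missing = missingEnvVars(process.env);
//   if (missing.length > 0) {
//     throw new Error(`Missing environment variables: ${missing.join(", ")}`);
//   }
```
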
- Search for a Podcast: Enter the podcast name (e.g., "EdTech Shorts") in the search bar to find a show using the Podcast Index API.
- Select a Podcast: Choose the podcast to proceed with.
- Select an Episode: Choose the episode from the results.
- Process the Episode: Click the "Process episode" button.
- Transcription and Summarization: The app will transcribe the selected episode using the Wav2Vec2 model and summarize the text using the BART model.
- Audio Generation: The summarized text is converted to audio using the Eleven Labs API.
- Interactive Q&A: Users can ask questions related to the podcast episode's content via a chat interface using the Zephyr-7B model.
- Translation: Click on the translation button to convert the summarized text into French.
- Image Generation: An image is generated based on the podcast content using the ZB-Tech or Stability AI model.
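
The automatic part of this flow (transcribe, summarize, then generate audio and an image) can be sketched as a small pipeline. This is only an illustration under stated assumptions: the interface and function names are hypothetical, each step is injected rather than tied to a real client, and using the summary as the image prompt is a guess, since this README only says the image is based on the podcast content. Translation and Q&A are left out because they run on user request.

```typescript
// Hypothetical step signatures; real implementations would wrap the
// Hugging Face and Eleven Labs API calls.
interface PipelineSteps {
  transcribe(audioUrl: string): Promise<string>; // e.g. Wav2Vec2
  summarize(transcript: string): Promise<string>; // e.g. BART
  synthesize(text: string): Promise<Uint8Array>; // e.g. Eleven Labs
  generateImage(prompt: string): Promise<Uint8Array>; // e.g. Stable Diffusion XL
}

interface EpisodeResult {
  transcript: string;
  summary: string;
  audio: Uint8Array;
  image: Uint8Array;
}

async function processEpisode(
  audioUrl: string,
  steps: PipelineSteps,
): Promise<EpisodeResult> {
  // Transcription and summarization are sequential: each needs the previous output.
  const transcript = await steps.transcribe(audioUrl);
  const summary = await steps.summarize(transcript);
  // Audio and image both depend only on the summary, so they run concurrently.
  const [audio, image] = await Promise.all([
    steps.synthesize(summary),
    steps.generateImage(summary), // assumption: the summary doubles as the prompt
  ]);
  return { transcript, summary, audio, image };
}
```

Injecting the steps keeps the flow easy to test with stubs and makes it simple to swap one model for another.
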
- Speech-to-Text: Wav2Vec2
- Text Summarization: BART
- Text-to-Speech: Eleven Labs API
- Q&A: Zephyr-7B
- Translation: Helsinki-NLP
- Image Generation: ZB-Tech/Text-to-Image or Stable Diffusion XL
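
One way to keep these choices in a single place is a small model registry. The sketch below is an assumption-heavy example: the list above names model families rather than exact checkpoints, so the Hugging Face model IDs shown are plausible candidates, not confirmed project settings, and `modelFor` is a hypothetical helper. Text-to-speech is omitted because it goes through the Eleven Labs API rather than Hugging Face.

```typescript
type Task =
  | "speechToText"
  | "summarization"
  | "questionAnswering"
  | "translation"
  | "imageGeneration";

// Assumed checkpoint IDs; replace them with the ones your deployment uses.
const MODELS: Record<Task, string> = {
  speechToText: "facebook/wav2vec2-base-960h",
  summarization: "facebook/bart-large-cnn",
  questionAnswering: "HuggingFaceH4/zephyr-7b-beta",
  translation: "Helsinki-NLP/opus-mt-en-fr",
  imageGeneration: "stabilityai/stable-diffusion-xl-base-1.0",
};

// Hypothetical helper: returns the model for a task, honoring per-call overrides.
function modelFor(
  task: Task,
  overrides: Partial<Record<Task, string>> = {},
): string {
  return overrides[task] ?? MODELS[task];
}
```

For example, `modelFor("imageGeneration", { imageGeneration: "ZB-Tech/Text-to-Image" })` selects the alternative image model without touching the defaults.
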