A Next.js application for searching and exploring episodes, entities, and relationships from the Founders Podcast. Features a dark-themed interface with modal views and Amazon product integration.
- 🔍 Smart Search: Search across episodes, entities, and relationships with relevance scoring
- 📱 Responsive Design: Mobile-friendly dark theme interface
- 🎯 Auto-search: Debounced search with 500ms delay for smooth user experience
- 📖 Modal Views: Detailed popups for episodes, entities, and relationships
- 🛒 Amazon Integration: Search and display relevant Amazon products for media entities
- 🏷️ Smart Pills: Clickable tags with formatted text (replaces underscores, capitalizes words)
- ⚡ Fast Loading: Built with Next.js 14 and the App Router
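The Smart Pills formatting described above (underscores replaced, words capitalized) could be sketched as follows. This is a minimal illustration, not the project's actual code: the function name `formatPillText`, its signature, and the choice to append an ellipsis after truncation are assumptions. The default limit of 15 characters comes from the configuration notes later in this README.

```typescript
// Hypothetical helper: name and signature are illustrative, not taken from the project.
function formatPillText(raw: string, maxLength = 15): string {
  // Replace underscores with spaces and split into non-empty words.
  const words = raw
    .replace(/_/g, " ")
    .trim()
    .split(/\s+/)
    .filter((word) => word.length > 0);

  // Capitalize the first letter of each word; the rest is left unchanged.
  const formatted = words
    .map((word) => word.charAt(0).toUpperCase() + word.slice(1))
    .join(" ");

  if (formatted.length <= maxLength) {
    return formatted;
  }
  // Assumption: truncated pills end with an ellipsis that is not counted in the limit.
  return formatted.slice(0, maxLength).trimEnd() + "…";
}
```

For example, `formatPillText("steve_jobs")` yields `Steve Jobs`, while a longer tag such as `venture_capital_firm` is cut at 15 characters before the ellipsis is added.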
- Node.js 18+
- npm or yarn
- SerpAPI key for Amazon search functionality
- Clone the repository
  git clone <repository-url>
  cd founders-podcast
- Install dependencies
  npm install
- Set up environment variables
  Create a .env.local file with your API keys:
  # Required for Amazon search functionality
  SERPAPI_KEY=your_serpapi_key_here
  # Optional: Add other AI service keys for data processing
  OPENAI_API_KEY=your_openai_key
  ANTHROPIC_API_KEY=your_anthropic_key
  GROQ_API_KEY=your_groq_key
  GEMINI_API_KEY=your_gemini_key
- Prepare data files (see Data Processing section)
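Because SERPAPI_KEY is required for Amazon search, server-side code may want to fail fast when it is missing. The sketch below shows one way to do that; `requireEnv` is a hypothetical helper and is not part of this repository.

```typescript
// Hypothetical helper: reads a required variable from an env-like record
// and throws a descriptive error when it is missing or blank.
function requireEnv(
  name: string,
  env: Record<string, string | undefined>
): string {
  const value = env[name];
  if (value === undefined || value.trim() === "") {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Possible usage in a server-side module such as the amazon-search route:
// const serpApiKey = requireEnv("SERPAPI_KEY", process.env);
```

Passing the environment in as a parameter keeps the helper easy to test without touching the real process environment.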
# Start development server
npm run dev
# Build for production
npm run build
# Start production server
npm start
# Run linting
npm run lint

The application will be available at http://localhost:3000
The application requires processed JSON data files. Use these scripts to generate them:
# 1. Parse episodes from Founders Podcast website
npm run parse-episodes
# 2. Extract full episode text content
npm run parse-episode-text
# 3. Clean and format episode text
npm run clean-text
# 4. Extract entities using AI services
npm run extract-entities
# 5. Extract relationships and build knowledge graph
npm run extract-graph
# 6. Complete scraping and processing pipeline
npm run scrape-and-process

- parse-episodes: Scrapes episode metadata, titles, and URLs
- parse-episode-text: Extracts full episode transcripts and content
- clean-text: Formats text for better readability and paragraph breaks
- extract-entities: Uses AI to identify people, companies, products, media, and places
- extract-graph: Builds relationships between entities across episodes
- scrape-and-process: Complete end-to-end processing pipeline
The application expects these files in the src/ directory:
- data-episodes-claude.json - Episode metadata with extracted entities
- data-relationships-claude.json - Relationships between entities
- data-episodes-text.json - Full episode text content
├── src/
│ ├── app/
│ │ ├── api/
│ │ │ └── amazon-search/route.ts # SerpAPI Amazon search
│ │ ├── globals.css # Dark theme styles
│ │ ├── layout.tsx # Root layout
│ │ └── page.tsx # Main search interface
│ ├── data-episodes-claude.json # Episode + entity data
│ ├── data-relationships-claude.json # Relationship data
│ └── data-episodes-text.json # Full episode text
├── scripts/ # Data processing scripts
│ ├── parse_episodes_nodejs.ts # Episode metadata scraping
│ ├── parse_episode_text_nodejs.ts # Episode text extraction
│ ├── clean_podcast_text.ts # Text cleaning and formatting
│ ├── extract_entities_ai.ts # AI-powered entity extraction
│ ├── comprehensive-graph-extraction.ts # Relationship building
│ └── scrape-and-process-episodes.ts # Complete pipeline
├── data/ # Raw data files (gitignored)
└── package.json # Project dependencies
The application integrates with SerpAPI to search Amazon for relevant products:
- Triggers: Entities with amazon_searchable: true or type: "media"
- Search Terms: Uses the amazon_keywords array or the entity name
- Display: Shows thumbnail, title, and Amazon link in entity modals
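The trigger and search-term rules above can be sketched as two small functions. This is an illustration under stated assumptions: the `Entity` interface includes only the fields named in this README plus an assumed `name` field, the function names are hypothetical, and joining keywords with spaces (with the entity name as fallback) is a guess at how the two options are combined.

```typescript
// Assumed entity shape: `type`, `amazon_searchable`, and `amazon_keywords`
// are named in this README; `name` is an assumed field for the entity name.
interface Entity {
  name: string;
  type: string;
  amazon_searchable?: boolean;
  amazon_keywords?: string[];
}

// An entity triggers an Amazon search when it is flagged or is of type "media".
function isAmazonEligible(entity: Entity): boolean {
  return entity.amazon_searchable === true || entity.type === "media";
}

// Returns the search term for an eligible entity, or null when not eligible.
function amazonSearchTerm(entity: Entity): string | null {
  if (!isAmazonEligible(entity)) {
    return null;
  }
  const keywords = (entity.amazon_keywords ?? [])
    .map((keyword) => keyword.trim())
    .filter((keyword) => keyword.length > 0);
  // Assumption: keywords are joined with spaces; the entity name is the fallback.
  return keywords.length > 0 ? keywords.join(" ") : entity.name;
}
```

The resulting term would then be passed to the amazon-search API route, which performs the SerpAPI request on the server so the key is never exposed to the browser.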
Internal search functionality with relevance scoring:
- Episode Search: Title and entity matching
- Entity Search: Name and context matching
- Relationship Search: Type and description matching
- Scoring: Weighted relevance based on match type and position
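One way to combine match type and match position into a single score is sketched below. The weights, field names, and function names are illustrative assumptions; the actual values used by the application are not documented in this README.

```typescript
// Which kind of field the query matched in (illustrative categories).
type MatchField = "title" | "entity" | "description";

// Illustrative weights only: matches in titles count more than in descriptions.
const FIELD_WEIGHTS: Record<MatchField, number> = {
  title: 3,
  entity: 2,
  description: 1,
};

// Scores a case-insensitive substring match; returns 0 when there is no match.
function scoreMatch(query: string, text: string, field: MatchField): number {
  const q = query.trim().toLowerCase();
  const t = text.toLowerCase();
  if (q.length === 0) {
    return 0;
  }
  const index = t.indexOf(q);
  if (index === -1) {
    return 0;
  }
  // Position bonus: 1 for a match at the start, shrinking toward 0 near the end.
  const positionBonus = 1 - index / t.length;
  // Extra credit when the whole text equals the query.
  const exactBonus = t === q ? 1 : 0;
  return FIELD_WEIGHTS[field] * (1 + positionBonus + exactBonus);
}

// Sorts scored items from most to least relevant without mutating the input.
function rankByScore<T extends { score: number }>(items: T[]): T[] {
  return [...items].sort((a, b) => b.score - a.score);
}
```

With these weights, a query that matches the start of an episode title outranks the same query found late in a relationship description.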
- Connect repository to Vercel
- Set environment variables in Vercel dashboard
- Deploy - automatic builds on push to main branch
-
Build the application
npm run build
-
Start production server
npm start
- Debounce Delay: 500ms (configurable in page.tsx)
- Initial Results: Shows 5 most recent episodes on load
- Pill Text Limit: 15 characters with truncation
- Results Limit: No hard limit, sorted by relevance
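The debounce behavior listed above can be illustrated with a small framework-free helper. This is a sketch, not the code in page.tsx: the `debounce` function and its `cancel` method are hypothetical, and only the 500ms default comes from this README.

```typescript
// Hypothetical helper: delays calls to `fn` until `delayMs` has passed
// without another call, so only the latest input triggers a search.
function debounce<Args extends unknown[]>(
  fn: (...args: Args) => void,
  delayMs = 500
) {
  let timer: ReturnType<typeof setTimeout> | undefined;

  const debounced = (...args: Args): void => {
    // Restart the countdown on every call.
    if (timer !== undefined) {
      clearTimeout(timer);
    }
    timer = setTimeout(() => {
      timer = undefined;
      fn(...args);
    }, delayMs);
  };

  // Cancels a pending call, for example when a component unmounts.
  debounced.cancel = (): void => {
    if (timer !== undefined) {
      clearTimeout(timer);
      timer = undefined;
    }
  };

  return debounced;
}
```

Wrapping the search handler this way means that typing "jobs" quickly results in one search for the full word rather than four searches for each prefix.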
- Close Methods: Click outside, close button, or ESC key
- Body Scroll: Disabled when modal is open
- Amazon Search: Automatic for eligible entities
- Fork the repository
- Create a feature branch
- Make your changes
- Test thoroughly
- Submit a pull request
This project is for educational and research purposes. Podcast content belongs to the original creators.