audio generation and result working

j4wg · memit0 · Oct 5, 2025 · Oct 6, 2025
commit d500e7991f24eb031cf412ecbbac3b7d9c05cddc
diff --git a/AUDIO_FEATURE_README.md b/AUDIO_FEATURE_README.md
@@ -1,20 +1,21 @@
 # Audio Recording Feature for Behavioral Questions
 
 ## Overview
-The Interview Coder app now includes an audio recording feature that helps you practice behavioral interview questions. This feature records your voice, transcribes the question using OpenRouter's Whisper API, and generates professional answers using the STAR method.
+The Interview Coder app now includes an audio recording feature that helps you practice behavioral interview questions. It records your spoken question, transcribes it using OpenRouter's GPT-4o audio model (or OpenAI's Whisper when an OpenAI key is configured), and generates professional answers using the STAR method.
 
 ## Features
 - **Audio Recording**: Record questions using your computer's microphone
-- **Speech-to-Text**: Automatic transcription using OpenRouter Whisper
-- **Answer Generation**: AI-powered behavioral interview answers using the STAR method
+- **Speech-to-Text**: Automatic transcription using OpenRouter GPT-4o audio or OpenAI Whisper
+- **Answer Generation**: AI-powered behavioral interview answers using GPT-4o
 - **Playback**: Review your recorded audio before processing
 - **Professional Answers**: Detailed, structured responses suitable for interviews
 
 ## How to Use
 
 ### Prerequisites
-1. Ensure you have a valid OpenRouter API key configured in the app settings
-2. Grant microphone permissions when prompted by your browser/system
+1. **OpenRouter API Key** (Recommended): Used for both audio transcription (GPT-4o audio) and answer generation (GPT-4o)
+2. **OpenAI API Key** (Alternative): Used for Whisper transcription and GPT-4o answer generation directly through OpenAI
+3. Grant microphone permissions when prompted by your browser/system
 
 ### Step-by-Step Usage
 1. **Open the App**: Launch the Interview Coder application
@@ -26,43 +27,37 @@ The Interview Coder app now includes an audio recording feature that helps you p
 7. **Generate Answer**: Click "Generate Answer" to process the audio
 8. **View Results**: The transcribed question and generated answer will appear below
 
-### Example Questions
-The feature works best with behavioral interview questions such as:
-- "Tell me about a time when you had to work with a difficult team member"
-- "Describe a situation where you had to meet a tight deadline"
-- "Give me an example of when you had to solve a complex problem"
-- "Tell me about a time when you showed leadership"
-
 ## Technical Details
 
-### Audio Format
-- Records in WebM format with Opus codec
-- Optimized for speech recognition with echo cancellation and noise suppression
-- Sample rate: 16kHz for optimal Whisper API performance
+### Audio Processing
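+- **OpenRouter Users**: Uses the GPT-4o audio model (`openai/gpt-4o-audio-preview`) for transcription via multimodal chat completions
+- **OpenAI Users**: Uses the `whisper-1` model for transcription via the dedicated audio transcription API
+- **Supported Formats**: WebM (recorded), WAV, MP3
+- **WebM Conversion**: OpenRouter accepts only WAV and MP3 input, so WebM recordings are first converted to 16 kHz mono WAV with FFmpeg (bundled via `ffmpeg-static`)
+- **Base64 Encoding**: Audio is automatically converted to base64 for OpenRouter processing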
 
 ### Answer Generation
-- Uses OpenRouter models (configurable in settings, defaults to Claude 3.5 Sonnet)
+- Uses GPT-4o by default (`openai/gpt-4o` via OpenRouter, `gpt-4o` via OpenAI); the model is configurable in settings
 - Follows the STAR method (Situation, Task, Action, Result)
 - Generates 300-450 word responses (2-3 minutes when spoken)
 - Professional, conversational tone suitable for interviews
 
-### Privacy & Security
-- Audio files are temporarily stored during processing and immediately deleted
-- No audio data is permanently stored on your device
-- All processing uses your personal OpenRouter API key
+### API Key Detection
+The app automatically detects your API key type:
+- **OpenRouter keys** (`sk-or-...`): Uses multimodal audio API for transcription
+- **OpenAI keys** (`sk-...`): Uses traditional Whisper API for transcription
 
 ## Troubleshooting
 
 ### Common Issues
 1. **Microphone Not Working**: Check browser/system permissions for microphone access
 2. **No Transcription**: Ensure you're speaking clearly and the recording has audio
-3. **API Errors**: Verify your OpenRouter API key is valid and has sufficient credits
+3. **API Errors**: Verify your API key is valid and has sufficient credits
 4. **Poor Audio Quality**: Try recording in a quieter environment
 
 ### Error Messages
 - "Failed to start recording": Check microphone permissions
 - "No speech detected": The recording may be too quiet or empty
-- "OpenRouter API key required": Configure your API key in settings
+- "API key required": Configure your API key in settings
 - "Failed to process audio": Check your internet connection and API key
 
 ## Tips for Best Results
@@ -71,10 +66,3 @@ The feature works best with behavioral interview questions such as:
 3. **Complete Questions**: Ask full, complete behavioral interview questions
 4. **Review Answers**: Use the generated answers as a starting point and personalize them
 5. **Practice**: Use the feature regularly to improve your interview skills
-
-## Integration
-The audio recording feature is seamlessly integrated into the existing Interview Coder interface:
-- Located below the screenshot queue in the main interface
-- Uses the same toast notification system for feedback
-- Shares the OpenRouter API configuration with other features
-- Maintains the app's dark theme and consistent UI design
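The README's key-detection rule boils down to a prefix check. A minimal sketch, assuming keys are classified exactly as in `ipcHandlers.ts` below (`sk-or-` for OpenRouter, other `sk-` keys for OpenAI); `routeForKey` and `ProviderRoute` are hypothetical names, not part of the commit, and returning `null` for unrecognized keys is an assumption. The `sk-or-` test must run first because every OpenRouter key also starts with `sk-`.

```typescript
// Hypothetical helper mirroring the prefix checks used in ipcHandlers.ts.
type Provider = "openrouter" | "openai"

interface ProviderRoute {
  provider: Provider
  baseURL?: string // undefined means the OpenAI SDK's default endpoint
  transcriptionModel: string
  transcriptionVia: "chat-input-audio" | "audio-transcriptions"
}

function routeForKey(apiKey: string): ProviderRoute | null {
  // Check the more specific prefix first: "sk-or-" keys also match "sk-".
  if (apiKey.startsWith("sk-or-")) {
    return {
      provider: "openrouter",
      baseURL: "https://openrouter.ai/api/v1",
      transcriptionModel: "openai/gpt-4o-audio-preview",
      transcriptionVia: "chat-input-audio",
    }
  }
  if (apiKey.startsWith("sk-")) {
    return {
      provider: "openai",
      transcriptionModel: "whisper-1",
      transcriptionVia: "audio-transcriptions",
    }
  }
  // Assumption: anything else is rejected before any API call is made.
  return null
}
```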
diff --git a/electron/ipcHandlers.ts b/electron/ipcHandlers.ts
@@ -4,6 +4,67 @@ import { ipcMain, shell, dialog } from "electron"
 import { randomBytes } from "crypto"
 import { IIpcHandlerDeps } from "./main"
 import { configHelper } from "./ConfigHelper"
+import ffmpeg from 'fluent-ffmpeg'
+import ffmpegStatic from 'ffmpeg-static'
+import * as fs from 'fs'
+import * as path from 'path'
+import * as os from 'os'
+
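+// Set FFmpeg path to the bundled binary. In a packaged app the binary must live in
+// app.asar.unpacked (via asarUnpack), because executables cannot be spawned from inside an asar archive.
+if (ffmpegStatic) {
+  ffmpeg.setFfmpegPath(ffmpegStatic.replace('app.asar', 'app.asar.unpacked'))
+}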
+
+// WebM to WAV conversion function using FFmpeg
+async function convertWebMToWAV(webmBuffer: Buffer): Promise<Buffer> {
+  // A random suffix keeps concurrent conversions from sharing temp files
+  const id = randomBytes(8).toString('hex')
+  const inputPath = path.join(os.tmpdir(), `input_${id}.webm`)
+  const outputPath = path.join(os.tmpdir(), `output_${id}.wav`)
+
+  // Best-effort removal of both temporary files
+  const cleanup = () => {
+    try { fs.unlinkSync(inputPath) } catch {}
+    try { fs.unlinkSync(outputPath) } catch {}
+  }
+  // Catch variables are `unknown` under strict TypeScript
+  const messageOf = (e: unknown) => (e instanceof Error ? e.message : String(e))
+
+  return new Promise<Buffer>((resolve, reject) => {
+    try {
+      // Write WebM buffer to temporary file
+      fs.writeFileSync(inputPath, webmBuffer)
+
+      // Convert WebM to 16 kHz mono, 16-bit PCM WAV (compact and sufficient for speech)
+      ffmpeg(inputPath)
+        .toFormat('wav')
+        .audioCodec('pcm_s16le') // 16-bit signed little-endian PCM
+        .audioFrequency(16000)   // 16 kHz sample rate
+        .audioChannels(1)        // Mono audio
+        .on('end', () => {
+          try {
+            // Read the converted WAV file
+            resolve(fs.readFileSync(outputPath))
+          } catch (readError) {
+            reject(new Error(`Failed to read converted WAV file: ${messageOf(readError)}`))
+          } finally {
+            cleanup()
+          }
+        })
+        .on('error', (err: Error) => {
+          cleanup()
+          reject(new Error(`FFmpeg conversion failed: ${err.message}`))
+        })
+        .save(outputPath)
+    } catch (error) {
+      cleanup()
+      reject(new Error(`WebM to WAV conversion setup failed: ${messageOf(error)}`))
+    }
+  })
+}
+
 
 export function initializeIpcHandlers(deps: IIpcHandlerDeps): void {
   console.log("Initializing IPC handlers")
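The conversion targets 16 kHz mono WAV. For WAV, bit depth is determined by the PCM sample codec (FFmpeg's WAV muxer defaults to `pcm_s16le`), not by an audio bitrate option. A minimal sketch for sanity-checking a converted buffer by reading the `fmt ` fields of its header; `readWavFormat` is a hypothetical helper, not part of the commit, and it assumes the `fmt ` chunk starts at byte 12 as in the canonical RIFF/WAVE layout.

```typescript
// Hypothetical check: read the format fields of a canonical RIFF/WAVE header.
interface WavFormat {
  audioFormat: number // 1 = integer PCM
  channels: number
  sampleRate: number
  byteRate: number // sampleRate * channels * bitsPerSample / 8
  bitsPerSample: number
}

function readWavFormat(bytes: Uint8Array): WavFormat {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength)
  // Four-character chunk tag at the given offset
  const tag = (offset: number): string =>
    String.fromCharCode(bytes[offset], bytes[offset + 1], bytes[offset + 2], bytes[offset + 3])

  if (bytes.byteLength < 36 || tag(0) !== "RIFF" || tag(8) !== "WAVE" || tag(12) !== "fmt ") {
    throw new Error("Not a canonical WAV header")
  }
  // All multi-byte WAV header fields are little-endian
  return {
    audioFormat: view.getUint16(20, true),
    channels: view.getUint16(22, true),
    sampleRate: view.getUint32(24, true),
    byteRate: view.getUint32(28, true),
    bitsPerSample: view.getUint16(34, true),
  }
}
```

For 16 kHz mono 16-bit PCM the byte rate is 16,000 × 1 × 2 = 32,000 bytes/s (256 kbit/s). A 30-second question is therefore about 0.96 MB of samples, or roughly 1.28 MB once base64-encoded for the OpenRouter request.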
@@ -26,11 +87,11 @@ export function initializeIpcHandlers(deps: IIpcHandlerDeps): void {
     if (!configHelper.isValidApiKeyFormat(apiKey)) {
       return { 
         valid: false, 
-        error: "Invalid API key format. OpenAI API keys start with 'sk-'" 
+        error: "Invalid API key format. OpenRouter API keys start with 'sk-or-', OpenAI keys start with 'sk-'" 
       };
     }
 
-    // Then test the API key with OpenAI
+    // Then test the API key with the appropriate provider
     const result = await configHelper.testApiKey(apiKey);
     return result;
   })
@@ -354,49 +415,112 @@ export function initializeIpcHandlers(deps: IIpcHandlerDeps): void {
     try {
       // Check for API key before processing
       if (!configHelper.hasApiKey()) {
-        throw new Error("OpenRouter API key is required for audio transcription")
+        throw new Error("API key is required for audio transcription")
       }
 
       const config = configHelper.loadConfig()
       const apiKey = config.apiKey
 
       if (!apiKey) {
-        throw new Error("OpenRouter API key not found")
+        throw new Error("API key not found")
       }
 
       const fs = require('fs')
       const path = require('path')
       const os = require('os')
       const OpenAI = require('openai')
 
-      // Use OpenRouter API for Whisper
-      const openai = new OpenAI({
-        apiKey,
-        baseURL: "https://openrouter.ai/api/v1"
-      })
-
-      // Create a temporary file
-      const tempDir = os.tmpdir()
-      const tempFilePath = path.join(tempDir, `temp_audio_${Date.now()}_${filename}`)
+      // For OpenRouter, we need to convert WebM to WAV since OpenRouter only supports wav/mp3
+      // Determine the actual format we'll send (always WAV for WebM input)
+      const isWebM = filename.toLowerCase().includes('webm')
+      const isMp3 = filename.toLowerCase().endsWith('.mp3')
+      const audioFormat = isMp3 ? 'mp3' : 'wav'
 
-      // Write the buffer to a temporary file
-      fs.writeFileSync(tempFilePath, audioBuffer)
+      if (apiKey.startsWith('sk-or-')) {
+        // Use OpenRouter's multimodal audio API
+        const openai = new OpenAI({
+          apiKey,
+          baseURL: "https://openrouter.ai/api/v1",
+          defaultHeaders: {
+            "HTTP-Referer": "https://github.com/your-repo",
+            "X-Title": "OIC - Online Interview Companion"
+          }
+        })
 
-      try {
-        // Use OpenRouter's Whisper API for transcription
-        const transcription = await openai.audio.transcriptions.create({
-          file: fs.createReadStream(tempFilePath),
-          model: "openai/whisper-1",
-          language: "en"
+        let processedAudioBuffer = audioBuffer
+        let finalFormat = audioFormat
+
+        // If it's WebM, convert it to WAV using FFmpeg
+        if (isWebM) {
+          try {
+            console.log('Converting WebM to WAV using FFmpeg...')
+            processedAudioBuffer = await convertWebMToWAV(audioBuffer)
+            finalFormat = 'wav'
+            console.log('Successfully converted WebM to WAV')
+          } catch (conversionError) {
+            console.error('WebM to WAV conversion failed:', conversionError)
+            const reason = conversionError instanceof Error ? conversionError.message : String(conversionError)
+            throw new Error(`Failed to convert WebM audio to WAV format: ${reason}. Please check that the bundled FFmpeg binary (ffmpeg-static) is available.`)
+          }
+        }
+
+        // Convert processed audio buffer to base64
+        const base64Audio = processedAudioBuffer.toString('base64')
+
+        // Use chat completions with audio input for transcription
+        const completion = await openai.chat.completions.create({
+          model: "openai/gpt-4o-audio-preview",
+          messages: [
+            {
+              role: "user",
+              content: [
+                {
+                  type: "text",
+                  text: "Please transcribe this audio file. Return only the transcribed text without any additional commentary."
+                },
+                {
+                  type: "input_audio",
+                  input_audio: {
+                    data: base64Audio,
+                    format: finalFormat
+                  }
+                }
+              ]
+            }
+          ],
+          max_tokens: 500,
+          temperature: 0.1
         })
 
-        return { text: transcription.text }
-      } finally {
-        // Clean up the temporary file
+        const transcribedText = completion.choices[0]?.message?.content || ""
+        return { text: transcribedText }
+
+      } else {
+        // Use OpenAI directly for Whisper transcription
+        const openai = new OpenAI({ apiKey })
+
+        // Create a temporary file
+        const tempDir = os.tmpdir()
+        const tempFilePath = path.join(tempDir, `temp_audio_${Date.now()}_${filename}`)
+
+        // Write the buffer to a temporary file
+        fs.writeFileSync(tempFilePath, audioBuffer)
+
         try {
-          fs.unlinkSync(tempFilePath)
-        } catch (cleanupError) {
-          console.warn("Failed to clean up temporary audio file:", cleanupError)
+          // Use OpenAI's Whisper for transcription
+          const transcription = await openai.audio.transcriptions.create({
+            file: fs.createReadStream(tempFilePath),
+            model: "whisper-1",
+            language: "en"
+          })
+
+          return { text: transcription.text }
+        } finally {
+          // Clean up the temporary file
+          try {
+            fs.unlinkSync(tempFilePath)
+          } catch (cleanupError) {
+            console.warn("Failed to clean up temporary audio file:", cleanupError)
+          }
         }
       }
     } catch (error) {
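In the hunk above, any file that is neither WebM nor MP3 (for example `.ogg` or `.m4a`) is labeled `wav` and sent unconverted. A minimal sketch of a stricter plan that sends only genuine WAV/MP3 as-is and routes everything else through conversion; `planAudioUpload` is a hypothetical helper, and it assumes FFmpeg can decode whatever container the recorder produced.

```typescript
// Hypothetical helper: choose the input_audio format and decide whether to convert first.
// The handler's own comment states OpenRouter only accepts "wav" and "mp3".
type InputAudioFormat = "wav" | "mp3"

interface UploadPlan {
  format: InputAudioFormat
  needsConversion: boolean
}

function planAudioUpload(filename: string): UploadPlan {
  const name = filename.toLowerCase()
  if (name.endsWith(".mp3")) return { format: "mp3", needsConversion: false }
  if (name.endsWith(".wav")) return { format: "wav", needsConversion: false }
  // WebM/Opus from MediaRecorder, and any other container, goes through FFmpeg to WAV
  return { format: "wav", needsConversion: true }
}
```

Usage: `const plan = planAudioUpload(filename)`, then call `convertWebMToWAV` only when `plan.needsConversion` is true and pass `plan.format` as `input_audio.format`.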
@@ -409,23 +533,38 @@ export function initializeIpcHandlers(deps: IIpcHandlerDeps): void {
     try {
       // Check for API key before processing
       if (!configHelper.hasApiKey()) {
-        throw new Error("OpenRouter API key is required for answer generation")
+        throw new Error("API key is required for answer generation")
       }
 
       const config = configHelper.loadConfig()
       const apiKey = config.apiKey
 
       if (!apiKey) {
-        throw new Error("OpenRouter API key not found")
+        throw new Error("API key not found")
       }
 
       const OpenAI = require('openai')
 
-      // Use OpenRouter API for chat completions
-      const openai = new OpenAI({
-        apiKey,
-        baseURL: "https://openrouter.ai/api/v1"
-      })
+      // Pick the provider from the key prefix: 'sk-or-' keys go to OpenRouter, all others to OpenAI
+      let openai
+      let modelToUse
+
+      if (apiKey.startsWith('sk-or-')) {
+        // Use OpenRouter API for chat completions
+        openai = new OpenAI({
+          apiKey,
+          baseURL: "https://openrouter.ai/api/v1",
+          defaultHeaders: {
+            "HTTP-Referer": "https://github.com/your-repo",
+            "X-Title": "OIC - Online Interview Companion"
+          }
+        })
+        modelToUse = config.solutionModel || "openai/gpt-4o"
+      } else {
+        // Use OpenAI directly
+        openai = new OpenAI({ apiKey })
+        modelToUse = config.solutionModel || "gpt-4o"
+      }
 
       const prompt = `You are an expert interview coach helping someone prepare for behavioral interviews. 
 
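In this commit, `config.solutionModel` overrides the default for both providers. OpenRouter model ids are namespaced (`vendor/model`), while OpenAI's API takes bare ids such as `gpt-4o`, so a namespaced id stored in settings (for example `anthropic/claude-3.5-sonnet`) would likely be rejected once the user switches to an OpenAI key. A minimal sketch of a normalizing fallback; `resolveChatModel` is a hypothetical helper, and stripping only the `openai/` prefix is an assumption.

```typescript
// Hypothetical helper: pick the chat model for answer generation.
function resolveChatModel(apiKey: string, solutionModel?: string): string {
  const isOpenRouter = apiKey.startsWith("sk-or-")
  const fallback = isOpenRouter ? "openai/gpt-4o" : "gpt-4o"

  const configured = solutionModel?.trim()
  if (!configured) return fallback

  // OpenRouter accepts namespaced ids as-is
  if (isOpenRouter) return configured

  // OpenAI expects bare ids: keep "openai/..." models by dropping the prefix,
  // fall back for any other vendor's model (assumption: it is unavailable on OpenAI)
  if (configured.includes("/")) {
    return configured.startsWith("openai/") ? configured.slice("openai/".length) : fallback
  }
  return configured
}
```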
@@ -440,9 +579,6 @@ Please provide a comprehensive, professional answer using the STAR method (Situa
 
 Provide only the answer, without any prefacing text like "Here's a good answer:" or similar.`
 
-      // Use a good OpenRouter model for behavioral questions
-      const modelToUse = config.solutionModel || "anthropic/claude-3.5-sonnet"
-
       const completion = await openai.chat.completions.create({
         model: modelToUse,
         messages: [