audio generation and result working
memit0 committed Oct 5, 2025
commit d500e7991f24eb031cf412ecbbac3b7d9c05cddc
48 changes: 18 additions & 30 deletions AUDIO_FEATURE_README.md
@@ -1,20 +1,21 @@
# Audio Recording Feature for Behavioral Questions

## Overview
The Interview Coder app now includes an audio recording feature that helps you practice behavioral interview questions. This feature records your voice, transcribes the question using OpenRouter's Whisper API, and generates professional answers using the STAR method.
The Interview Coder app now includes an audio recording feature that helps you practice behavioral interview questions. This feature records your voice, transcribes the question using OpenRouter's GPT-4o audio capabilities, and generates professional answers using the STAR method.

## Features
- **Audio Recording**: Record questions using your computer's microphone
- **Speech-to-Text**: Automatic transcription using OpenRouter Whisper
- **Answer Generation**: AI-powered behavioral interview answers using the STAR method
- **Speech-to-Text**: Automatic transcription using OpenRouter GPT-4o audio or OpenAI Whisper
- **Answer Generation**: AI-powered behavioral interview answers using GPT-4o
- **Playback**: Review your recorded audio before processing
- **Professional Answers**: Detailed, structured responses suitable for interviews

## How to Use

### Prerequisites
1. Ensure you have a valid OpenRouter API key configured in the app settings
2. Grant microphone permissions when prompted by your browser/system
1. **OpenRouter API Key** (Recommended): For both audio transcription and answer generation using GPT-4o audio
2. **OpenAI API Key** (Alternative): If you prefer to use OpenAI's Whisper for transcription
3. Grant microphone permissions when prompted by your browser/system

### Step-by-Step Usage
1. **Open the App**: Launch the Interview Coder application
@@ -26,43 +27,37 @@ The Interview Coder app now includes an audio recording feature that helps you p
7. **Generate Answer**: Click "Generate Answer" to process the audio
8. **View Results**: The transcribed question and generated answer will appear below

### Example Questions
The feature works best with behavioral interview questions such as:
- "Tell me about a time when you had to work with a difficult team member"
- "Describe a situation where you had to meet a tight deadline"
- "Give me an example of when you had to solve a complex problem"
- "Tell me about a time when you showed leadership"

## Technical Details

### Audio Format
- Records in WebM format with Opus codec
- Optimized for speech recognition with echo cancellation and noise suppression
- Sample rate: 16kHz for optimal Whisper API performance
### Audio Processing
- **OpenRouter Users**: Uses GPT-4o audio model for transcription via multimodal chat completions
- **OpenAI Users**: Uses Whisper-1 model for transcription via dedicated audio API
- **Supported Formats**: WebM (recorded), WAV, MP3
- **Base64 Encoding**: Audio is automatically converted to base64 for OpenRouter processing
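
As a rough sketch of the OpenRouter path described above, the base64 audio can be wrapped in a single multimodal chat message. The helper name `buildTranscriptionMessages` and the type names are hypothetical, chosen for illustration only. The message shape mirrors the `input_audio` content part used in `electron/ipcHandlers.ts`, and the base64 string is assumed to come from something like `buffer.toString('base64')`.

```typescript
// Formats the OpenRouter path sends after conversion (WebM is converted to WAV first).
type AudioFormat = "wav" | "mp3";

interface TextPart {
  type: "text";
  text: string;
}

interface AudioPart {
  type: "input_audio";
  input_audio: { data: string; format: AudioFormat };
}

interface UserMessage {
  role: "user";
  content: Array<TextPart | AudioPart>;
}

// Hypothetical helper: builds the messages array for a transcription request.
// `base64Audio` is the already-encoded audio; no network call happens here.
function buildTranscriptionMessages(
  base64Audio: string,
  format: AudioFormat
): UserMessage[] {
  return [
    {
      role: "user",
      content: [
        {
          type: "text",
          text: "Please transcribe this audio file. Return only the transcribed text without any additional commentary.",
        },
        {
          type: "input_audio",
          input_audio: { data: base64Audio, format },
        },
      ],
    },
  ];
}
```

The returned array would be passed as the `messages` field of a chat completion request. Keeping the payload construction separate from the API call makes it easy to inspect what is sent.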

### Answer Generation
- Uses OpenRouter models (configurable in settings, defaults to Claude 3.5 Sonnet)
- Uses GPT-4o model (configurable in settings)
- Follows the STAR method (Situation, Task, Action, Result)
- Generates 300-450 word responses (2-3 minutes when spoken)
- Professional, conversational tone suitable for interviews
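
The 300-450 word range corresponds to 2-3 minutes only at an assumed speaking pace of about 150 words per minute (300 / 150 = 2, 450 / 150 = 3). That pace is an assumption for illustration, not something the app measures. A minimal sketch for checking a generated answer against the range, using hypothetical helper names:

```typescript
// Assumed average speaking pace; real pace varies by speaker.
const ASSUMED_WORDS_PER_MINUTE = 150;

// Counts whitespace-separated words; an empty or blank string counts as 0.
function countWords(text: string): number {
  return text.trim().split(/\s+/).filter(Boolean).length;
}

// Estimates spoken duration in minutes at the given pace.
function estimateSpeakingMinutes(
  text: string,
  wordsPerMinute: number = ASSUMED_WORDS_PER_MINUTE
): number {
  return countWords(text) / wordsPerMinute;
}

// True when the answer falls inside the 300-450 word target range.
function isWithinTargetLength(text: string): boolean {
  const words = countWords(text);
  return words >= 300 && words <= 450;
}
```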

### Privacy & Security
- Audio files are temporarily stored during processing and immediately deleted
- No audio data is permanently stored on your device
- All processing uses your personal OpenRouter API key
### API Key Detection
The app automatically detects your API key type:
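- **OpenRouter keys** (`sk-or-...`): Uses the multimodal audio API for transcription
- **OpenAI keys** (`sk-...`, without the `or-` segment): Uses the dedicated Whisper API for transcription. Because OpenRouter keys also begin with `sk-`, the `sk-or-` prefix is checked first.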
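
The detection rule can be sketched as a small prefix check. The function name `detectProvider` and the `Provider` type are hypothetical and not part of the app's code; the handlers in `electron/ipcHandlers.ts` only test `apiKey.startsWith('sk-or-')` and fall through to OpenAI otherwise. The order of the checks matters, since every `sk-or-` key also starts with `sk-`.

```typescript
type Provider = "openrouter" | "openai" | "unknown";

// Hypothetical helper: classifies an API key by its prefix.
function detectProvider(apiKey: string): Provider {
  const key = apiKey.trim();
  // Check the more specific prefix first: "sk-or-" keys also start with "sk-".
  if (key.startsWith("sk-or-")) return "openrouter";
  if (key.startsWith("sk-")) return "openai";
  return "unknown";
}
```

A caller could use the result to pick both the base URL and the default model name, for example `openai/gpt-4o` on OpenRouter versus `gpt-4o` on OpenAI.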

## Troubleshooting

### Common Issues
1. **Microphone Not Working**: Check browser/system permissions for microphone access
2. **No Transcription**: Ensure you're speaking clearly and the recording has audio
3. **API Errors**: Verify your OpenRouter API key is valid and has sufficient credits
3. **API Errors**: Verify your API key is valid and has sufficient credits
4. **Poor Audio Quality**: Try recording in a quieter environment

### Error Messages
- "Failed to start recording": Check microphone permissions
- "No speech detected": The recording may be too quiet or empty
- "OpenRouter API key required": Configure your API key in settings
- "API key required": Configure your API key in settings
- "Failed to process audio": Check your internet connection and API key

## Tips for Best Results
@@ -71,10 +66,3 @@ The feature works best with behavioral interview questions such as:
3. **Complete Questions**: Ask full, complete behavioral interview questions
4. **Review Answers**: Use the generated answers as a starting point and personalize them
5. **Practice**: Use the feature regularly to improve your interview skills

## Integration
The audio recording feature is seamlessly integrated into the existing Interview Coder interface:
- Located below the screenshot queue in the main interface
- Uses the same toast notification system for feedback
- Shares the OpenRouter API configuration with other features
- Maintains the app's dark theme and consistent UI design
210 changes: 173 additions & 37 deletions electron/ipcHandlers.ts
@@ -4,6 +4,67 @@ import { ipcMain, shell, dialog } from "electron"
import { randomBytes } from "crypto"
import { IIpcHandlerDeps } from "./main"
import { configHelper } from "./ConfigHelper"
import ffmpeg from 'fluent-ffmpeg'
import ffmpegStatic from 'ffmpeg-static'
import * as fs from 'fs'
import * as path from 'path'
import * as os from 'os'

// Set FFmpeg path to the bundled binary
if (ffmpegStatic) {
ffmpeg.setFfmpegPath(ffmpegStatic)
}

// WebM to WAV conversion function using FFmpeg
async function convertWebMToWAV(webmBuffer: Buffer): Promise<Buffer> {
return new Promise((resolve, reject) => {
const tempDir = os.tmpdir()
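// Add a random suffix so concurrent conversions in the same millisecond do not collide
const uniqueId = `${Date.now()}_${randomBytes(6).toString('hex')}`
const inputPath = path.join(tempDir, `input_${uniqueId}.webm`)
const outputPath = path.join(tempDir, `output_${uniqueId}.wav`)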

try {
// Write WebM buffer to temporary file
fs.writeFileSync(inputPath, webmBuffer)

// Convert WebM to WAV using FFmpeg
ffmpeg(inputPath)
.toFormat('wav')
.audioFrequency(16000) // 16kHz sample rate for OpenRouter
.audioChannels(1) // Mono audio
.audioCodec('pcm_s16le') // 16-bit PCM samples (audioBitrate sets kbps, not bit depth)
.on('end', () => {
try {
// Read the converted WAV file
const wavBuffer = fs.readFileSync(outputPath)

// Cleanup temporary files
try { fs.unlinkSync(inputPath) } catch {}
try { fs.unlinkSync(outputPath) } catch {}

resolve(wavBuffer)
} catch (readError) {
reject(new Error(`Failed to read converted WAV file: ${readError.message}`))
}
})
.on('error', (err: any) => {
// Cleanup temporary files on error
try { fs.unlinkSync(inputPath) } catch {}
try { fs.unlinkSync(outputPath) } catch {}

reject(new Error(`FFmpeg conversion failed: ${err.message}`))
})
.save(outputPath)

} catch (error) {
// Cleanup on any error
try { fs.unlinkSync(inputPath) } catch {}
try { fs.unlinkSync(outputPath) } catch {}

reject(new Error(`WebM to WAV conversion setup failed: ${error.message}`))
}
})
}


export function initializeIpcHandlers(deps: IIpcHandlerDeps): void {
console.log("Initializing IPC handlers")
@@ -26,11 +87,11 @@ export function initializeIpcHandlers(deps: IIpcHandlerDeps): void {
if (!configHelper.isValidApiKeyFormat(apiKey)) {
return {
valid: false,
error: "Invalid API key format. OpenAI API keys start with 'sk-'"
error: "Invalid API key format. OpenRouter API keys start with 'sk-or-', OpenAI keys start with 'sk-'"
};
}

// Then test the API key with OpenAI
// Then test the API key with the appropriate provider
const result = await configHelper.testApiKey(apiKey);
return result;
})
@@ -354,49 +415,112 @@ export function initializeIpcHandlers(deps: IIpcHandlerDeps): void {
try {
// Check for API key before processing
if (!configHelper.hasApiKey()) {
throw new Error("OpenRouter API key is required for audio transcription")
throw new Error("API key is required for audio transcription")
}

const config = configHelper.loadConfig()
const apiKey = config.apiKey

if (!apiKey) {
throw new Error("OpenRouter API key not found")
throw new Error("API key not found")
}

const fs = require('fs')
const path = require('path')
const os = require('os')
const OpenAI = require('openai')

// Use OpenRouter API for Whisper
const openai = new OpenAI({
apiKey,
baseURL: "https://openrouter.ai/api/v1"
})

// Create a temporary file
const tempDir = os.tmpdir()
const tempFilePath = path.join(tempDir, `temp_audio_${Date.now()}_${filename}`)
// For OpenRouter, we need to convert WebM to WAV since OpenRouter only supports wav/mp3
// Determine the actual format we'll send (always WAV for WebM input)
const isWebM = filename.toLowerCase().includes('webm')
const isMp3 = filename.toLowerCase().endsWith('.mp3')
const audioFormat = isMp3 ? 'mp3' : 'wav'

// Write the buffer to a temporary file
fs.writeFileSync(tempFilePath, audioBuffer)
if (apiKey.startsWith('sk-or-')) {
// Use OpenRouter's multimodal audio API
const openai = new OpenAI({
apiKey,
baseURL: "https://openrouter.ai/api/v1",
defaultHeaders: {
"HTTP-Referer": "https://github.com/your-repo",
"X-Title": "OIC - Online Interview Companion"
}
})

try {
// Use OpenRouter's Whisper API for transcription
const transcription = await openai.audio.transcriptions.create({
file: fs.createReadStream(tempFilePath),
model: "openai/whisper-1",
language: "en"
let processedAudioBuffer = audioBuffer
let finalFormat = audioFormat

// If it's WebM, convert it to WAV using FFmpeg
if (isWebM) {
try {
console.log('Converting WebM to WAV using FFmpeg...')
processedAudioBuffer = await convertWebMToWAV(audioBuffer)
finalFormat = 'wav'
console.log('Successfully converted WebM to WAV')
} catch (conversionError) {
console.error('WebM to WAV conversion failed:', conversionError)
throw new Error(`Failed to convert WebM audio to WAV format: ${conversionError.message}. Please ensure FFmpeg is properly installed.`)
}
}

// Convert processed audio buffer to base64
const base64Audio = processedAudioBuffer.toString('base64')

// Use chat completions with audio input for transcription
const completion = await openai.chat.completions.create({
model: "openai/gpt-4o-audio-preview",
messages: [
{
role: "user",
content: [
{
type: "text",
text: "Please transcribe this audio file. Return only the transcribed text without any additional commentary."
},
{
type: "input_audio",
input_audio: {
data: base64Audio,
format: finalFormat
}
}
]
}
],
max_tokens: 500,
temperature: 0.1
})

return { text: transcription.text }
} finally {
// Clean up the temporary file
const transcribedText = completion.choices[0]?.message?.content || ""
return { text: transcribedText }

} else {
// Use OpenAI directly for Whisper transcription
const openai = new OpenAI({ apiKey })

// Create a temporary file
const tempDir = os.tmpdir()
const tempFilePath = path.join(tempDir, `temp_audio_${Date.now()}_${filename}`)

// Write the buffer to a temporary file
fs.writeFileSync(tempFilePath, audioBuffer)

try {
fs.unlinkSync(tempFilePath)
} catch (cleanupError) {
console.warn("Failed to clean up temporary audio file:", cleanupError)
// Use OpenAI's Whisper for transcription
const transcription = await openai.audio.transcriptions.create({
file: fs.createReadStream(tempFilePath),
model: "whisper-1",
language: "en"
})

return { text: transcription.text }
} finally {
// Clean up the temporary file
try {
fs.unlinkSync(tempFilePath)
} catch (cleanupError) {
console.warn("Failed to clean up temporary audio file:", cleanupError)
}
}
}
} catch (error) {
@@ -409,23 +533,38 @@ export function initializeIpcHandlers(deps: IIpcHandlerDeps): void {
try {
// Check for API key before processing
if (!configHelper.hasApiKey()) {
throw new Error("OpenRouter API key is required for answer generation")
throw new Error("API key is required for answer generation")
}

const config = configHelper.loadConfig()
const apiKey = config.apiKey

if (!apiKey) {
throw new Error("OpenRouter API key not found")
throw new Error("API key not found")
}

const OpenAI = require('openai')

// Use OpenRouter API for chat completions
const openai = new OpenAI({
apiKey,
baseURL: "https://openrouter.ai/api/v1"
})
// Use OpenRouter for answer generation if available, otherwise use OpenAI
let openai
let modelToUse

if (apiKey.startsWith('sk-or-')) {
// Use OpenRouter API for chat completions
openai = new OpenAI({
apiKey,
baseURL: "https://openrouter.ai/api/v1",
defaultHeaders: {
"HTTP-Referer": "https://github.com/your-repo",
"X-Title": "OIC - Online Interview Companion"
}
})
modelToUse = config.solutionModel || "openai/gpt-4o"
} else {
// Use OpenAI directly
openai = new OpenAI({ apiKey })
modelToUse = config.solutionModel || "gpt-4o"
}

const prompt = `You are an expert interview coach helping someone prepare for behavioral interviews.

@@ -440,9 +579,6 @@ Please provide a comprehensive, professional answer using the STAR method (Situa

Provide only the answer, without any prefacing text like "Here's a good answer:" or similar.`

// Use a good OpenRouter model for behavioral questions
const modelToUse = config.solutionModel || "anthropic/claude-3.5-sonnet"

const completion = await openai.chat.completions.create({
model: modelToUse,
messages: [