A unified Swift library for OpenAI's Realtime API (WebSocket-based voice) and Chat API with a beautiful conversational interface.
Echo v1.0.0 brings unified voice and text conversations to Swift! This is the first production-ready release of Echo, providing seamless integration with OpenAI's Realtime and Chat APIs.
- 🎙️ Voice Conversations - Real-time voice chat using OpenAI's Realtime API
- 💬 Text Chat - Traditional text-based conversations with streaming support
- 🧮 Embeddings API - Generate text embeddings for semantic search and similarity
- 📋 Structured Output - Type-safe JSON generation with Codable schemas
- 🔄 Seamless Mode Switching - Switch between voice and text mid-conversation
- 🎯 Conversational API - Beautiful, discoverable API design
- 🛠️ Tool Calling - Function calling with MCP server support
- 📊 Event-Driven - Comprehensive event system for all interactions
Add Echo to your Package.swift:
dependencies: [
.package(url: "https://github.com/davidgeere/swift-echo.git", from: "1.0.0")
]

import Echo
let echo = Echo(
key: "your-openai-api-key",
configuration: .default
)

// Start a conversation
let conversation = try await echo.startConversation(
mode: .text,
systemMessage: "You are a helpful assistant."
)
// Send messages
try await conversation.send("Hello! How are you?")
// Stream responses
for await message in conversation.messages {
print("\(message.role): \(message.text)")
}

// Start voice mode with automatic turn detection (VAD)
let conversation = try await echo.startConversation(mode: .audio)
// The conversation handles audio I/O automatically
// User speaks → AI responds → User speaks...
// VAD automatically detects when you stop speaking
// Switch to text anytime
try await conversation.switchMode(to: .text)

Generate embeddings for semantic search, similarity matching, and more!
// Generate a single embedding
let embedding = try await echo.generate.embedding(
from: "Swift is a powerful programming language"
)
// Returns [Float] with 1536 dimensions (default)

// Process multiple texts at once
let embeddings = try await echo.generate.embeddings(
from: ["Document 1", "Document 2", "Document 3"],
model: .textEmbedding3Large // 3072 dimensions
)

// Find semantically similar texts from a corpus
let corpus = [
"The quick brown fox jumps over the lazy dog",
"A fast auburn canine leaps above a sleepy hound",
"Python is a programming language",
"Swift is a modern programming language"
]
let results = try await echo.find.similar(
to: "Tell me about Swift programming",
in: corpus,
topK: 2
)
// Results sorted by similarity
for result in results {
print("\(result.text) - Similarity: \(result.similarity)")
}
// Output:
// "Swift is a modern programming language" - Similarity: 0.825
// "Python is a programming language" - Similarity: 0.743

// Use custom dimensions for specific models
let embedding = try await echo.generate.embedding(
from: "Optimize for size",
model: .textEmbedding3Small,
dimensions: 512 // Reduce from 1536 to 512
)

- textEmbedding3Small - 1536 dimensions (default, best balance)
- textEmbedding3Large - 3072 dimensions (highest accuracy)
- textEmbeddingAda002 - 1536 dimensions (legacy)
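Similarity between embedding vectors is commonly measured with cosine similarity, and the DocumentSearch example further down calls a cosineSimilarity function that this README does not define. The following is a minimal sketch of such a helper, written for illustration and not taken from Echo's source; the zero-return behavior for mismatched or empty input is an assumption chosen here for simplicity.

```swift
/// Cosine similarity between two equal-length vectors.
/// Illustrative helper, not part of Echo's documented API.
/// Returns 0 when the lengths differ, the input is empty,
/// or either vector has zero magnitude.
func cosineSimilarity(_ a: [Float], _ b: [Float]) -> Float {
    guard a.count == b.count, !a.isEmpty else { return 0 }

    var dot: Float = 0
    var normA: Float = 0
    var normB: Float = 0

    // Accumulate the dot product and both squared magnitudes in one pass.
    for (x, y) in zip(a, b) {
        dot += x * y
        normA += x * x
        normB += y * y
    }

    let denominator = normA.squareRoot() * normB.squareRoot()
    return denominator > 0 ? dot / denominator : 0
}
```

Vectors pointing in the same direction score close to 1, orthogonal vectors score close to 0, and opposite vectors score close to -1, so sorting in descending order puts the closest matches first.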
Generate type-safe JSON responses that conform to your schemas!
// Request a JSON-formatted response
let jsonResponse = try await conversation.send("Generate a user profile for Alice, age 30")
// Returns valid JSON string

// Define your schema with Codable
struct UserProfile: Codable, Sendable {
let name: String
let age: Int
let email: String
let interests: [String]
}
// Generate structured data - type-safe and validated!
let profile = try await echo.generate.structured(
UserProfile.self,
from: "Create a profile for Bob Smith, 28, bob@example.com, likes Swift and AI"
)
print(profile.name) // "Bob Smith"
print(profile.age) // 28
print(profile.interests) // ["Swift", "AI"]

struct TodoList: Codable, Sendable {
struct TodoItem: Codable, Sendable {
let id: String
let title: String
let completed: Bool
let priority: Priority
enum Priority: String, Codable {
case low, medium, high
}
}
let title: String
let items: [TodoItem]
let createdAt: Date
}
// Generate complex nested structures
let todoList = try await echo.generate.structured(
TodoList.self,
from: "Create a todo list for launching a new app with 3 tasks"
)

Switch seamlessly between voice and text while preserving context:
// Start in text mode
let conversation = try await echo.startConversation(mode: .text)
try await conversation.send("Let's discuss Swift")
// Switch to voice - context preserved!
try await conversation.switchMode(to: .audio)
// Continue conversation with voice...
// Switch back to text anytime
try await conversation.switchMode(to: .text)
// Previous context still available

Register functions that the AI can call:
// Define a tool
let weatherTool = Tool(
name: "get_weather",
description: "Get current weather for a location",
parameters: [
"location": ["type": "string", "description": "City name"]
]
) { args in
let location = args["location"] as? String ?? "Unknown"
return "It's 72°F and sunny in \(location)"
}
// Register the tool
echo.registerTool(weatherTool)
// AI will automatically call tools when needed
try await conversation.send("What's the weather in San Francisco?")
// AI calls get_weather("San Francisco") and responds with the result

Monitor all events with the intuitive when syntax:
// Listen for specific events
echo.when(.messageFinalized) { event in
if case .messageFinalized(let message) = event {
print("New message: \(message.text)")
}
}
echo.when(.userStartedSpeaking) { _ in
print("🎙️ User is speaking...")
}
echo.when(.assistantStartedSpeaking) { _ in
print("🤖 Assistant is responding...")
}
echo.when(.userTranscriptionCompleted) { event in
if case .userTranscriptionCompleted(let transcript) = event {
print("User said: \(transcript)")
}
}

Customize behavior with configuration:
let configuration = EchoConfiguration(
realtimeModel: .gptRealtimeMini, // For voice
responsesModel: .gpt5, // For text
temperature: 0.7,
maxTokens: 2000,
voice: .alloy, // Voice selection
audioFormat: .pcm16, // Audio format
turnDetection: .automatic( // Voice activity detection
VADConfiguration(
threshold: 0.5,
silenceDuration: .milliseconds(500)
)
)
)
let echo = Echo(key: apiKey, configuration: configuration)

Configure how voice conversations detect when users stop speaking:
// Automatic (VAD) - Recommended
// AI automatically responds when it detects silence
let vadConfig = VADConfiguration(
threshold: 0.5,
silenceDuration: .milliseconds(500)
)
configuration.turnDetection = .automatic(vadConfig)
// Manual - You control when turns end
// Call conversation.endUserTurn() to trigger response
configuration.turnDetection = .manual
// Disabled - No turn management
configuration.turnDetection = .disabled

// Build a simple semantic search
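class DocumentSearch {
let echo: Echo
var embeddings: [(text: String, vector: [Float])] = []
// A class with a stored property that has no default value needs an initializer
init(echo: Echo) {
self.echo = echo
}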
// Index documents
func indexDocuments(_ documents: [String]) async throws {
let vectors = try await echo.generate.embeddings(from: documents)
embeddings = zip(documents, vectors).map { ($0, $1) }
}
// Search
func search(_ query: String, topK: Int = 5) async throws -> [String] {
let queryEmbedding = try await echo.generate.embedding(from: query)
// Calculate similarities (assumes a cosineSimilarity(_:_:) helper is in scope; it is not defined in this README)
let results = embeddings.map { doc in
let similarity = cosineSimilarity(queryEmbedding, doc.vector)
return (doc.text, similarity)
}
.sorted { $0.1 > $1.1 }
.prefix(topK)
return results.map { $0.0 }
}
}

struct BlogPost: Codable, Sendable {
let title: String
let introduction: String
let mainPoints: [String]
let conclusion: String
let tags: [String]
}
let post = try await echo.generate.structured(
BlogPost.self,
from: "Write a blog post about the future of AI in mobile development",
instructions: "Make it technical but accessible, around 500 words"
)
print("Title: \(post.title)")
print("Tags: \(post.tags.joined(separator: ", "))")

- iOS 18.0+ / macOS 14.0+
- Swift 6.0+
- Xcode 16.0+
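
The requirements above, together with the dependency line from the installation step, map onto a consuming package's manifest. The following is a minimal sketch of a complete Package.swift. It assumes the library product is named Echo (matching the import) and uses a hypothetical package and target called MyApp; check the repository's own manifest for the actual product name.

```swift
// swift-tools-version: 6.0
import PackageDescription

let package = Package(
    name: "MyApp", // hypothetical package name
    platforms: [
        // Minimum deployment targets listed in the requirements
        .iOS(.v18),
        .macOS(.v14)
    ],
    dependencies: [
        .package(url: "https://github.com/davidgeere/swift-echo.git", from: "1.0.0")
    ],
    targets: [
        .target(
            name: "MyApp", // hypothetical target name
            dependencies: [
                // Assumption: the library product is called "Echo"
                .product(name: "Echo", package: "swift-echo")
            ]
        )
    ]
)
```

The package identity swift-echo is derived from the last path component of the repository URL.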
For detailed documentation, see the Architecture Specification.
Contributions are welcome! Please feel free to submit a Pull Request.
Echo is available under the MIT license. See the LICENSE file for more info.
Questions? Open an issue or reach out!
Enjoying Echo? Give it a ⭐️