A modular Swift SDK for audio processing with MLX on Apple Silicon
MLXAudio follows a modular design, allowing you to import only what you need:
- MLXAudioCore: Base types, protocols, and utilities
- MLXAudioCodecs: Audio codec implementations (SNAC, Vocos, Mimi)
- MLXAudioTTS: Text-to-Speech models (Soprano, Qwen3, LlamaTTS)
- MLXAudioSTT: Speech-to-Text models (GLMASR, Whisper)
- MLXAudioSTS: Speech-to-Speech (future)
- MLXAudioUI: SwiftUI components for audio interfaces
Add MLXAudio to your project using Swift Package Manager:

```swift
dependencies: [
    .package(url: "https://github.com/Blaizzy/mlx-audio-swift.git", branch: "main")
]
```
Then add only the products you need to your target:

```swift
.product(name: "MLXAudioTTS", package: "mlx-audio-swift"),
.product(name: "MLXAudioCore", package: "mlx-audio-swift")
```

Text-to-speech: load a model and generate audio:

```swift
import Foundation
import MLXAudioCore
import MLXAudioTTS

// Load a TTS model from HuggingFace
let model = try await SopranoModel.fromPretrained("mlx-community/Soprano-80M-bf16")

// Generate audio
let audio = try await model.generate(
    text: "Hello from MLX Audio Swift!",
    parameters: GenerateParameters(
        maxTokens: 200,
        temperature: 0.7,
        topP: 0.95
    )
)

// Save to file (any writable file URL works)
let outputURL = URL(fileURLWithPath: "output.wav")
try saveAudioArray(audio, sampleRate: Double(model.sampleRate), to: outputURL)
```

Speech-to-text: load an audio file and transcribe it:

```swift
import MLXAudioSTT
import MLXAudioCore
import Foundation

// Point this at an audio file on disk
let audioURL = URL(fileURLWithPath: "speech.wav")

// Load audio file
let (sampleRate, audioData) = try loadAudioArray(from: audioURL)

// Load STT model
let model = try await GLMASRModel.fromPretrained("mlx-community/GLM-ASR-Nano-2512-4bit")

// Transcribe
let output = model.generate(audio: audioData)
print(output.text)
```

Stream tokens and audio as they are produced, for real-time TTS:

```swift
// `model`, `text`, and `parameters` are set up as in the TTS example above.
for try await event in model.generateStream(text: text, parameters: parameters) {
    switch event {
    case .token(let token):
        print("Generated token: \(token)")
    case .audio(let audio):
        print("Final audio shape: \(audio.shape)")
    case .info(let info):
        print(info.summary)
    }
}
```

Supported models:

| Model | Type | HuggingFace Repo |
|---|---|---|
| Soprano | TTS | mlx-community/Soprano-80M-bf16 |
| Qwen3 | TTS | mlx-community/VyvoTTS-EN-Beta-4bit |
| LlamaTTS (Orpheus) | TTS | mlx-community/orpheus-3b-0.1-ft-bf16 |
| GLMASR | STT | mlx-community/GLM-ASR-Nano-2512-4bit |
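Each entry in the table loads through its model type's `fromPretrained`. As a sketch, assuming the Orpheus-based model is exposed as `LlamaTTSModel` (a hypothetical name; only `SopranoModel` and `GLMASRModel` appear in the examples here), loading it for the multi-voice example further below would look like:

```swift
import MLXAudioTTS

// LlamaTTSModel is an assumed type name for the LlamaTTS (Orpheus) entry;
// check the package sources for the exact class.
let model = try await LlamaTTSModel.fromPretrained("mlx-community/orpheus-3b-0.1-ft-bf16")
```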
Features:

- Modular architecture for minimal app size: import only what you need
- Automatic model downloading from the Hugging Face Hub
- Native async/await support for seamless Swift integration
- Streaming audio generation for real-time TTS
- Type-safe Swift API with comprehensive error handling (see the sketch after this list)
- Optimized for Apple Silicon with the MLX framework
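Since loading and generation are throwing async functions, failures such as a failed download or a generation error surface as Swift errors. A minimal sketch of handling them (the SDK's concrete error types aren't listed here, so this catches broadly):

```swift
import MLXAudioTTS

do {
    let model = try await SopranoModel.fromPretrained("mlx-community/Soprano-80M-bf16")
    let audio = try await model.generate(
        text: "Hello!",
        parameters: GenerateParameters(maxTokens: 200, temperature: 0.7, topP: 0.95)
    )
    print("Generated audio with shape \(audio.shape)")
} catch {
    // Download, weight-loading, and generation errors all propagate here.
    print("Audio pipeline failed: \(error)")
}
```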
Tune generation with `GenerateParameters`:

```swift
let parameters = GenerateParameters(
    maxTokens: 1200,
    temperature: 0.7,
    topP: 0.95,
    repetitionPenalty: 1.5,
    repetitionContextSize: 30
)
let audio = try await model.generate(text: "Your text here", parameters: parameters)
```

Audio codecs: encode audio into discrete tokens and decode it back:

```swift
import MLXAudioCodecs
// Load SNAC codec
let snac = try await SNAC.fromPretrained("mlx-community/snac_24khz")

// Encode audio (e.g. loaded with loadAudioArray) to tokens
let tokens = try snac.encode(audio)

// Decode tokens back to audio
let reconstructed = try snac.decode(tokens)
```

For models supporting multiple voices (such as LlamaTTS/Orpheus), pass a voice name:

```swift
let audio = try await model.generate(
    text: "Hello!",
    voice: "tara", // Options: tara, leah, jess, leo, dan, mia, zac, zoe
    parameters: parameters
)
```

Requirements:

- macOS 14+ or iOS 17+
- Apple Silicon (M1 or later) recommended for optimal performance
- Xcode 15+
- Swift 5.9+
Check out the Examples/VoicesApp directory for a complete SwiftUI application demonstrating:
- Loading and running TTS models
- Playing generated audio
- UI components for model interaction
Additional usage examples can be found in the test files.
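For the playback step specifically, one minimal approach, assuming the generated audio can be written with `saveAudioArray` as in the TTS example above, is to round-trip through a temporary file and AVFoundation:

```swift
import AVFoundation
import MLXAudioCore

// `audio` and `model` come from the TTS example above.
let url = FileManager.default.temporaryDirectory.appendingPathComponent("tts.wav")
try saveAudioArray(audio, sampleRate: Double(model.sampleRate), to: url)

// Play the saved file with AVAudioPlayer.
let player = try AVAudioPlayer(contentsOf: url)
player.prepareToPlay()
player.play()
```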
Acknowledgements:

- Built on MLX Swift
- Uses swift-transformers
- Inspired by MLX Audio (Python)
MIT License - see LICENSE file for details.