MLX Audio Swift

A modular Swift SDK for audio processing with MLX on Apple Silicon

Architecture

MLXAudio follows a modular design, so you can import only the modules you need:

  • MLXAudioCore: Base types, protocols, and utilities
  • MLXAudioCodecs: Audio codec implementations (SNAC, Vocos, Mimi)
  • MLXAudioTTS: Text-to-Speech models (Soprano, Qwen3, LlamaTTS)
  • MLXAudioSTT: Speech-to-Text models (GLMASR, Whisper)
  • MLXAudioSTS: Speech-to-Speech (future)
  • MLXAudioUI: SwiftUI components for audio interfaces

Installation

Add MLXAudio to your project using Swift Package Manager:

dependencies: [
    .package(url: "https://github.com/Blaizzy/mlx-audio-swift.git", branch: "main")
]

// In your target's dependencies, list only the products you need
.product(name: "MLXAudioTTS", package: "mlx-audio-swift"),
.product(name: "MLXAudioCore", package: "mlx-audio-swift")
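
The two fragments above live in different parts of a manifest. As a minimal sketch of how they fit together, here is a complete Package.swift for a hypothetical executable target named "MyApp" (the package name, target name, and target kind are assumptions for illustration; the platform versions follow the Requirements section):

```swift
// swift-tools-version:5.9
// Hypothetical manifest for an app called "MyApp"; adjust names to your project.
import PackageDescription

let package = Package(
    name: "MyApp",
    // Minimum platforms taken from the Requirements section
    platforms: [.macOS(.v14), .iOS(.v17)],
    dependencies: [
        .package(url: "https://github.com/Blaizzy/mlx-audio-swift.git", branch: "main")
    ],
    targets: [
        .executableTarget(
            name: "MyApp",
            dependencies: [
                // List only the products this target actually imports
                .product(name: "MLXAudioTTS", package: "mlx-audio-swift"),
                .product(name: "MLXAudioCore", package: "mlx-audio-swift")
            ]
        )
    ]
)
```

Tracking `branch: "main"` always resolves to the latest commit, so builds may change from one resolve to the next.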

Quick Start

Text-to-Speech

import MLXAudioTTS
import MLXAudioCore

// Load a TTS model from HuggingFace
let model = try await SopranoModel.fromPretrained("mlx-community/Soprano-80M-bf16")

// Generate audio
let audio = try await model.generate(
    text: "Hello from MLX Audio Swift!",
    parameters: GenerateParameters(
        maxTokens: 200,
        temperature: 0.7,
        topP: 0.95
    )
)

// Save to file (outputURL is a file URL of your choosing)
try saveAudioArray(audio, sampleRate: Double(model.sampleRate), to: outputURL)
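
The snippet above leaves `outputURL` undefined. One possible way to build it, sketched with Foundation only: `makeOutputURL` is a hypothetical helper, not part of MLXAudio, and the `.wav` extension is an assumption about the output format.

```swift
import Foundation

/// Builds a file URL inside the system temporary directory.
/// `makeOutputURL` is a hypothetical helper, not part of MLXAudio.
func makeOutputURL(fileName: String) -> URL {
    FileManager.default.temporaryDirectory.appendingPathComponent(fileName)
}

// The ".wav" extension is an assumption about what gets written.
let outputURL = makeOutputURL(fileName: "hello.wav")
print(outputURL.path)
```

In an app you would more likely write into the documents or caches directory; the temporary directory keeps the sketch free of side effects.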

Speech-to-Text

import MLXAudioSTT
import MLXAudioCore

// Load audio file (audioURL is the URL of a local audio file)
let (sampleRate, audioData) = try loadAudioArray(from: audioURL)

// Load STT model
let model = try await GLMASRModel.fromPretrained("mlx-community/GLM-ASR-Nano-2512-4bit")

// Transcribe
let output = model.generate(audio: audioData)
print(output.text)

Streaming Generation

// Assumes a loaded TTS `model`, plus `text` and `parameters` values
for try await event in model.generateStream(text: text, parameters: parameters) {
    switch event {
    case .token(let token):
        print("Generated token: \(token)")
    case .audio(let audio):
        print("Final audio shape: \(audio.shape)")
    case .info(let info):
        print(info.summary)
    }
}
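
If you want to track progress rather than print each event, the handling logic can be pulled into a small value type. The sketch below uses a hypothetical `DemoEvent` enum that only mirrors the three cases shown above; the payload types (`Int`, `[Float]`, `String`) are assumptions and may not match the library's real event type.

```swift
/// Hypothetical stand-in for the library's stream event type.
/// The payload types are assumptions made for this sketch.
enum DemoEvent {
    case token(Int)
    case audio([Float])
    case info(String)
}

/// Accumulates what a streaming loop has seen so far.
struct StreamProgress {
    var tokenCount = 0
    var finalSamples: [Float]? = nil
    var summaries: [String] = []

    mutating func record(_ event: DemoEvent) {
        switch event {
        case .token:
            tokenCount += 1            // count tokens as they arrive
        case .audio(let samples):
            finalSamples = samples     // keep the finished audio
        case .info(let summary):
            summaries.append(summary)  // collect generation stats
        }
    }
}

// Simulated event sequence standing in for a real stream.
var progress = StreamProgress()
let events: [DemoEvent] = [.token(12), .token(7), .info("2 tokens"), .audio([0.0, 0.1])]
for event in events {
    progress.record(event)
}
print(progress.tokenCount)
```

Inside a real `for try await` loop, the body would reduce to a single `progress.record(event)` call after mapping the library's event to your own type.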

Supported Models

Model               Type  HuggingFace Repo
Soprano             TTS   mlx-community/Soprano-80M-bf16
Qwen3               TTS   mlx-community/VyvoTTS-EN-Beta-4bit
LlamaTTS (Orpheus)  TTS   mlx-community/orpheus-3b-0.1-ft-bf16
GLMASR              STT   mlx-community/GLM-ASR-Nano-2512-4bit

Features

  • Modular architecture that keeps app size small: import only what you need
  • Automatic model downloading from HuggingFace Hub
  • Native async/await support for seamless Swift integration
  • Streaming audio generation for real-time TTS
  • Type-safe Swift API with comprehensive error handling
  • Optimized for Apple Silicon with the MLX framework

Advanced Usage

Custom Generation Parameters

let parameters = GenerateParameters(
    maxTokens: 1200,
    temperature: 0.7,
    topP: 0.95,
    repetitionPenalty: 1.5,
    repetitionContextSize: 30
)

let audio = try await model.generate(text: "Your text here", parameters: parameters)
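
For intuition about `temperature` and `topP`: these names conventionally refer to temperature-scaled softmax and nucleus (top-p) filtering. The sketch below illustrates that general technique on a toy distribution. It is not MLXAudio's implementation, and `softmax` and `nucleus` are hypothetical helper names.

```swift
import Foundation

/// Converts logits to probabilities after dividing by `temperature`.
/// Lower temperatures sharpen the distribution; higher ones flatten it.
func softmax(_ logits: [Double], temperature: Double) -> [Double] {
    let scaled = logits.map { $0 / temperature }
    let maxValue = scaled.max() ?? 0          // subtract the max for numerical stability
    let exps = scaled.map { exp($0 - maxValue) }
    let total = exps.reduce(0, +)
    return exps.map { $0 / total }
}

/// Returns the indices of the smallest set of tokens whose cumulative
/// probability reaches `topP`, ordered from most to least probable.
func nucleus(_ probabilities: [Double], topP: Double) -> [Int] {
    let ranked = probabilities.indices.sorted { probabilities[$0] > probabilities[$1] }
    var kept: [Int] = []
    var cumulative = 0.0
    for index in ranked {
        kept.append(index)
        cumulative += probabilities[index]
        if cumulative >= topP { break }
    }
    return kept
}

// Toy distribution over four tokens.
let probabilities = [0.5, 0.3, 0.15, 0.05]
let candidates = nucleus(probabilities, topP: 0.9)
print(candidates)
```

With `topP: 0.9`, the least likely token is excluded before sampling; raising `topP` toward 1.0 admits more of the tail.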

Audio Codec Usage

import MLXAudioCodecs

// Load SNAC codec
let snac = try await SNAC.fromPretrained("mlx-community/snac_24khz")

// Encode audio to tokens
let tokens = try snac.encode(audio)

// Decode tokens back to audio
let reconstructed = try snac.decode(tokens)
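
Because a neural codec such as SNAC is lossy, the decoded audio will generally not match the input sample for sample. One rough way to quantify the difference is a mean squared error over plain sample buffers. This sketch assumes you have already copied both signals into `[Float]` arrays; `meanSquaredError` is a hypothetical helper, not part of MLXAudioCodecs.

```swift
/// Mean squared error over the overlapping prefix of two sample buffers.
/// Only the shared prefix is compared, since decoded length may differ.
func meanSquaredError(_ a: [Float], _ b: [Float]) -> Float {
    let count = min(a.count, b.count)
    guard count > 0 else { return 0 }
    var sum: Float = 0
    for i in 0..<count {
        let difference = a[i] - b[i]
        sum += difference * difference
    }
    return sum / Float(count)
}

// Toy buffers standing in for original and reconstructed audio.
let original: [Float] = [0.0, 0.5, -0.5, 1.0]
let decoded: [Float] = [0.0, 0.5, -0.5, 0.0, 0.25]
print(meanSquaredError(original, decoded))
```

Sample-level error is only a coarse indicator; perceptual quality can be good even when this number is not close to zero.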

Voice Selection for Multi-Voice Models

// For models supporting multiple voices (like LlamaTTS/Orpheus)
let audio = try await model.generate(
    text: "Hello!",
    voice: "tara",  // Options: tara, leah, jess, leo, dan, mia, zac, zoe
    parameters: parameters
)
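
Since `voice` is passed as a string, a typo would only surface at run time. A small enum can catch that earlier. `OrpheusVoice` below is a hypothetical type built from the voice names listed in the comment above; it is not provided by the library.

```swift
/// Hypothetical enum covering the voice names listed above.
enum OrpheusVoice: String, CaseIterable {
    case tara, leah, jess, leo, dan, mia, zac, zoe
}

// Validate user input before handing the raw string to the model.
let requested = "tara"
let voice = OrpheusVoice(rawValue: requested)
print(voice?.rawValue ?? "unknown voice")

// List all known voices, e.g. to populate a picker.
let allVoiceNames = OrpheusVoice.allCases.map(\.rawValue)
print(allVoiceNames)
```

You would then pass `voice.rawValue` wherever the string argument is expected.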

Requirements

  • macOS 14+ or iOS 17+
  • Apple Silicon (M1 or later) recommended for optimal performance
  • Xcode 15+
  • Swift 5.9+

Examples

Check out the Examples/VoicesApp directory for a complete SwiftUI application demonstrating:

  • Loading and running TTS models
  • Playing generated audio
  • UI components for model interaction

Additional usage examples can be found in the test files.

Credits

License

MIT License - see LICENSE file for details.
