More TTS providers (mastra-ai#1589)

Inspired by Orate, add more TTS providers
kustomzone · Jan 28, 2025 · 6caa4b3
1 parent 164b420
Showing 82 changed files with 6,121 additions and 57 deletions.
diff --git a/.changeset/smart-ligers-dream.md b/.changeset/smart-ligers-dream.md
@@ -0,0 +1,14 @@
+---
+'@mastra/speech-elevenlabs': patch
+'@mastra/speech-replicate': patch
+'@mastra/speech-speechify': patch
+'@mastra/speech-deepgram': patch
+'@mastra/speech-google': patch
+'@mastra/speech-openai': patch
+'@mastra/speech-playai': patch
+'@mastra/speech-azure': patch
+'@mastra/speech-murf': patch
+'@mastra/speech-ibm': patch
+---
+
+Speech modules for TTS providers
diff --git a/docs/src/pages/docs/reference/tts/generate.mdx b/docs/src/pages/docs/reference/tts/generate.mdx
@@ -81,7 +81,7 @@ const outputPath = path.join(process.cwd(), 'test-outputs/open-aigenerate-test.m
 writeFileSync(outputPath, audioResult);
 ```
 
-### Basic Audio Generation (OpenAI)
+### Basic Audio Generation (PlayAI)
 
 ```typescript
 import { PlayAITTS } from '@mastra/tts'
@@ -103,6 +103,89 @@ const outputPath = path.join(process.cwd(), 'test-outputs/open-aigenerate-test.m
 writeFileSync(outputPath, audioResult);
 ```
 
+### Azure Generation
+
+```typescript
+import { AzureTTS } from '@mastra/tts'
+
+const tts = new AzureTTS({
+  model: {
+    name: 'en-US-JennyNeural',
+    apiKey: process.env.AZURE_API_KEY,
+    region: process.env.AZURE_REGION,
+  },
+});
+
+const { audioResult } = await tts.generate({ text: "What is AI?" });
+await writeFile(path.join(process.cwd(), '/test-outputs/azure-output.mp3'), audioResult);
+```
+
+### Deepgram Generation
+
+```typescript
+import { DeepgramTTS } from '@mastra/tts'
+
+const tts = new DeepgramTTS({
+  model: {
+    name: 'aura',
+    voice: 'asteria-en',
+    apiKey: process.env.DEEPGRAM_API_KEY,
+  },
+});
+
+const { audioResult } = await tts.generate({ text: "What is AI?" });
+await writeFile(path.join(process.cwd(), '/test-outputs/deepgram-output.mp3'), audioResult);
+```
+
+### Google Generation
+
+```typescript
+import { GoogleTTS } from '@mastra/tts'
+
+const tts = new GoogleTTS({
+  model: {
+    name: 'en-US-Standard-A',
+    credentials: process.env.GOOGLE_CREDENTIALS,
+  },
+});
+
+const { audioResult } = await tts.generate({ text: "What is AI?" });
+await writeFile(path.join(process.cwd(), '/test-outputs/google-output.mp3'), audioResult);
+```
+
+### IBM Generation
+
+```typescript
+import { IbmTTS } from '@mastra/tts'
+
+const tts = new IbmTTS({
+  model: {
+    voice: 'en-US_AllisonV3Voice',
+    apiKey: process.env.IBM_API_KEY,
+  },
+});
+
+const { audioResult } = await tts.generate({ text: "What is AI?" });
+await writeFile(path.join(process.cwd(), '/test-outputs/ibm-output.mp3'), audioResult);
+```
+
+### Murf Generation
+
+```typescript
+import { MurfTTS } from '@mastra/tts'
+
+const tts = new MurfTTS({
+  model: {
+    name: 'GEN2',
+    voice: 'en-US-natalie',
+    apiKey: process.env.MURF_API_KEY,
+  },
+});
+
+const { audioResult } = await tts.generate({ text: "What is AI?" });
+await writeFile(path.join(process.cwd(), '/test-outputs/murf-output.mp3'), audioResult);
+```
+
 ## Related Methods
 
 For streaming audio responses, see the [`stream()`](./stream.mdx) method documentation.
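The provider examples added in this hunk call `writeFile` and `path.join` without showing their imports, and they assume a `test-outputs/` directory already exists. The following is a minimal sketch of a helper that covers both gaps. It assumes `audioResult` is a `Buffer` (or other `Uint8Array`), as the `writeFileSync(outputPath, audioResult)` call in the earlier example suggests. `saveAudio` is a hypothetical name and is not part of `@mastra/tts`.

```typescript
import { mkdir, writeFile } from 'node:fs/promises';
import * as path from 'node:path';

// Hypothetical helper (not part of @mastra/tts): writes audio bytes to
// <baseDir>/test-outputs/<fileName>, creating the directory if it is missing.
async function saveAudio(
  audio: Uint8Array,
  fileName: string,
  baseDir: string = process.cwd(),
): Promise<string> {
  const outputDir = path.join(baseDir, 'test-outputs');
  await mkdir(outputDir, { recursive: true });
  const outputPath = path.join(outputDir, fileName);
  await writeFile(outputPath, audio);
  return outputPath;
}
```

With this in place, the last line of each example could become `await saveAudio(audioResult, 'azure-output.mp3')`, again assuming `audioResult` holds raw audio bytes.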
diff --git a/docs/src/pages/docs/reference/tts/providers-and-models.mdx b/docs/src/pages/docs/reference/tts/providers-and-models.mdx
@@ -11,4 +11,96 @@ description: Overview of supported TTS providers and their models.
 | ------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
 | ElevenLabs    | `eleven_multilingual_v2`, `eleven_flash_v2_5`, `eleven_flash_v2`, `eleven_multilingual_sts_v2`, `eleven_english_sts_v2`                                                  |
 | OpenAI        | `tts-1`, `tts-1-hd`                                                                                                                                                      |                                                                                                  
-| PlayAI        | `PlayDialog`, `Play3.0-mini`                                                                                                                                                             |
+| PlayAI        | `PlayDialog`, `Play3.0-mini`                                                                                                                                             |
+| Azure         | Various voices available through Azure Cognitive Services                                                                                                                |
+| Deepgram      | `aura` and other models with voice options like `asteria-en`                                                                                                            |
+| Google        | Various voices through Google Cloud Text-to-Speech                                                                                                                       |
+| IBM           | Various voices including `en-US_AllisonV3Voice`                                                                                                                          |
+| Murf          | `GEN1`, `GEN2` with various voices like `en-US-natalie`                                                                                                                 |
+
+## Configuration
+
+Each provider requires specific configuration. Here are examples for each provider:
+
+### ElevenLabs Configuration
+```typescript
+const tts = new ElevenLabsTTS({
+  model: {
+    name: 'eleven_multilingual_v2',
+    apiKey: process.env.ELEVENLABS_API_KEY,
+  },
+});
+```
+
+### OpenAI Configuration
+```typescript
+const tts = new OpenAITTS({
+  model: {
+    name: 'tts-1',  // or 'tts-1-hd' for higher quality
+    apiKey: process.env.OPENAI_API_KEY,
+  },
+});
+```
+
+### PlayAI Configuration
+```typescript
+const tts = new PlayAITTS({
+  model: {
+    name: 'PlayDialog',  // or 'Play3.0-mini'
+    apiKey: process.env.PLAYAI_API_KEY,
+  },
+  userId: process.env.PLAYAI_USER_ID,
+});
+```
+
+### Azure Configuration
+```typescript
+const tts = new AzureTTS({
+  model: {
+    name: 'en-US-JennyNeural',
+    apiKey: process.env.AZURE_API_KEY,
+    region: process.env.AZURE_REGION,
+  },
+});
+```
+
+### Deepgram Configuration
+```typescript
+const tts = new DeepgramTTS({
+  model: {
+    name: 'aura',
+    voice: 'asteria-en',
+    apiKey: process.env.DEEPGRAM_API_KEY,
+  },
+});
+```
+
+### Google Configuration
+```typescript
+const tts = new GoogleTTS({
+  model: {
+    name: 'en-US-Standard-A',
+    credentials: process.env.GOOGLE_CREDENTIALS,
+  },
+});
+```
+
+### IBM Configuration
+```typescript
+const tts = new IbmTTS({
+  model: {
+    voice: 'en-US_AllisonV3Voice',
+    apiKey: process.env.IBM_API_KEY,
+  },
+});
+```
+
+### Murf Configuration
+```typescript
+const tts = new MurfTTS({
+  model: {
+    name: 'GEN2',
+    voice: 'en-US-natalie',
+    apiKey: process.env.MURF_API_KEY,
+  },
+});
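Each configuration above reads its credentials from `process.env`, so an unset variable only shows up later as a failed request. The sketch below checks for missing variables up front. The variable names are copied from the examples in this hunk. The `REQUIRED_ENV` table and the `missingEnv` helper are hypothetical and not part of any Mastra package.

```typescript
// Environment variable names are taken from the configuration examples above;
// the lookup table and helper themselves are hypothetical.
const REQUIRED_ENV: Record<string, string[]> = {
  elevenlabs: ['ELEVENLABS_API_KEY'],
  openai: ['OPENAI_API_KEY'],
  playai: ['PLAYAI_API_KEY', 'PLAYAI_USER_ID'],
  azure: ['AZURE_API_KEY', 'AZURE_REGION'],
  deepgram: ['DEEPGRAM_API_KEY'],
  google: ['GOOGLE_CREDENTIALS'],
  ibm: ['IBM_API_KEY'],
  murf: ['MURF_API_KEY'],
};

// Returns the variables a provider needs that are unset or empty in `env`.
function missingEnv(
  provider: string,
  env: Record<string, string | undefined> = process.env,
): string[] {
  const required = REQUIRED_ENV[provider];
  if (!required) throw new Error(`Unknown provider: ${provider}`);
  return required.filter((name) => !env[name]);
}
```

Calling `missingEnv('azure')` before constructing `AzureTTS` would list whichever of `AZURE_API_KEY` and `AZURE_REGION` are not set.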
diff --git a/docs/src/pages/docs/reference/tts/stream.mdx b/docs/src/pages/docs/reference/tts/stream.mdx
@@ -40,7 +40,7 @@ The `stream()` method is used to interact with the TTS model to produce an audio
 
 ## Examples
 
-### Basic Audio Stream (ElevenLabs)
+### ElevenLabs Streaming
 
 ```typescript
 import { ElevenLabsTTS } from '@mastra/tts'
@@ -83,7 +83,7 @@ for await (const chunk of audioResult) {
 writeStream.end()
 ```
 
-### Basic Audio Stream (OpenAI)
+### OpenAI Streaming
 
 ```typescript
 import { OpenAITTS } from '@mastra/tts'
@@ -126,7 +126,7 @@ for await (const chunk of audioResult) {
 writeStream.end()
 ```
 
-### Basic Audio Stream (PlayAI)
+### PlayAI Streaming
 
 ```typescript
 import { PlayAITTS } from '@mastra/tts'
@@ -168,4 +168,117 @@ for await (const chunk of audioResult) {
 }
 
 writeStream.end()
+```
+
+### Azure Streaming
+
+```typescript
+import { AzureTTS } from '@mastra/tts'
+
+const tts = new AzureTTS({
+  model: {
+    name: 'en-US-JennyNeural',
+    apiKey: process.env.AZURE_API_KEY,
+    region: process.env.AZURE_REGION,
+  },
+});
+
+const { audioResult } = await tts.stream({ text: "What is AI?" });
+
+// Create a write stream
+const outputPath = path.join(process.cwd(), '/test-outputs/azure-stream.mp3');
+const writeStream = createWriteStream(outputPath);
+
+// Pipe the audio stream to the file
+audioResult.pipe(writeStream);
+```
+
+### Deepgram Streaming
+
+```typescript
+import { DeepgramTTS } from '@mastra/tts'
+
+const tts = new DeepgramTTS({
+  model: {
+    name: 'aura',
+    voice: 'asteria-en',
+    apiKey: process.env.DEEPGRAM_API_KEY,
+  },
+});
+
+const { audioResult } = await tts.stream({ text: "What is AI?" });
+
+// Create a write stream
+const outputPath = path.join(process.cwd(), '/test-outputs/deepgram-stream.mp3');
+const writeStream = createWriteStream(outputPath);
+
+// Pipe the audio stream to the file
+audioResult.pipe(writeStream);
+```
+
+### Google Streaming
+
+```typescript
+import { GoogleTTS } from '@mastra/tts'
+
+const tts = new GoogleTTS({
+  model: {
+    name: 'en-US-Standard-A',
+    credentials: process.env.GOOGLE_CREDENTIALS,
+  },
+});
+
+const { audioResult } = await tts.stream({ text: "What is AI?" });
+
+// Create a write stream
+const outputPath = path.join(process.cwd(), '/test-outputs/google-stream.mp3');
+const writeStream = createWriteStream(outputPath);
+
+// Pipe the audio stream to the file
+audioResult.pipe(writeStream);
+```
+
+### IBM Streaming
+
+```typescript
+import { IbmTTS } from '@mastra/tts'
+
+const tts = new IbmTTS({
+  model: {
+    voice: 'en-US_AllisonV3Voice',
+    apiKey: process.env.IBM_API_KEY,
+  },
+});
+
+const { audioResult } = await tts.stream({ text: "What is AI?" });
+
+// Create a write stream
+const outputPath = path.join(process.cwd(), '/test-outputs/ibm-stream.mp3');
+const writeStream = createWriteStream(outputPath);
+
+// Pipe the audio stream to the file
+audioResult.pipe(writeStream);
+```
+
+### Murf Streaming
+
+```typescript
+import { MurfTTS } from '@mastra/tts'
+
+const tts = new MurfTTS({
+  model: {
+    name: 'GEN2',
+    voice: 'en-US-natalie',
+    apiKey: process.env.MURF_API_KEY,
+  },
+});
+
+const { audioResult } = await tts.stream({ text: "What is AI?" });
+
+// Create a write stream
+const outputPath = path.join(process.cwd(), '/test-outputs/murf-stream.mp3');
+const writeStream = createWriteStream(outputPath);
+
+// Pipe the audio stream to the file
+audioResult.pipe(writeStream);
 ```
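The streaming examples added here use `audioResult.pipe(writeStream)`, which returns immediately and does not forward read errors to the caller. As an alternative sketch, `pipeline` from `node:stream/promises` resolves once the file is fully written and rejects if either side fails. This assumes `audioResult` is a Node.js `Readable`, as the `.pipe()` calls imply. `streamToFile` is a hypothetical helper name.

```typescript
import { createWriteStream } from 'node:fs';
import { Readable } from 'node:stream';
import { pipeline } from 'node:stream/promises';

// Hypothetical helper: streams audio into a file and resolves only after
// the write stream has finished; rejects on a read or write error.
async function streamToFile(audio: Readable, outputPath: string): Promise<void> {
  await pipeline(audio, createWriteStream(outputPath));
}
```

Under that assumption, `await streamToFile(audioResult, outputPath)` could replace the `createWriteStream` and `pipe` lines in each example.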
diff --git a/examples/dane/src/mastra/agents/package-publisher.ts b/examples/dane/src/mastra/agents/package-publisher.ts
@@ -30,7 +30,11 @@ const packages_llm_text = `
   - Format: @mastra/vector-{name} -> vector-stores/{name}
   - Special case: @mastra/vector-astra -> vector-stores/astra
 
-  ## 4. Integrations - STRICT RULES:
+  ## 4. Speech packages - STRICT RULES:
+  - ALL speech packages must be directly under speech/
+  - Format: @mastra/speech-{name} -> speech/{name}
+
+  ## 5. Integrations - STRICT RULES:
   - ALL integration packages are under integrations/
   @mastra/apollos -> integrations/apollo
   @mastra/ashby -> integrations/ashby
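The prompt rules in this hunk map package names to directories by prefix. A small sketch of the two regular patterns visible here (`@mastra/speech-{name}` and `@mastra/vector-{name}`) is shown below. `packageDir` is a hypothetical helper for illustration; the agent itself relies on the prompt text, and the integration packages are left out because their mapping is listed case by case.

```typescript
// Hypothetical helper illustrating the package-to-directory rules quoted above.
// Returns undefined for packages that follow neither pattern.
function packageDir(pkg: string): string | undefined {
  const speech = /^@mastra\/speech-(.+)$/.exec(pkg);
  if (speech) return `speech/${speech[1]}`;

  const vector = /^@mastra\/vector-(.+)$/.exec(pkg);
  if (vector) return `vector-stores/${vector[1]}`;

  return undefined;
}
```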

diff --git a/package.json b/package.json
@@ -25,7 +25,7 @@
     "build:packages": "pnpm --filter \"./packages/*\" build",
     "build:vector-stores": "pnpm --filter \"./vector-stores/*\" build",
     "build:deployers": "pnpm --filter \"./deployers/*\" build",
-    "build:deployers:dev": "pnpm --filter \"./deployers/*\" build:dev",
+    "build:speech": "pnpm --filter \"./speech/*\" build",
     "build:cli": "pnpm --filter ./packages/cli build",
     "build:deployer": "pnpm --filter ./packages/deployer build",
     "build:core": "pnpm --filter ./packages/core build",