---
title: "Listen, Control, Language Detection"
sidebarTitle: "Live Call Features"
---

In this documentation, we will showcase our three new features and how you can use them:

1. **Call Control**: Enables dynamic injection of conversation elements during live calls.
2. **Call Listen**: Provides real-time audio streaming and processing during the call.
3. **Automatic Language Detection**: Detects the language spoken in the conversation in real time and responds in that language.

## Call Control and Call Listen Features

When you initiate a call with the `/call` endpoint, you receive a call ID. You can listen to the call directly via the Call Listen feature, and if you want to inject operations into it, you can use the Call Control functionality.
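
For context, here is a minimal sketch of initiating a call and capturing its ID. The `assistantId`, `phoneNumberId`, and `customer` fields are illustrative placeholders and depend on your own setup:

```bash
# A minimal sketch of starting a call; the body fields below are
# placeholders to replace with values from your own account.
curl -X POST https://api.vapi.ai/call \
  -H "Authorization: Bearer $VAPI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "assistantId": "YOUR_ASSISTANT_ID",
    "phoneNumberId": "YOUR_PHONE_NUMBER_ID",
    "customer": { "number": "+15551234567" }
  }'
```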

### Call Control

Call Control allows you to inject conversation elements dynamically during a live call via HTTP POST requests. Currently, we support injecting messages in real time. More operations will be supported in the future.

To inject a message, send a POST request in this format:

```bash
curl -X POST https://aws-us-west-2-production3-phone-call-websocket.vapi.ai/{call_id}/control \
  -H "Content-Type: application/json" \
  -d '{
    "type": "say",
    "message": "Welcome to Vapi, this message was injected during the call."
  }'
```

### Call Listen

Call Listen enables real-time streaming and processing of audio data over a WebSocket connection. Here's an example implementation showing how you can receive audio packets and handle them based on your needs:

```javascript
const WebSocket = require('ws');
const fs = require('fs');

// Base WebSocket URL for the live call, obtained when the call is created.
const listenUrl = '<YOUR_CALL_LISTEN_URL>';

// Accumulates the raw PCM audio received during the call.
let pcmBuffer = Buffer.alloc(0);
const ws = new WebSocket(`${listenUrl}/listen`);

ws.on('open', () => console.log('WebSocket connection established'));

ws.on('message', (data, isBinary) => {
  if (isBinary) {
    // Binary frames carry raw PCM audio; append them to the buffer.
    pcmBuffer = Buffer.concat([pcmBuffer, data]);
    console.log(`Received PCM data, buffer size: ${pcmBuffer.length}`);
  } else {
    // Text frames carry JSON status messages.
    console.log('Received message:', JSON.parse(data.toString()));
  }
});

ws.on('close', () => {
  // Persist whatever audio was captured once the call ends.
  if (pcmBuffer.length > 0) {
    fs.writeFileSync('audio.pcm', pcmBuffer);
    console.log('Audio data saved to audio.pcm');
  }
});

ws.on('error', (error) => console.error('WebSocket error:', error));
```
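
The saved `audio.pcm` file is headerless raw PCM. To play it back, you can wrap it in a WAV container, for example with `ffmpeg`; the sample format, rate, and channel count below are assumptions and must match your call's actual audio settings:

```bash
# Assumes 16-bit little-endian mono PCM at 16 kHz; adjust to your call's audio settings.
ffmpeg -f s16le -ar 16000 -ac 1 -i audio.pcm audio.wav
```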

## Automatic Language Detection

This feature allows the assistant to switch between languages automatically during a call. It is currently supported only with Deepgram and covers the following languages:

* `ar`: Arabic
* `bn`: Bengali
* `yue`: Cantonese
* `zh`: Chinese
* `en`: English
* `fr`: French
* `de`: German
* `hi`: Hindi
* `it`: Italian
* `ja`: Japanese
* `ko`: Korean
* `pt`: Portuguese
* `ru`: Russian
* `es`: Spanish
* `th`: Thai
* `vi`: Vietnamese

To enable automatic language detection for multilingual calls, set `transcriber.languageDetectionEnabled: true` through the `/assistant` API endpoint or via `assistantOverrides` when creating the call.
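
As a sketch, assuming you update an existing assistant via `PATCH /assistant/{assistant_id}` with a bearer token, enabling detection on a Deepgram transcriber could look like this:

```bash
# Illustrative sketch: enable language detection on an existing assistant.
curl -X PATCH https://api.vapi.ai/assistant/{assistant_id} \
  -H "Authorization: Bearer $VAPI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "transcriber": {
      "provider": "deepgram",
      "model": "nova-2",
      "languageDetectionEnabled": true
    }
  }'
```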

### Requirements for Multilingual Support

To make multilingual support work, you need to choose the following models:

* **Transcriber**:
  * **Deepgram**: `nova-2` or `nova-2-general`

* **Voice Providers**:
  * **11labs**: Multilingual model or Turbo v2.5
  * **Cartesia**: `sonic-multilingual` model

By using these models and enabling automatic language detection, your application will be able to handle multilingual conversations seamlessly.
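
Putting the pieces together, a multilingual assistant configuration might look like the sketch below. The `voiceId` placeholder and the exact 11labs model identifier (`eleven_turbo_v2_5`) are assumptions to adapt to your own account:

```bash
# Illustrative sketch combining the transcriber and voice models listed above.
# voiceId and the 11labs model identifier are assumptions; adjust as needed.
curl -X PATCH https://api.vapi.ai/assistant/{assistant_id} \
  -H "Authorization: Bearer $VAPI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "transcriber": {
      "provider": "deepgram",
      "model": "nova-2",
      "languageDetectionEnabled": true
    },
    "voice": {
      "provider": "11labs",
      "voiceId": "YOUR_VOICE_ID",
      "model": "eleven_turbo_v2_5"
    }
  }'
```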