Merged
20 commits
18 changes: 2 additions & 16 deletions README.md
@@ -3,20 +3,6 @@
![](https://github.com/sipsorcery-org/sipsorcery/actions/workflows/sipsorcery-core-mac.yml/badge.svg)
![](https://github.com/sipsorcery-org/sipsorcery/actions/workflows/examples-core-win.yml/badge.svg)

## New WebRTC Demos - Jan 2025

**Connect to OpenAI's Realtime WebRTC Endpoint**

The [WebRTCOpenAI](https://github.com/sipsorcery-org/sipsorcery/blob/master/examples/WebRTCExamples/WebRTCOpenAI/Program.cs) demonstrates a dotnet only (no native libraries) application that connects to [OpenAI's new WebRTC Realtime](https://platform.openai.com/docs/guides/realtime-webrtc) endpoint. This demo lets you talk in realtime to ChatGPT and receive both a WebRTC audio stream response and a text transcript. Could video avatars be on the way?! A real Max Headroom!

![ChatGPT WebRTC Transcript](./img/openai.png)

**Use WebRTC + OpenGL for an Audio Scope**

The [WebRTCOpenGL](https://github.com/sipsorcery-org/sipsorcery/blob/master/examples/WebRTCExamples/WebRTCOpenGL/Program.cs) demonstrates a way to combine digital signal processing of a WebRTC audio stream and then use OpenGL to render a video stream representation of it. It looks way better than it sounds. Try it out!

![AudioScope](./img/audio-scope.png)

## What Is It?

**This fully C# library can be used to add Real-time Communications, typically audio and video calls, to .NET applications.**
@@ -41,9 +27,9 @@ The diagram below is a high level overview of a Real-time audio and video call b
- [SIPSorceryMedia.Windows](https://github.com/sipsorcery-org/SIPSorceryMedia.Windows): An example of a Windows specific library that provides audio capture and playback.
- [SIPSorceryMedia.Encoders](https://github.com/sipsorcery-org/SIPSorceryMedia.Encoders): An example of a Windows specific wrapper for the [VP8](https://www.webmproject.org/) video codec.
- [SIPSorceryMedia.FFmpeg](https://github.com/sipsorcery-org/SIPSorceryMedia.FFmpeg): An example of a cross platform library that features audio and video codecs using PInvoke and [FFmpeg](https://ffmpeg.org/).
- Others: **Contributions welcome**. Frequently requested are Xamarin Forms on Android/iOS and Unix (Linux and/or Mac). New implementations need to implement one or more of the Audio Sink/Source and/or Video Sink/Source interfaces from [SIPSorceryMedia.Abstractions](https://github.com/sipsorcery-org/SIPSorceryMedia.Abstractions/blob/master/src/MediaEndPoints.cs).
- [SIPSorceryMedia.SDL2](https://github.com/sipsorcery-org/SIPSorceryMedia.SDL2): An example of integrating the cross-platform [SDL2](https://www.libsdl.org/index.php) Simple Direct Media Layer library.

- This library provides only a small number of audio and video codecs (G711 and G722). Additional codecs, particularly video ones, require C or C++ libraries. An effort is underway to port the [VP8](https://www.webmproject.org/) video codec to C# see [VP8.Net](https://github.com/sipsorcery-org/VP8.Net).
- This library provides only a small number of audio and video codecs (G711, G722 and G729). OPUS is available via [Concentus](https://github.com/lostromb/concentus). Additional codecs, particularly video ones, require C or C++ libraries. An effort is underway to port the [VP8](https://www.webmproject.org/) video codec to C# see [VP8.Net](https://github.com/sipsorcery-org/VP8.Net).

## Installation

93 changes: 50 additions & 43 deletions examples/AzureExamples/TextToPcm/Program.cs
100644 → 100755
@@ -4,6 +4,13 @@
// Description: An example program to send text-to-speech requests to the
// Azure Cognitive Speech Services API and save the results to PCM files.
//
// Update 4 Feb 2025: You need an "Azure AI services | Speech service" resource
// for this demo. Once the resource is set up, you can get the subscription key
// and region from the Overview page on the Azure portal.
//
// References:
// https://learn.microsoft.com/en-us/azure/ai-services/speech-service/overview
//
// Notes:
// The audio format returned by the Azure Speech Service is:
// 16 bit signed PCM at 16KHz
@@ -20,6 +27,7 @@
//
// History:
// 03 Jun 2020 Aaron Clauson Created, Dublin, Ireland.
// 04 Feb 2025 Aaron Clauson Checked still working with net9.0.
//
// License:
// BSD 3-Clause "New" or "Revised" License, see included LICENSE.md file.
@@ -31,65 +39,64 @@
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;

namespace demo
namespace demo;

class Program
{
class Program
static async Task Main(string[] args)
{
static async Task Main(string[] args)
{
Console.WriteLine("Text to PCM Console:");
Console.WriteLine("Text to PCM Console:");

if (args == null || args.Length < 3)
{
Console.WriteLine("Usage: texttopcm <azure subscription key> <azure region> <text>");
Console.WriteLine("e.g. texttopcm cb965... westeurope \"Hello World\"");
Console.WriteLine("e.g. dotnet run -- cb965... westeurope \"Hello World\"");
}
else
{
var speechConfig = SpeechConfig.FromSubscription(args[0], args[1]);
if (args == null || args.Length < 3)
{
Console.WriteLine("Usage: texttopcm <azure subscription key> <azure region> <text>");
Console.WriteLine("e.g. texttopcm cb965... westeurope \"Hello World\"");
Console.WriteLine("e.g. dotnet run -- cb965... westeurope \"Hello World\"");
}
else
{
var speechConfig = SpeechConfig.FromSubscription(args[0], args[1]);

string text = args[2];
string text = args[2];

TextToSpeechStream ttsOutStream = new TextToSpeechStream();
AudioConfig audioTtsConfig = AudioConfig.FromStreamOutput(ttsOutStream);
SpeechSynthesizer speechSynthesizer = new SpeechSynthesizer(speechConfig, audioTtsConfig);
TextToSpeechStream ttsOutStream = new TextToSpeechStream();
AudioConfig audioTtsConfig = AudioConfig.FromStreamOutput(ttsOutStream);
SpeechSynthesizer speechSynthesizer = new SpeechSynthesizer(speechConfig, audioTtsConfig);

using (var result = await speechSynthesizer.SpeakTextAsync(text))
using (var result = await speechSynthesizer.SpeakTextAsync(text))
{
if (result.Reason == ResultReason.SynthesizingAudioCompleted)
{
if (result.Reason == ResultReason.SynthesizingAudioCompleted)
{
Console.WriteLine($"Speech synthesized to speaker for text [{text}]");
Console.WriteLine($"Speech synthesized to speaker for text [{text}]");

var buffer = ttsOutStream.GetPcmBuffer();
string saveFilename = DateTime.Now.Ticks.ToString() + ".pcm16k";
var buffer = ttsOutStream.GetPcmBuffer();
string saveFilename = DateTime.Now.Ticks.ToString() + ".pcm16k";

using (StreamWriter sw = new StreamWriter(saveFilename))
{
for(int i=0; i<buffer.Length; i++)
{
sw.BaseStream.Write(BitConverter.GetBytes(buffer[i]));
}
}

Console.WriteLine($"Result saved to {saveFilename}.");
}
else if (result.Reason == ResultReason.Canceled)
using (StreamWriter sw = new StreamWriter(saveFilename))
{
var cancellation = SpeechSynthesisCancellationDetails.FromResult(result);
Console.WriteLine($"Speech synthesizer was cancelled, reason={cancellation.Reason}");

if (cancellation.Reason == CancellationReason.Error)
for(int i=0; i<buffer.Length; i++)
{
Console.WriteLine($"Speech synthesizer cancelled: ErrorCode={cancellation.ErrorCode}");
Console.WriteLine($"Speech synthesizer cancelled: ErrorDetails=[{cancellation.ErrorDetails}]");
sw.BaseStream.Write(BitConverter.GetBytes(buffer[i]));
}
}
else

Console.WriteLine($"Result saved to {saveFilename}.");
}
else if (result.Reason == ResultReason.Canceled)
{
var cancellation = SpeechSynthesisCancellationDetails.FromResult(result);
Console.WriteLine($"Speech synthesizer was cancelled, reason={cancellation.Reason}");

if (cancellation.Reason == CancellationReason.Error)
{
Console.WriteLine($"Speech synthesizer failed with result {result.Reason} for text [{text}].");
Console.WriteLine($"Speech synthesizer cancelled: ErrorCode={cancellation.ErrorCode}");
Console.WriteLine($"Speech synthesizer cancelled: ErrorDetails=[{cancellation.ErrorDetails}]");
}
}
else
{
Console.WriteLine($"Speech synthesizer failed with result {result.Reason} for text [{text}].");
}
}
}
}
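
Reviewer's sketch, not part of this change: the header comment gives the output format as 16 bit signed PCM at 16KHz, so a saved `.pcm16k` file is a headerless run of 2-byte samples. Assuming mono audio and little-endian byte order (neither is stated in the diff), the hypothetical `PcmFile` helper below shows one way to write the samples, read them back, and work out the clip length from the sample count. `PcmFile` and its members are illustrative names and are not part of the example or the Azure SDK.

```csharp
using System;
using System.IO;

// Hypothetical helper for raw PCM files (illustrative only, not in the PR).
// Assumes mono, 16 bit signed samples at 16KHz, stored little-endian.
public static class PcmFile
{
    public const int SampleRateHz = 16000;

    // Writes each sample as 2 bytes. BinaryWriter always writes little-endian.
    public static void Save(string path, short[] samples)
    {
        using var fs = File.Create(path);
        using var bw = new BinaryWriter(fs);
        foreach (short sample in samples)
        {
            bw.Write(sample);
        }
    }

    // Reads the file back into samples. A trailing odd byte, if any, is ignored.
    public static short[] Load(string path)
    {
        using var fs = File.OpenRead(path);
        using var br = new BinaryReader(fs);
        short[] samples = new short[fs.Length / 2];
        for (int i = 0; i < samples.Length; i++)
        {
            samples[i] = br.ReadInt16();
        }
        return samples;
    }

    // Playback length for a given number of mono samples.
    public static TimeSpan Duration(int sampleCount) =>
        TimeSpan.FromSeconds((double)sampleCount / SampleRateHz);
}
```

With this in place, `PcmFile.Duration(buffer.Length)` could be logged next to the "Result saved" message as a quick sanity check that the synthesized clip is roughly as long as expected.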
12 changes: 6 additions & 6 deletions examples/AzureExamples/TextToPcm/TextToPcm.csproj
100644 → 100755
@@ -2,15 +2,15 @@

<PropertyGroup>
<OutputType>Exe</OutputType>
<TargetFramework>netcoreapp3.1</TargetFramework>
<TargetFramework>net9.0</TargetFramework>
<LangVersion>12</LangVersion>
</PropertyGroup>

<ItemGroup>
<PackageReference Include="Microsoft.Extensions.Logging" Version="3.1.3" />
<PackageReference Include="Serilog.Extensions.Logging" Version="3.0.1" />
<PackageReference Include="Serilog.Sinks.Console" Version="3.1.1" />
<PackageReference Include="Microsoft.CognitiveServices.Speech" Version="1.11.0" />
<PackageReference Include="SIPSorcery" Version="4.0.47-pre" />
<PackageReference Include="Microsoft.Extensions.Logging" Version="9.0.1" />
<PackageReference Include="Serilog.Extensions.Logging" Version="9.0.0" />
<PackageReference Include="Serilog.Sinks.Console" Version="6.0.0" />
<PackageReference Include="Microsoft.CognitiveServices.Speech" Version="1.42.0" />
</ItemGroup>

</Project>
125 changes: 62 additions & 63 deletions examples/AzureExamples/TextToPcm/TextToSpeechStream.cs
100644 → 100755
@@ -19,82 +19,81 @@
using System.IO;
using Microsoft.CognitiveServices.Speech.Audio;

namespace demo
namespace demo;

/// <summary>
/// This is the backing class for the Azure text-to-speech service call. It will
/// have the results of the text-to-speech request pushed into its stream buffer which
/// can be retrieved for subsequent operations such as sending via RTP.
/// </summary>
public class TextToSpeechStream : PushAudioOutputStreamCallback
{
public MemoryStream _ms = new MemoryStream();
private int _posn = 0;

public TextToSpeechStream()
{ }

/// <summary>
/// This is the backing class for the Azure text-to-speech service call. It will
/// have the results of the text-to-speech request pushed into its stream buffer which
/// can be retrieved for subsequent operations such as sending via RTP.
/// This gets called by the internals of the Azure text-to-speech SDK to write the resultant
/// PCM 16Khz 16 bit audio samples.
/// </summary>
public class TextToSpeechStream : PushAudioOutputStreamCallback
/// <param name="dataBuffer">The data buffer containing the audio sample.</param>
/// <returns>The number of bytes written from the supplied sample.</returns>
public override uint Write(byte[] dataBuffer)
{
public MemoryStream _ms = new MemoryStream();
private int _posn = 0;
//Console.WriteLine($"TextToSpeechAudioOutStream bytes written to output stream {dataBuffer.Length}.");

public TextToSpeechStream()
{ }
_ms.Write(dataBuffer, 0, dataBuffer.Length);
_posn = _posn + dataBuffer.Length;

/// <summary>
/// This gets called by the internals of the Azure text-to-speech SDK to write the resultant
/// PCM 16Khz 16 bit audio samples.
/// </summary>
/// <param name="dataBuffer">The data buffer containing the audio sample.</param>
/// <returns>The number of bytes written from the supplied sample.</returns>
public override uint Write(byte[] dataBuffer)
{
//Console.WriteLine($"TextToSpeechAudioOutStream bytes written to output stream {dataBuffer.Length}.");
return (uint)dataBuffer.Length;
}

_ms.Write(dataBuffer, 0, dataBuffer.Length);
_posn = _posn + dataBuffer.Length;
/// <summary>
/// Closes the stream.
/// </summary>
public override void Close()
{
_ms.Close();
base.Close();
}

return (uint)dataBuffer.Length;
}
/// <summary>
/// Get the current contents of the memory stream as a buffer of PCM samples.
/// The PCM samples are suitable to be fed into an audio codec as part of the
/// RTP send.
/// </summary>
public short[] GetPcmBuffer()
{
_ms.Position = 0;
byte[] buffer = _ms.GetBuffer();
short[] pcmBuffer = new short[_posn / 2];

/// <summary>
/// Closes the stream.
/// </summary>
public override void Close()
for (int i = 0; i < pcmBuffer.Length; i++)
{
_ms.Close();
base.Close();
pcmBuffer[i] = BitConverter.ToInt16(buffer, i * 2);
}

/// <summary>
/// Get the current contents of the memory stream as a buffer of PCM samples.
/// The PCM samples are suitable to be fed into an audio codec as part of the
/// RTP send.
/// </summary>
public short[] GetPcmBuffer()
{
_ms.Position = 0;
byte[] buffer = _ms.GetBuffer();
short[] pcmBuffer = new short[_posn / 2];

for (int i = 0; i < pcmBuffer.Length; i++)
{
pcmBuffer[i] = BitConverter.ToInt16(buffer, i * 2);
}

return pcmBuffer;
}
return pcmBuffer;
}

/// <summary>
/// Clear is intended to be called after the method to get the PCM buffer.
/// It will reset the underlying memory buffer ready for the next text-to-speech operation.
/// </summary>
public void Clear()
{
_ms.SetLength(0);
_posn = 0;
}
/// <summary>
/// Clear is intended to be called after the method to get the PCM buffer.
/// It will reset the underlying memory buffer ready for the next text-to-speech operation.
/// </summary>
public void Clear()
{
_ms.SetLength(0);
_posn = 0;
}

/// <summary>
/// Used to check if there is data waiting to be copied.
/// </summary>
/// <returns>True if the stream is empty. False if there is some data available.</returns>
public bool IsEmpty()
{
return _posn == 0;
}
/// <summary>
/// Used to check if there is data waiting to be copied.
/// </summary>
/// <returns>True if the stream is empty. False if there is some data available.</returns>
public bool IsEmpty()
{
return _posn == 0;
}
}
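
Reviewer's sketch, not part of this change: `GetPcmBuffer` sizes its result from `_posn` rather than from the array returned by `MemoryStream.GetBuffer()`, because `GetBuffer()` exposes the stream's whole internal array, which can be longer than the bytes actually written. The same conversion can be sketched as a standalone function that takes the byte count explicitly. `PcmConvert.ToSamples` is a hypothetical name, and the little-endian decode is an assumption (the class itself uses `BitConverter.ToInt16`, which follows the host byte order).

```csharp
using System;

// Hypothetical helper (illustrative only, not in the PR).
public static class PcmConvert
{
    // Converts the first byteCount bytes of pcmBytes into 16 bit samples.
    // Bytes past byteCount are ignored, as is a trailing odd byte.
    public static short[] ToSamples(byte[] pcmBytes, int byteCount)
    {
        if (pcmBytes == null)
        {
            throw new ArgumentNullException(nameof(pcmBytes));
        }
        if (byteCount < 0 || byteCount > pcmBytes.Length)
        {
            throw new ArgumentOutOfRangeException(nameof(byteCount));
        }

        short[] samples = new short[byteCount / 2];
        for (int i = 0; i < samples.Length; i++)
        {
            // Explicit little-endian decode: low byte first, then high byte.
            samples[i] = unchecked((short)(pcmBytes[i * 2] | (pcmBytes[i * 2 + 1] << 8)));
        }
        return samples;
    }
}
```

Passing the written length separately mirrors the `_posn` bookkeeping in `TextToSpeechStream`: a call such as `PcmConvert.ToSamples(_ms.GetBuffer(), _posn)` would skip any unused capacity at the end of the buffer.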
41 changes: 41 additions & 0 deletions examples/FSharpExamples/FSharpExamples.sln
@@ -0,0 +1,41 @@

Microsoft Visual Studio Solution File, Format Version 12.00
# Visual Studio Version 17
VisualStudioVersion = 17.12.35527.113
MinimumVisualStudioVersion = 10.0.40219.1
Project("{F2A71F9B-5D33-465A-A702-920D77279786}") = "WebRTCOpenAI", "WebRTCOpenAI\WebRTCOpenAI.fsproj", "{4D11ABA6-09D2-4A6F-A69F-E49B59697460}"
EndProject
Project("{FAE04EC0-301F-11D3-BF4B-00C04F79EFBC}") = "OpenAI.Realtime", "..\OpenAIExamples\OpenAI.Realtime\OpenAI.Realtime.csproj", "{D12DE229-454C-4D98-B2C8-A6AE642E1414}"
EndProject
Project("{FAE04EC0-301F-11D3-BF4B-00C04F79EFBC}") = "SIPSorcery", "..\..\src\SIPSorcery.csproj", "{7577DFB6-351B-4E70-986D-D9247218F990}"
EndProject
Global
GlobalSection(SolutionConfigurationPlatforms) = preSolution
Debug|Any CPU = Debug|Any CPU
Release|Any CPU = Release|Any CPU
Unsigned|Any CPU = Unsigned|Any CPU
EndGlobalSection
GlobalSection(ProjectConfigurationPlatforms) = postSolution
{4D11ABA6-09D2-4A6F-A69F-E49B59697460}.Debug|Any CPU.ActiveCfg = Debug|Any CPU
{4D11ABA6-09D2-4A6F-A69F-E49B59697460}.Debug|Any CPU.Build.0 = Debug|Any CPU
{4D11ABA6-09D2-4A6F-A69F-E49B59697460}.Release|Any CPU.ActiveCfg = Release|Any CPU
{4D11ABA6-09D2-4A6F-A69F-E49B59697460}.Release|Any CPU.Build.0 = Release|Any CPU
{4D11ABA6-09D2-4A6F-A69F-E49B59697460}.Unsigned|Any CPU.ActiveCfg = Debug|Any CPU
{4D11ABA6-09D2-4A6F-A69F-E49B59697460}.Unsigned|Any CPU.Build.0 = Debug|Any CPU
{D12DE229-454C-4D98-B2C8-A6AE642E1414}.Debug|Any CPU.ActiveCfg = Debug|Any CPU
{D12DE229-454C-4D98-B2C8-A6AE642E1414}.Debug|Any CPU.Build.0 = Debug|Any CPU
{D12DE229-454C-4D98-B2C8-A6AE642E1414}.Release|Any CPU.ActiveCfg = Release|Any CPU
{D12DE229-454C-4D98-B2C8-A6AE642E1414}.Release|Any CPU.Build.0 = Release|Any CPU
{D12DE229-454C-4D98-B2C8-A6AE642E1414}.Unsigned|Any CPU.ActiveCfg = Debug|Any CPU
{D12DE229-454C-4D98-B2C8-A6AE642E1414}.Unsigned|Any CPU.Build.0 = Debug|Any CPU
{7577DFB6-351B-4E70-986D-D9247218F990}.Debug|Any CPU.ActiveCfg = Debug|Any CPU
{7577DFB6-351B-4E70-986D-D9247218F990}.Debug|Any CPU.Build.0 = Debug|Any CPU
{7577DFB6-351B-4E70-986D-D9247218F990}.Release|Any CPU.ActiveCfg = Release|Any CPU
{7577DFB6-351B-4E70-986D-D9247218F990}.Release|Any CPU.Build.0 = Release|Any CPU
{7577DFB6-351B-4E70-986D-D9247218F990}.Unsigned|Any CPU.ActiveCfg = Debug|Any CPU
{7577DFB6-351B-4E70-986D-D9247218F990}.Unsigned|Any CPU.Build.0 = Debug|Any CPU
EndGlobalSection
GlobalSection(SolutionProperties) = preSolution
HideSolutionNode = FALSE
EndGlobalSection
EndGlobal