Developer guide
You'll need to download the latest release and add PySpeech.dll as a reference in your project.
Once that's done, add it as a BepInDependency, like so:
[BepInDependency("JS03.PySpeech")]
public class YourPlugin : BaseUnityPlugin {
    // ...
}
Alternatively, you can add it as a soft dependency if your mod does not rely on speech recognition to work:
[BepInDependency("JS03.PySpeech", BepInDependency.DependencyFlags.SoftDependency)]
public class YourPlugin : BaseUnityPlugin {
    // ...
}
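With a soft dependency, PySpeech may be missing at runtime, so any code that touches it usually needs a guard. The following is a minimal sketch of one common BepInEx pattern, not something taken from the PySpeech documentation. The `PySpeech` namespace, the plugin GUID/name/version, and the `SpeechCompat` class are assumptions made for illustration.

```csharp
using System.Runtime.CompilerServices;
using BepInEx;
using BepInEx.Bootstrap;
using PySpeech; // assumed namespace of the Speech class

// GUID, name and version below are placeholders for your own plugin.
[BepInPlugin("com.example.yourplugin", "YourPlugin", "1.0.0")]
[BepInDependency("JS03.PySpeech", BepInDependency.DependencyFlags.SoftDependency)]
public class YourPlugin : BaseUnityPlugin {
    private void Awake() {
        // Chainloader.PluginInfos is keyed by plugin GUID.
        if (Chainloader.PluginInfos.ContainsKey("JS03.PySpeech")) {
            SpeechCompat.Register();
        }
    }
}

// Hypothetical helper: every reference to PySpeech types is kept in this
// class so they are only resolved when PySpeech is actually installed.
internal static class SpeechCompat {
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static void Register() {
        Speech.RegisterPhrases(new string[] { "cool phrase" });
    }
}
```

Keeping the PySpeech calls in a separate, non-inlined method is a defensive choice: the runtime then has no reason to load PySpeech.dll unless the guard passes.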
To register phrases, all you have to do is call Speech.RegisterPhrases:
string[] phrases = new string[] { "cool phrase", "another cool phrase" };
Speech.RegisterPhrases(phrases);
You can call Speech.RegisterPhrases whenever and wherever you'd like; there are no limitations in that regard.
Registering your own SpeechRecognized event handler lets you run whatever code you want every time the Whisper model produces output.
Here's a simple example of how to set one up:
Speech.RegisterCustomHandler((obj, recognized) => {
    Plugin.logger.LogInfo($"Whisper output: {recognized.Text}");
});
Handlers can be set up anywhere, just like Speech.RegisterPhrases.
When the Whisper model produces output and any phrases have been registered, the API runs a case-insensitive similarity check between every registered phrase and the recognized phrase to find the one with the highest similarity score.
To check whether one of your phrases was said, use Speech.IsAboveThreshold(phrases, similarityThreshold), which tells you whether the best match belongs to your set of phrases and whether its similarity score is above your chosen threshold.
Here's an example using Speech.IsAboveThreshold:
string[] phrases = new string[] { "test phrase" };
Speech.RegisterPhrases(phrases);
Speech.RegisterCustomHandler((obj, recognized) => {
    // Check whether your phrases contain the best match and its similarity score is above 0.5f
    if (Speech.IsAboveThreshold(phrases, 0.5f)) {
        Plugin.logger.LogInfo("Test phrase was said!");
    }
});
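To make the idea of a case-insensitive best-match check more concrete, here is a self-contained sketch. It is not PySpeech's implementation: this guide does not say which similarity metric the API uses, so the sketch assumes normalized Levenshtein distance, and `PhraseMatcher` with all of its methods is a hypothetical helper.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of PySpeech. Illustrates a case-insensitive
// best-match search using normalized Levenshtein similarity (an assumption).
public static class PhraseMatcher {
    // Classic dynamic-programming edit distance between two strings.
    public static int Levenshtein(string a, string b) {
        int[,] d = new int[a.Length + 1, b.Length + 1];
        for (int i = 0; i <= a.Length; i++) d[i, 0] = i;
        for (int j = 0; j <= b.Length; j++) d[0, j] = j;
        for (int i = 1; i <= a.Length; i++) {
            for (int j = 1; j <= b.Length; j++) {
                int cost = a[i - 1] == b[j - 1] ? 0 : 1;
                int editOrInsert = Math.Min(d[i - 1, j] + 1, d[i, j - 1] + 1);
                d[i, j] = Math.Min(editOrInsert, d[i - 1, j - 1] + cost);
            }
        }
        return d[a.Length, b.Length];
    }

    // Similarity in [0, 1]; 1 means the strings are identical ignoring case.
    public static double Similarity(string a, string b) {
        a = a.ToLowerInvariant();
        b = b.ToLowerInvariant();
        int maxLen = Math.Max(a.Length, b.Length);
        if (maxLen == 0) return 1.0;
        return 1.0 - (double)Levenshtein(a, b) / maxLen;
    }

    // Returns the phrase most similar to the recognized text, or null if
    // no phrases were given; the winning score is written to 'score'.
    public static string BestMatch(IEnumerable<string> phrases, string recognized, out double score) {
        string best = null;
        score = -1.0;
        foreach (string phrase in phrases) {
            double s = Similarity(phrase, recognized);
            if (s > score) {
                best = phrase;
                score = s;
            }
        }
        return best;
    }
}
```

In this sketch, a caller would compare the returned score against its own threshold, which mirrors the role the similarityThreshold argument plays in Speech.IsAboveThreshold.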
Thanks to Whisper, PySpeech allows you to capture what the player is saying in real time at all times while playing.
If you're not interested in voice commands and would instead like real-time speech transcription, you just have to register your own SpeechRecognized event handler as shown in the Voice Commands section. There's no need to register phrases or do similarity checks:
Speech.RegisterCustomHandler((obj, recognized) => {
    Plugin.logger.LogInfo($"Whisper output: {recognized.Text}");
});
PySpeech offers 3 different Whisper models players can choose from:
- Tiny: small, fast, not very accurate.
- Base: medium, not as fast, more accurate.
- Small: big, slow, very accurate.
If you'd like players to use a specific model, say so in your mod description.
The default is the Tiny model.
Players can also choose which language Whisper should recognize from among the 100 supported languages, which are listed here.
There is also a Multilingual option that allows for real-time language identification.