Conversation
Adds a Speech to Text backend that directly interfaces with the deepspeech streaming API.
Hi @JPEWdev, thanks for this. However, I'm of the opinion that the STT shouldn't be part of mycroft-core like this. I think it's better if a server or STT is set up outside of core and a stream or socket is used to send the data over. This helps with packaging (package size and build time, possible conflicts between module versions, possibly inflating the memory footprint despite not using the module) and keeps mycroft-core more of a "hub" software.

What are the main benefits of a direct implementation in core like this compared with a local deepspeech server which is streamed to? This is definitely open for discussion; as stated above, this is my opinion, and @pendrods, @davidwagnerkc and @Ruthvicp may have other thoughts on the matter. I can see a couple of different approaches as alternatives.
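For context on the server-based approach mentioned above, here is a minimal sketch of what streaming recorded audio to a separately packaged STT service could look like. The URL, port, and /stt route are assumptions patterned on the deepspeech-server project's defaults, not anything from this PR:

```python
# Hedged sketch: Mycroft records the utterance and hands the audio to a
# local STT server over HTTP instead of linking deepspeech into core.
# The endpoint below is an assumption based on deepspeech-server's
# documented defaults; adjust it to your deployment.
import requests


def transcribe_via_server(wav_data: bytes) -> str:
    # deepspeech-server accepts the raw audio in the POST body
    response = requests.post("http://localhost:8080/stt", data=wav_data)
    response.raise_for_status()
    return response.text
```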
Update: The build time only increases by 50%, to ~30 minutes, so that's not too bad, and the additional dependencies increase the package size by only 10 MB, so that's not too big of a deal. I do, however, think it's better to keep deepspeech separate from mycroft-core from an organizational point of view. Again, this is only my opinion, and it's no big issue practically since we can keep the models outside of the core package.

Edit: The package showed some warnings regarding libdeepspeech.so, so I'm going to verify the install.
Hmm, did some proper testing and it seems there's something missing when packaged for the mark-1.

Edit: Might be CUDA, which makes this a bit tricky (maybe)...
@forslund Definitely do not want to add more dependencies to core for every STT option. I actually added a package in my Google streaming STT PR to the

Long term, though, I think if we can make packaging work it would be cool to include a local STT option like DeepSpeech that would work out of the box for users. Maybe not something for this PR though.

Those errors are with SWIG, which provides Python bindings to C++ code. More likely some sort of mismatch of the package to the platform arch.
@JPEWdev You are running this on desktop, right? How is the setup working so far? Did you see the new 0.5.0 release? https://github.com/mozilla/DeepSpeech/releases/tag/v0.5.0
It works really well on my laptop (the same one I used to do the performance analysis of the server). My laptop easily handles the stream in real time, and Mycroft doesn't appear to do anything CPU-intense while recording an utterance, so it works out well. I didn't see the 0.5 release... it was only 21 hours ago ;) I'll try it first chance I get.

As far as packaging deepspeech with Mycroft... I'm not sure of the best route. I suspect that the target here would be deeply embedded systems without internet (I'm going to try on a Raspberry Pi here when I get a chance), or people who don't want to be bothered to set up a deepspeech server. In those cases, it would be nice if it could work "out of the box". I don't know the best way to facilitate that.
Also, I should note that you still have to download the deepspeech model, which is several GB, so it might be difficult to get a 100% "out of the box" experience anyway, since I doubt you want to package that with Mycroft ;)
So one possibility is to move the import of deepspeech and numpy into the

The plugin approach I played with for TTS can be seen here: forslund@716ca26. And a plugin module would look like this: https://github.com/forslund/mycroft-tts-plugin-gtts
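As a hedged illustration of that suggestion (the sentence above is cut off, so the constructor is assumed as the destination, and the class and attribute names are invented for the sketch), deferring the heavy imports means they are only paid for when this backend is actually configured:

```python
from mycroft.stt import STT  # STT base class provided by mycroft-core


class DeepSpeechStreamSTT(STT):  # illustrative name, not the PR's class
    def __init__(self):
        super().__init__()
        # Importing here rather than at module level means deepspeech and
        # numpy only need to be installed when this backend is selected.
        import deepspeech
        import numpy
        self._deepspeech = deepspeech
        self._numpy = numpy
```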
Ya, I originally had that. I'll change it back.
I did a test implementation using the plugin approach in the feature/tts-stt-plugin branch, which can be tested with a plugin based on the DeepSpeech STT by @JPEWdev on desktop devices. Edit the config like the description above, but the module is now called mycroft_stt_plugin_deepspeech:

{
    "stt": {
        "module": "mycroft_stt_plugin_deepspeech",
        "mycroft_stt_plugin_deepspeech": {
            "model": "/home/jwatt/DeepSpeech/models/output_graph.pbmm",
            "alphabet": "/home/jwatt/DeepSpeech/models/alphabet.txt",
            "lm": "/home/jwatt/DeepSpeech/models/lm.binary",
            "trie": "/home/jwatt/DeepSpeech/models/trie"
        }
    }
}
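For reference, the gtts plugin linked above registers itself through a setuptools entry point, so the packaging side of such an STT plugin might look like the sketch below. The "mycroft.plugin.stt" group name is an assumption mirroring the "mycroft.plugin.tts" group the TTS example uses:

```python
# setup.py for a hypothetical STT plugin package; the entry-point group
# name is assumed by analogy with the TTS plugin example.
from setuptools import setup

setup(
    name="mycroft-stt-plugin-deepspeech",
    version="0.1.0",
    packages=["mycroft_stt_plugin_deepspeech"],
    install_requires=["deepspeech", "numpy"],
    entry_points={
        "mycroft.plugin.stt": [
            "mycroft_stt_plugin_deepspeech = "
            "mycroft_stt_plugin_deepspeech:DeepSpeechSTT"
        ]
    },
)
```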
Hi there, we're planning to add a plugin system for audio backends, STT and TTS services. This system will allow users (or developers configuring Mycroft for use in other projects) to pull in these services as they're needed, rather than having them all included by default on every Mycroft installation. The aim is to keep mycroft-core lighter but still provide the same level of features and customization through the provision of these plugins.

So I'm going to close this PR, but only because it won't need to be included in mycroft-core. We'll be transitioning all of these out to independent plugins, tracked via #2701. Thanks
Description
Adds a Speech to Text backend that directly interfaces with the
deepspeech streaming API.
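For context, the DeepSpeech streaming flow the backend drives looks roughly like the sketch below, written against the 0.4.x/0.5.x Python bindings that were current for this PR (later releases changed the Model constructor). Paths and tuning constants are illustrative, not the PR's actual values:

```python
import deepspeech
import numpy as np

# Tuning constants matching the library defaults of that era.
N_FEATURES = 26   # MFCC features per audio frame
N_CONTEXT = 9     # context frames on each side of the current frame
BEAM_WIDTH = 500  # decoder beam width

model = deepspeech.Model("models/output_graph.pbmm", N_FEATURES, N_CONTEXT,
                         "models/alphabet.txt", BEAM_WIDTH)


def transcribe(audio_chunks):
    """audio_chunks: iterable of raw 16-bit, 16 kHz mono PCM buffers."""
    stream = model.setupStream()           # begin a streaming session
    for chunk in audio_chunks:
        # Feed audio incrementally as it arrives from the microphone.
        model.feedAudioContent(stream, np.frombuffer(chunk, np.int16))
    return model.finishStream(stream)      # close stream, get transcript
```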
How to test
Edit mycroft.conf to use the local deepspeech streaming STT (a DeepSpeech model will need to be downloaded). For example:
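(A hedged reconstruction: the "deepspeech_stream" module name and the option keys below are assumptions patterned on the plugin config shown earlier in the thread, not necessarily this PR's actual settings.)

```json
{
  "stt": {
    "module": "deepspeech_stream",
    "deepspeech_stream": {
      "model": "~/DeepSpeech/models/output_graph.pbmm",
      "alphabet": "~/DeepSpeech/models/alphabet.txt",
      "lm": "~/DeepSpeech/models/lm.binary",
      "trie": "~/DeepSpeech/models/trie"
    }
  }
}
```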
Using a memory-mapped model is highly recommended.
Contributor license agreement signed?
Yes