Conversation
Adds a Speech to Text backend that directly interfaces with the deepspeech streaming API.
Hi @JPEWdev, thanks for this. However, I'm of the opinion that the STT shouldn't be part of mycroft-core like this. I think it's better if a server or STT is set up outside of core and a stream or socket is used to send the data over. This helps with packaging (package size and build time, possible conflicts between module versions, possibly inflating the memory footprint despite not using the module) and keeps mycroft-core more of a "hub" software.

What are the main benefits of a direct implementation in core like this compared with a local deepspeech server which is streamed to? This is definitely open for discussion; as stated above, this is my opinion, and @pendrods, @davidwagnerkc and @Ruthvicp may have other thoughts on the matter. I can see a couple of different approaches as alternatives.
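For context on the server-based approach mentioned above, here is a minimal sketch of what streaming recorded audio to a separately packaged STT service could look like. The URL, port, and /stt route are assumptions patterned on the deepspeech-server project's defaults, not anything from this PR:

```python
# Hedged sketch: Mycroft records the utterance and hands the audio to a
# local STT server over HTTP instead of linking deepspeech into core.
# The endpoint below is an assumption based on deepspeech-server's
# documented defaults; adjust it to your deployment.
import requests


def transcribe_via_server(wav_data: bytes) -> str:
    # deepspeech-server accepts the raw audio in the POST body
    response = requests.post("http://localhost:8080/stt", data=wav_data)
    response.raise_for_status()
    return response.text
```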
Update: The build time only increases by 50%, to ~30 minutes, so that's not too bad, and the additional dependencies increase the package size by only 10 MB, so that's not too big of a deal. I do, however, think it's better to keep deepspeech separate from mycroft-core from an organizational point of view. Again, this is only my opinion, and it's no big issue practically since we can keep the models outside of the core package.

Edit: The package showed some warnings regarding libdeepspeech.so, so I'm going to verify the install.
Hmm, did some proper testing and it seems there's something missing when packaged for the mark-1.

Edit: Might be CUDA, which makes this a bit tricky (maybe)...
@forslund Definitely do not want to add more dependencies to core for every STT option. I actually added a package in my Google streaming STT PR to the

Long term, though, I think if we can make packaging work it would be cool to include a local STT option like DeepSpeech that would work out of the box for users. Maybe not something for this PR though.

Those errors are with SWIG, which provides Python bindings to C++ code. More likely some sort of mismatch of the package to the platform arch.
@JPEWdev You are running this on desktop, right? How is the setup working so far? Did you see the new 0.5.0 release? https://github.com/mozilla/DeepSpeech/releases/tag/v0.5.0
It works really well on my laptop (the same one I used to do the performance analysis of the server). My laptop easily handles the stream in real time, and Mycroft doesn't appear to do anything CPU-intense while recording an utterance, so it works out well. I didn't see the 0.5 release... it was only 21 hours ago ;) I'll try it first chance I get.

As far as packaging deepspeech with Mycroft... I'm not sure of the best route. I suspect that the target here would be deeply embedded systems without internet (I'm going to try on a Raspberry Pi here when I get a chance), or people who don't want to be bothered to set up a deepspeech server. In those cases, it would be nice if it could work "out of the box". I don't know the best way to facilitate that.
Also, I should note that you still have to download the deepspeech model, which is several GB, so it might be difficult to get a 100% "out of the box" experience anyway, since I doubt you want to package that with Mycroft ;)
So one possibility is to move the import of deepspeech and numpy into the

The plugin approach I played with for TTS can be seen here: forslund@716ca26. And a plugin module would look like this: https://github.com/forslund/mycroft-tts-plugin-gtts
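As a hedged illustration of that suggestion (the sentence above is cut off, so the constructor is assumed as the destination, and the class and attribute names are invented for the sketch), deferring the heavy imports means they are only paid for when this backend is actually configured:

```python
from mycroft.stt import STT  # STT base class provided by mycroft-core


class DeepSpeechStreamSTT(STT):  # illustrative name, not the PR's class
    def __init__(self):
        super().__init__()
        # Importing here rather than at module level means deepspeech and
        # numpy only need to be installed when this backend is selected.
        import deepspeech
        import numpy
        self._deepspeech = deepspeech
        self._numpy = numpy
```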
Ya, I originally had that. I'll change it back.
I did a test implementation using the plugin approach in the feature/tts-stt-plugin branch, which can be tested with a plugin based on the DeepSpeech STT by @JPEWdev on desktop devices. Edit the config like the description above, but the module is now called mycroft_stt_plugin_deepspeech:

{
    "stt": {
        "module": "mycroft_stt_plugin_deepspeech",
        "mycroft_stt_plugin_deepspeech": {
            "model": "/home/jwatt/DeepSpeech/models/output_graph.pbmm",
            "alphabet": "/home/jwatt/DeepSpeech/models/alphabet.txt",
            "lm": "/home/jwatt/DeepSpeech/models/lm.binary",
            "trie": "/home/jwatt/DeepSpeech/models/trie"
        }
    }
}
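For reference, the gtts plugin linked above registers itself through a setuptools entry point, so the packaging side of such an STT plugin might look like the sketch below. The "mycroft.plugin.stt" group name is an assumption mirroring the "mycroft.plugin.tts" group the TTS example uses:

```python
# setup.py for a hypothetical STT plugin package; the entry-point group
# name is assumed by analogy with the TTS plugin example.
from setuptools import setup

setup(
    name="mycroft-stt-plugin-deepspeech",
    version="0.1.0",
    packages=["mycroft_stt_plugin_deepspeech"],
    install_requires=["deepspeech", "numpy"],
    entry_points={
        "mycroft.plugin.stt": [
            "mycroft_stt_plugin_deepspeech = "
            "mycroft_stt_plugin_deepspeech:DeepSpeechSTT"
        ]
    },
)
```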
Hi there, we're planning to add a plugin system for audio backends, STT and TTS services. This system will allow users (or developers configuring Mycroft for use in other projects) to pull in these services as they're needed, rather than having them all included by default on every Mycroft installation. The aim is to keep mycroft-core lighter but still provide the same level of features and customization through the provision of these plugins.

So I'm going to close this PR, but only because it won't need to be included in mycroft-core. We'll be transitioning all of these out to independent plugins, tracked via #2701. Thanks
Description
Adds a Speech to Text backend that directly interfaces with the
deepspeech streaming API.
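For context, the DeepSpeech streaming flow the backend drives looks roughly like the sketch below, written against the 0.4.x/0.5.x Python bindings that were current for this PR (later releases changed the Model constructor). Paths and tuning constants are illustrative, not the PR's actual values:

```python
import deepspeech
import numpy as np

# Tuning constants matching the library defaults of that era.
N_FEATURES = 26   # MFCC features per audio frame
N_CONTEXT = 9     # context frames on each side of the current frame
BEAM_WIDTH = 500  # decoder beam width

model = deepspeech.Model("models/output_graph.pbmm", N_FEATURES, N_CONTEXT,
                         "models/alphabet.txt", BEAM_WIDTH)


def transcribe(audio_chunks):
    """audio_chunks: iterable of raw 16-bit, 16 kHz mono PCM buffers."""
    stream = model.setupStream()           # begin a streaming session
    for chunk in audio_chunks:
        # Feed audio incrementally as it arrives from the microphone.
        model.feedAudioContent(stream, np.frombuffer(chunk, np.int16))
    return model.finishStream(stream)      # close stream, get transcript
```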
How to test
Edit mycroft.conf to use the local deepspeech streaming STT (a DeepSpeech model will need to be downloaded). For example:
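(A hedged reconstruction: the "deepspeech_stream" module name and the option keys below are assumptions patterned on the plugin config shown earlier in the thread, not necessarily this PR's actual settings.)

```json
{
  "stt": {
    "module": "deepspeech_stream",
    "deepspeech_stream": {
      "model": "~/DeepSpeech/models/output_graph.pbmm",
      "alphabet": "~/DeepSpeech/models/alphabet.txt",
      "lm": "~/DeepSpeech/models/lm.binary",
      "trie": "~/DeepSpeech/models/trie"
    }
  }
}
```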
Using a memory-mapped model is highly recommended.
Contributor license agreement signed?
Yes