This repository has been archived by the owner on Sep 8, 2024. It is now read-only.

Add local DeepSpeech STT #2161

Closed
JPEWdev wants to merge 1 commit

Conversation

JPEWdev
Contributor

@JPEWdev commented Jun 12, 2019

Description

Adds a Speech to Text backend that directly interfaces with the
deepspeech streaming API.

How to test

Edit mycroft.conf to use a local deepspeech server (a DeepSpeech model will need to be downloaded). For example:

{
  "stt": {
    "module": "deepspeech",
    "deepspeech": {
      "model": "/home/jwatt/DeepSpeech/models/output_graph.pbmm",
      "alphabet": "/home/jwatt/DeepSpeech/models/alphabet.txt",
      "lm": "/home/jwatt/DeepSpeech/models/lm.binary",
      "trie": "/home/jwatt/DeepSpeech/models/trie"
    }
  }
}

Using a memory-mapped model is highly recommended.
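For reference, the streaming calls a backend like this would make look roughly like the sketch below. This is illustrative, not the PR's exact code; the `transcribe_stream` helper and frame handling are assumptions, while `setupStream`/`feedAudioContent`/`finishStream` are the streaming entry points of the DeepSpeech 0.4/0.5 Python API.

```python
import numpy as np  # deepspeech's Python API consumes int16 numpy buffers


def transcribe_stream(model, frames):
    """Feed 16 kHz, 16-bit mono PCM frames to a DeepSpeech streaming decoder.

    `model` is assumed to be a deepspeech.Model with the language model
    already enabled via enableDecoderWithLM().
    """
    sctx = model.setupStream()
    for frame in frames:
        # Each frame is a bytes object of raw little-endian 16-bit samples
        model.feedAudioContent(sctx, np.frombuffer(frame, dtype=np.int16))
    # finishStream frees the stream context and returns the final transcript
    return model.finishStream(sctx)
```

Feeding audio as it is recorded (rather than decoding one big buffer at the end) is what lets the transcription keep up in real time.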

Contributor license agreement signed?

Yes

@pep8speaks

pep8speaks commented Jun 12, 2019

Hello @JPEWdev! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2019-06-12 03:08:15 UTC

Adds a Speech to Text backend that directly interfaces with the
deepspeech streaming API.
@forslund
Collaborator

forslund commented Jun 12, 2019

Hi @JPEWdev, thanks for this. However, I'm of the opinion that the STT shouldn't be part of mycroft-core like this. I think it's better if a server or STT is set up outside of core and a stream or socket is used to send the data over. This helps with packaging (package size and build time, possible conflicts between module versions, possibly inflating the memory footprint despite not using the module) and keeps mycroft-core more of a "hub" software.

What are the main benefits of a direct implementation in core like this compared with a local deepspeech server which is streamed to?

This is definitely open for discussion, as stated above this is my opinion and @pendrods, @davidwagnerkc and @Ruthvicp may have other thoughts on the matter.

I can see a couple of different approaches as alternatives:

  • A skill that sets up a deepspeech server and changes the config to use the streaming deepspeech STT; the model and parameters could be selected using skill settings

  • A plugin system where a python module (from github or pypi) can be defined as an STT (I showed a simple version of this for TTS last year); it would functionally work like this one, but the requirements wouldn't need to be included in the mycroft-core package and could easily be installed.
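The second alternative could be declared through a setuptools entry point so that mycroft-core discovers the STT class at runtime without bundling its dependencies. A hypothetical sketch of such a plugin's setup.py; the entry-point group name, package name, and class name are assumptions modeled on the plugin approach described here, not code from this thread:

```python
# Hypothetical setup.py for an out-of-tree DeepSpeech STT plugin.
from setuptools import setup

# "<config name> = <module>:<class>" -- mycroft-core would look up the
# configured STT module name in this entry-point group.
PLUGIN_ENTRY_POINT = (
    'mycroft_stt_plugin_deepspeech = mycroft_stt_plugin_deepspeech:DeepSpeechSTT'
)

setup(
    name='mycroft-stt-plugin-deepspeech',
    version='0.1.0',
    packages=['mycroft_stt_plugin_deepspeech'],
    # The heavy dependency lives here, not in mycroft-core's requirements.txt
    install_requires=['deepspeech'],
    entry_points={'mycroft.plugin.stt': PLUGIN_ENTRY_POINT},
)
```

The key point is that `deepspeech` becomes an install-time requirement of the plugin package only, so users who never select this backend never pull it in.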

@forslund
Collaborator

forslund commented Jun 12, 2019

Update: The build time is only increased by 50% to ~30 minutes, so that's not too bad, and the additional dependencies increase the package size by only 10 MB, so that's not too big of a deal. I do however think it's better to keep deepspeech separate from mycroft-core from an organizational point of view.

Again, this is only my opinion and this is no big issue practically since we can keep the models outside of the core package.

Edit: The package showed some warnings regarding libdeepspeech.so, so I'm going to verify the install.

@forslund
Collaborator

forslund commented Jun 12, 2019

Hmm, I did some proper testing and it seems there's something missing when packaged for the mark-1:

    import deepspeech
  File "/opt/venvs/mycroft-core/lib/python3.4/site-packages/deepspeech/__init__.py", line 4, in <module>
    from deepspeech.impl import AudioToInputVector as audioToInputVector
  File "/opt/venvs/mycroft-core/lib/python3.4/site-packages/deepspeech/impl.py", line 17, in <module>
    _impl = swig_import_helper()
  File "/opt/venvs/mycroft-core/lib/python3.4/site-packages/deepspeech/impl.py", line 16, in swig_import_helper
    return importlib.import_module('_impl')
  File "/opt/venvs/mycroft-core/lib/python3.4/importlib/__init__.py", line 109, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
ImportError: No module named '_impl'

Edit: Might be CUDA which makes this a bit tricky (maybe)...

@devs-mycroft devs-mycroft added the CLA: Yes Contributor License Agreement exists (see https://github.com/MycroftAI/contributors) label Jun 12, 2019
@davidwagnerkc
Contributor

@forslund Definitely do not want to add more dependencies to core for every STT option. I actually added a package to the requirements.txt in my Google streaming STT PR, but was planning on removing it, since it is just a handful of users who might set that up and they are going to be capable enough to install a package. I was thinking of some form of lazy loading for the imports, or even try/except, but your suggestions are good options as well.

Long term, though, I think if we can make packaging work it would be cool to include a local STT option like DeepSpeech that would work out of the box for users. Maybe not something for this PR though.

Those errors are from SWIG, which provides Python bindings to C++ code. More likely it's some sort of mismatch between the package and the platform arch.
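A quick way to start diagnosing that kind of mismatch (a hypothetical check, not from this thread): a wheel built for a different arch (e.g. x86_64 vs armv7l) installs fine but fails at import time inside the SWIG wrapper, so comparing the interpreter's view of the machine against the wheel's platform tag is a useful first step.

```shell
# Print the machine architecture as the OS and the Python interpreter see it;
# compare against the platform tag in the installed wheel's filename.
uname -m
python3 -c "import platform; print(platform.machine())"
```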

@davidwagnerkc
Contributor

@JPEWdev You are running this on desktop right? How is the setup working so far?

Did you see the new 0.5 release? https://github.com/mozilla/DeepSpeech/releases/tag/v0.5.0
I haven't played with a mmap model yet as you suggest, but I look forward to playing with that and the TFLite model.

@JPEWdev
Contributor Author

JPEWdev commented Jun 12, 2019

It works really well on my laptop (the same one I used to do the performance analysis of the server). My laptop easily handles the stream in real time, and Mycroft doesn't appear to do anything CPU-intensive while recording an utterance, so it works out well.

I didn't see the 0.5 release... it was only 21 hours ago ;) I'll try it the first chance I get.

As far as packaging deepspeech with Mycroft... I'm not sure of the best route. I suspect that the target here would be deeply embedded systems without internet (I'm going to try on a Raspberry Pi when I get a chance), or people who don't want to be bothered to set up a deepspeech server. In those cases, it would be nice if it could work "out of the box". I don't know the best way to facilitate that.

@JPEWdev
Contributor Author

JPEWdev commented Jun 12, 2019

Also, I should note that you still have to download the deepspeech model, which is several GB, so it might be difficult to get a 100% "out of the box" experience anyway, since I doubt you want to package that with Mycroft ;)

@forslund
Collaborator

forslund commented Jun 12, 2019

So one possibility is to move the imports of deepspeech and numpy into the __init__ of the DeepspeechSTT class. That way they would only be imported if the backend is selected and wouldn't raise exceptions on load. We could also add a deepspeech_setup.sh script to install the requirements and modify the config.
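The same idea can be taken one step further with a small loader helper. A hypothetical sketch (the function name and error message are illustrative, not the PR's code): deferring the import means a missing optional dependency like deepspeech only raises an error for users who actually configured that backend, not on every mycroft-core start.

```python
import importlib


def load_stt_module(module_name):
    """Import an STT backend's module only when it is actually selected.

    Raises a clear RuntimeError instead of an ImportError deep inside
    startup when the backend's optional dependency is not installed.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as err:
        raise RuntimeError(
            "STT backend %r needs an extra package: %s" % (module_name, err))
```

With this pattern, `load_stt_module('deepspeech')` would only run (and only fail) for users whose mycroft.conf selects the deepspeech backend.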

The plugin thing I played with for tts can be seen here forslund@716ca26. And a plugin module would look like this: https://github.com/forslund/mycroft-tts-plugin-gtts

@JPEWdev
Contributor Author

JPEWdev commented Jun 12, 2019

Yeah, I originally had that. I'll change it back.

@forslund
Collaborator

I did a test implementation using the plugin approach in the feature/tts-stt-plugin branch, which can be tested on desktop devices with a plugin based on the DeepSpeech STT by @JPEWdev.
Install the plugin using pip into the mycroft venv:
./bin/mycroft-pip install git+https://github.com/forslund/mycroft-stt-plugin-deepspeech

Edit the config as in the description above, but the module is now called mycroft_stt_plugin_deepspeech:

{
  "stt": {
    "module": "mycroft_stt_plugin_deepspeech",
    "mycroft_stt_plugin_deepspeech": {
      "model": "/home/jwatt/DeepSpeech/models/output_graph.pbmm",
      "alphabet": "/home/jwatt/DeepSpeech/models/alphabet.txt",
      "lm": "/home/jwatt/DeepSpeech/models/lm.binary",
      "trie": "/home/jwatt/DeepSpeech/models/trie"
    }
  }
}

@krisgesling
Contributor

Hi there, we're planning to add a plugin system for audio backends, STT and TTS services.

This system will allow users (or developers configuring Mycroft for use in other projects) to pull in these services as they're needed rather than having them all included by default on every Mycroft installation. The aim is to keep mycroft-core lighter but still provide the same level of features and customization through the provision of these plugins.

So I'm going to close this PR but only because it won't need to be included in mycroft-core. We'll be transitioning all of these out to be independent plugins tracked via #2701.

Thanks

6 participants