
More and better options for text-to-speech voices #617

@klues

Recently our ResponsiveVoice integration stopped working because we're above their "fair use" limit. It was never very reliable, so getting rid of it is no problem.

However, for some collaborations we need more reliable options for text-to-speech (TTS), especially for Linux clients (e.g. hospital terminals), where no offline system voices are available.

Since there are many possible providers (online services like MS Azure, Google, Polly etc.) and maybe also new forms of in-browser offline voices (e.g. test piper.ttstool.com), I think we need some adaptations to the UI for selecting voices. It probably shouldn't be a single dropdown (with potentially hundreds of voices) anymore, but some kind of two-step selection.

UI proposal:
[Image: UI proposal mockup]
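
To make the two-step idea more concrete, here is a minimal sketch of the data side, assuming step one picks a provider and step two picks a voice from that provider (optionally narrowed by language). The `VoiceInfo` shape and the function names are hypothetical, not existing AsTeRICS Grid code.

```typescript
// Hypothetical voice descriptor; all field names are assumptions for illustration.
interface VoiceInfo {
  id: string;
  name: string;
  lang: string;       // language tag, e.g. "de-AT"
  providerId: string; // e.g. "system" or an online service
}

// Step 1: distinct provider ids, in order of first appearance.
function listProviders(voices: VoiceInfo[]): string[] {
  const seen: string[] = [];
  for (const voice of voices) {
    if (seen.indexOf(voice.providerId) === -1) {
      seen.push(voice.providerId);
    }
  }
  return seen;
}

// Step 2: voices of the chosen provider, optionally filtered by a
// language prefix such as "de", sorted by display name.
function voicesFor(voices: VoiceInfo[], providerId: string, langPrefix?: string): VoiceInfo[] {
  const prefix = (langPrefix || "").toLowerCase();
  return voices
    .filter(v => v.providerId === providerId)
    .filter(v => prefix === "" || v.lang.toLowerCase().indexOf(prefix) === 0)
    .sort((a, b) => a.name.localeCompare(b.name));
}
```

The second dropdown then only ever holds the voices of one provider, which keeps it short even if an online service offers hundreds of voices overall.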

These would be possibilities for voice providers:

  • System voices: the default, like now. Voices coming from the WebSpeech API (voices installed on the system)
  • Online services like MS Azure, Amazon Polly etc.
  • Voices coming from the external speech bridge I once implemented (which will more or less be replaced by in-Asterics implementations then, but may still be valuable for special cases)
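
One way these provider types could sit behind a common interface is sketched below. The `VoiceProvider` interface, the registry functions and the class name are assumptions for illustration only; the calls on `speechSynthesis` (`getVoices()`, `speak()`) and `SpeechSynthesisUtterance` are the standard Web Speech API, accessed via `globalThis` so the sketch also loads where that API is missing.

```typescript
// Hypothetical common shape for all voice providers (names are assumptions).
interface ProviderVoice {
  id: string;
  name: string;
  lang: string;
}

interface VoiceProvider {
  id: string;
  label: string;
  listVoices(): Promise<ProviderVoice[]>;
  speak(text: string, voiceId: string): Promise<void>;
}

// Default provider: wraps the voices installed on the system.
class SystemVoiceProvider implements VoiceProvider {
  id = "system";
  label = "System voices";

  async listVoices(): Promise<ProviderVoice[]> {
    const synth = (globalThis as any).speechSynthesis;
    if (!synth) return []; // e.g. no Web Speech API in this environment
    // Note: some browsers return an empty list until "voiceschanged" has fired.
    return synth.getVoices().map((v: any) => ({ id: v.voiceURI, name: v.name, lang: v.lang }));
  }

  async speak(text: string, voiceId: string): Promise<void> {
    const synth = (globalThis as any).speechSynthesis;
    const Utterance = (globalThis as any).SpeechSynthesisUtterance;
    if (!synth || !Utterance) throw new Error("Web Speech API not available");
    const utterance = new Utterance(text);
    // Fall back to the browser default voice if the id is unknown.
    utterance.voice = synth.getVoices().find((v: any) => v.voiceURI === voiceId) || null;
    synth.speak(utterance);
  }
}

// Minimal registry: online services or the speech bridge would register here too.
const providerRegistry: { [id: string]: VoiceProvider } = {};

function registerProvider(provider: VoiceProvider): void {
  providerRegistry[provider.id] = provider;
}

function getProvider(id: string): VoiceProvider | undefined {
  return providerRegistry[id];
}

registerProvider(new SystemVoiceProvider());
```

An online service or the speech bridge would then be one more class implementing `VoiceProvider`, and the UI would only talk to the registry.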

For services which need configuration (like API keys), there will be a Configure button next to the select, which opens a modal.
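
A rough sketch of how the Configure button could be driven by data, assuming each provider declares which settings it needs. The descriptor shape, the field keys and both helper functions are hypothetical; which settings a real service requires would have to be checked against its documentation.

```typescript
// Hypothetical description of one setting shown in the Configure modal.
interface ConfigField {
  key: string;
  label: string;
  required?: boolean;
  secret?: boolean; // e.g. render as a password input
}

// Hypothetical provider descriptor; system voices would declare no fields.
interface ProviderDescriptor {
  id: string;
  label: string;
  configFields: ConfigField[];
}

type ProviderConfig = { [key: string]: string };

// The Configure button is only needed if there is something to configure.
function showConfigureButton(provider: ProviderDescriptor): boolean {
  return provider.configFields.length > 0;
}

// Keys of required fields that are absent or blank in the stored config.
function missingFields(provider: ProviderDescriptor, config: ProviderConfig): string[] {
  return provider.configFields
    .filter(field => field.required === true)
    .filter(field => (config[field.key] || "").trim() === "")
    .map(field => field.key);
}
```

With this, the modal can render one input per `ConfigField`, and the provider's voices can stay disabled in the second select while `missingFields` returns anything.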

@arasaac-dga @ms-mialingvo @willwade @ChrisVeigl - you're welcome to give your feedback on this proposal.
@willwade I'm wondering if I should use your js-tts-wrapper for the implementation in the background. Probably yes, but what do you think about its production-readiness?

related: #181
