
Add speech recognizer and synthesis on browser interface #113

Merged
merged 51 commits into from
May 30, 2024

Conversation

sowu880
Contributor

@sowu880 sowu880 commented Apr 13, 2023

Purpose

Enable speech input and output for browser interface.

Does this introduce a breaking change?

[ ] Yes
[x] No

Pull Request Type

What kind of change does this Pull Request introduce?

[ ] Bugfix
[x] Feature
[ ] Code style update (formatting, local variables)
[ ] Refactoring (no functional changes, no api changes)
[ ] Documentation content changes
[ ] Other... Please describe:

How to Test

  • Get the code

```shell
git clone [repo-address]
cd [repo-name]
git checkout [branch-name]
npm install
```

@unmuntean

unmuntean commented Apr 19, 2023

Tried to implement this; I get a blank screen with the error "Uncaught TypeError: HF is not a constructor" at QuestionInput.tsx:16. The problematic lines are:

```typescript
const SpeechRecognition = (window as any).speechRecognition || (window as any).webkitSpeechRecognition;
const recognition = new SpeechRecognition();
```

EDIT: Mozilla and other browsers usually don't support webkit speech; I had to override the default browser settings.

@sowu880
Contributor Author

sowu880 commented May 4, 2023

> Tried to implement this; I get a blank screen with the error "Uncaught TypeError: HF is not a constructor" at QuestionInput.tsx:16. [...] Mozilla and other browsers usually don't support webkit speech; I had to override the default browser settings.

Fixed the bug: added a try/catch around the speech recognition constructor. The Web Speech API is supported in the browsers shown below; recognition cannot be used on Mozilla and other browsers, but it will no longer throw an exception.

(screenshot: Web Speech API browser support table)
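A guarded constructor along these lines (a hypothetical sketch, not the PR's exact code) avoids the crash on browsers without the API:

```typescript
// Sketch: guard the Web Speech API constructor so browsers without support
// (e.g. Firefox) degrade gracefully instead of crashing with
// "is not a constructor". `createRecognizer` is an assumed helper name.
function createRecognizer(win: any): any {
    const SpeechRecognitionCtor = win.SpeechRecognition || win.webkitSpeechRecognition;
    if (!SpeechRecognitionCtor) {
        return null; // speech input stays disabled, no exception thrown
    }
    try {
        return new SpeechRecognitionCtor();
    } catch {
        return null;
    }
}
```

The caller can then hide or disable the microphone button when `createRecognizer(window)` returns `null`.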

@sowu880
Contributor Author

sowu880 commented May 26, 2023

Hi, could you help review the PR? Thanks a lot.

@vrajroutu

vrajroutu commented Jun 11, 2023

@sowu880

This speech integration increases the time to process a request. A simple change: display the generated text as soon as it's ready, and let speech synthesis complete in the background without causing further delay.

(screenshot: response timing, 2023-06-10)

@sowu880
Contributor Author

sowu880 commented Sep 8, 2023

> integration of the speech is increasing the time of processing the request

Updated. Text now displays without waiting for speech generation.
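One way to decouple the two (a sketch with assumed helper names, not the PR's exact code) is to render the text immediately and kick off synthesis without awaiting it:

```typescript
// Sketch: render the answer text right away and run speech synthesis in the
// background so it never delays the response. `render` and `synthesizeSpeech`
// are hypothetical callbacks standing in for the real UI and speech helpers.
function showAnswer(
    text: string,
    render: (t: string) => void,
    synthesizeSpeech: (t: string) => Promise<void>
): void {
    render(text); // user sees the answer immediately
    // Fire and forget: don't await the speech, just surface failures.
    synthesizeSpeech(text).catch(err => console.error("speech synthesis failed:", err));
}
```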

@pamelafox
Collaborator

@sowu880 It seems this PR doesn't include the creation of the speech resource. Can that be included as an optional resource in the Bicep files? Also, instead of using a key, can it use the ManagedIdentity credential? We are trying to avoid the use of API keys for security reasons.

@pamelafox
Collaborator

@zedhaque I've made your suggested changes to split input/output and add a voice option. I also named the input INPUT_BROWSER and the output OUTPUT_AZURE, since I could imagine us adding INPUT_AZURE or OUTPUT_BROWSER in the future.

@zedhaque
Contributor

@pamelafox - Thank you very much for incorporating my suggestions 👯
I will give it a test run and report back if there are any issues. Many thanks :)

@john0isaac
Contributor

@pamelafox I deployed a version that depends solely on the Web Speech API for both speech recognition and synthesis; it's free.
You can test it here:
https://dfbsfb-lh4hrrtgs4a42-appservice.azurewebsites.net/

This is the PR where I added it:
khelanmodi/build-24-langchain-vcore#47

It's based on the same changes you have here for the speech recognition part but depends on the same tool (Web Speech API) for speech synthesis instead of the Azure Speech API.

You might ask why the synthesized voice sounds bad. This is the default en-US voice, called David, which is available in most browsers. You can use better voices from the list available here: https://mdn.github.io/dom-examples/web-speech-api/speak-easy-synthesis/
But each browser has its own set of available voices; open that URL in a different browser and the list changes. That's why I settled on the default voice, since it's available in most browsers, but I think with some extra work this could be customized or even added as a drop-down in the developer settings.
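Browser-native synthesis could be sketched roughly like this (assumed function names; the voice lookup is split out so the fallback behavior is explicit):

```typescript
// Sketch: free, browser-native speech synthesis via the Web Speech API.
// Voice names differ per browser, so we pick by name and fall back to the
// browser default (e.g. "David") when the preferred voice is unavailable.
type VoiceLike = { name: string };

function pickVoice(voices: VoiceLike[], preferred?: string): VoiceLike | undefined {
    return preferred ? voices.find(v => v.name === preferred) : undefined;
}

function speakWithBrowser(text: string, preferredName?: string): void {
    const synth = (globalThis as any).speechSynthesis;
    if (!synth) return; // not in a browser, or API unavailable: no-op
    const utterance = new (globalThis as any).SpeechSynthesisUtterance(text);
    const voice = pickVoice(synth.getVoices(), preferredName);
    if (voice) utterance.voice = voice; // otherwise keep the browser default
    synth.speak(utterance);
}
```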

@sowu880
Contributor Author

sowu880 commented May 27, 2024

@john0isaac @pamelafox @szhaomsft
It seems that requesting Microsoft voices through the Web Speech API is not a standard approach, and it doesn't expose the full list of our voices.

The reason we use the Azure Speech API is its voice quality and prosody, with more than 100 locales supported. Our speech team has recently released many conversational voices; try the new voices. Many of them can beat all competitors currently on the market. That's why we highly recommend using an Azure Speech resource: we have a large team supporting and maintaining these production voices.

My suggestion is to merge this "speak out" feature first, and then continuously upgrade it to meet other requirements.

```diff
 });
 };

-const handleAsyncRequest = async (question: string, answers: [string, ChatAppResponse][], setAnswers: Function, responseBody: ReadableStream<any>) => {
+const handleAsyncRequest = async (question: string, answers: [string, ChatAppResponse][], responseBody: ReadableStream<any>) => {
```
Collaborator

Removed unused setAnswers function from signature and call below

@pamelafox
Collaborator

@john0isaac Thank you for sharing that, super helpful. I just tried it out and it even works in Edge on Mac (where the browser Speech Recognition does not work yet, sadly). I do agree with @sowu880 that the Azure voices are much more fluid, and I also selected a default for this PR that has the broadest language support possible, since developers use this repo across many languages.

So I think we should get this PR merged, and then could you send a PR to add a USE_SPEECH_OUTPUT_BROWSER option? That should be fairly compatible with the way I've modularized this PR, I think. Either the SpeechOutput component could take an additional answer prop and an enableBrowserOutput bool, or there could be separate SpeechOutputBrowser and SpeechOutputAzure components.

I've asked @mattgotteiner to take a look at this PR now, since it's a large change and large changes can use multiple eyes.
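The component split discussed above might look roughly like this (a hypothetical sketch with plain functions standing in for the real React components; SpeechOutputBrowser and SpeechOutputAzure are assumed names from this thread, not merged code):

```typescript
// Sketch: dispatch between a free browser-based synthesizer and the
// higher-quality Azure one, based on a single prop.
const SpeechOutputBrowser = ({ answer }: { answer: string }) => `browser speech: ${answer}`;
const SpeechOutputAzure = ({ answer }: { answer: string }) => `azure speech: ${answer}`;

function SpeechOutput({ answer, useBrowserOutput }: { answer: string; useBrowserOutput: boolean }) {
    // Web Speech API (free) vs. Azure Speech (better prosody, 100+ locales).
    return useBrowserOutput ? SpeechOutputBrowser({ answer }) : SpeechOutputAzure({ answer });
}
```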

@john0isaac
Contributor

@sowu880 The only advantage is that it's free, so that's the value you get from it; of course it won't be as good as a paid service. I agree that using the Azure Speech API is better; I just wanted to demonstrate other options for implementing this.

@pamelafox Sure, I will create a PR once this is merged to add it as an optional low-cost feature.

infra/main.bicep Outdated
```diff
@@ -48,6 +48,11 @@ param azureOpenAiApiVersion string = ''

 param openAiServiceName string = ''
 param openAiResourceGroupName string = ''

+param speechResourceGroupName string = ''
+param speechResourceGroupLocation string = location
```
Collaborator

Recommend adding AZURE_SPEECH_LOCATION so that existing speech services can be used

```json
"speechServiceName": {
    "value": "${AZURE_SPEECH_SERVICE}"
},
"speechResourceGroupName": {
```
Collaborator

Recommend adding speech location here as a parameter

@pamelafox
Collaborator

I've added parameters that allow for overriding location, resource group, service name, and sku. Also renamed some parameters for greater consistency. All feedback has been addressed, I believe.

infra/core/ai/cognitiveservices.bicep
@pamelafox pamelafox merged commit 7ffcb3b into Azure-Samples:main May 30, 2024
14 of 15 checks passed
ratkinsoncinz pushed a commit to cinzlab/azure-search-openai-demo that referenced this pull request Oct 6, 2024