Add speech recognizer and synthesis on browser interface #113
Conversation
Tried to implement this; I get a blank screen with the error "Uncaught TypeError: HF is not a constructor". EDIT: Mozilla and other browsers usually don't support webkit speech, so I had to override the default browser settings.
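The browser-support issue mentioned above comes from the Web Speech API being exposed under a vendor prefix in Chromium browsers and not at all in some others. A minimal sketch of feature detection, written as a pure function over a window-like object so it can be tested outside a browser (`getSpeechRecognition` and `SpeechRecognitionLike` are illustrative names, not code from this PR):

```typescript
// Minimal constructor shape we need from the (possibly prefixed) API.
type SpeechRecognitionLike = new () => { start(): void; stop(): void };

// Look up SpeechRecognition, falling back to the webkit-prefixed name used
// by Chrome/Edge; returns null where neither exists (e.g. Firefox).
function getSpeechRecognition(globals: Record<string, unknown>): SpeechRecognitionLike | null {
    const ctor = globals["SpeechRecognition"] ?? globals["webkitSpeechRecognition"];
    return typeof ctor === "function" ? (ctor as SpeechRecognitionLike) : null;
}
```

In the browser you would call it as `getSpeechRecognition(window as unknown as Record<string, unknown>)` and disable the microphone button when it returns null.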
Hi, could you help review the PR? Thanks a lot.
Updated. For now, text will display without waiting for speech generation.
@sowu880 It seems that this PR doesn't include the creation of the speech resource. Can that be included as an optional resource in the Bicep files? Also, instead of using a key, can it use the ManagedIdentity credential? We are trying to avoid the use of API keys for security reasons.
@zedhaque I've made your suggested changes to split input/output and add a voice option. I also named the input INPUT_BROWSER and the output OUTPUT_AZURE, as I could imagine us adding INPUT_AZURE or OUTPUT_BROWSER in the future.
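The input/output split described above can be sketched as a small resolver from environment flags to provider names. This is a hypothetical illustration of the naming scheme, not code from the PR; only USE_SPEECH_OUTPUT_BROWSER is named later in this thread, so the other flag names here are assumptions:

```typescript
// Speech providers, named to leave room for INPUT_AZURE / OUTPUT_BROWSER later.
type SpeechInput = "INPUT_BROWSER" | "INPUT_AZURE" | "NONE";
type SpeechOutput = "OUTPUT_AZURE" | "OUTPUT_BROWSER" | "NONE";

// Map environment flags (names illustrative) to the configured providers.
function resolveSpeechConfig(env: Record<string, string | undefined>): { input: SpeechInput; output: SpeechOutput } {
    return {
        input: env["USE_SPEECH_INPUT_BROWSER"] === "true" ? "INPUT_BROWSER" : "NONE",
        output: env["USE_SPEECH_OUTPUT_AZURE"] === "true" ? "OUTPUT_AZURE" : "NONE",
    };
}
```

Keeping input and output independent like this is what makes a later browser-synthesis option a purely additive change.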
@pamelafox - Thank you very much for incorporating my suggestions 👯
@pamelafox I deployed a version that depends solely on the Web Speech API for both recognition and synthesis of speech, and it's free. This is the PR where I added it: it's based on the same changes you have here for the speech recognition part, but uses the same tool (Web Speech API) for speech synthesis instead of the Azure Speech API. You might ask why the synthesized voice sounds worse.
@john0isaac @pamelafox @szhaomsft The reason we use the Azure Speech API is its great voice quality and prosody, with more than 100 locales. Our speech team has released many conversational voices recently; try the new voices. Many of them can beat all competitors currently on the market. That is why we highly recommend using an Azure Speech resource, and we have a big team supporting and maintaining these product voices. My suggestion is to merge this "speak out" feature first, and then we can continuously upgrade it for other requirements.
});
};

-const handleAsyncRequest = async (question: string, answers: [string, ChatAppResponse][], setAnswers: Function, responseBody: ReadableStream<any>) => {
+const handleAsyncRequest = async (question: string, answers: [string, ChatAppResponse][], responseBody: ReadableStream<any>) => {
Removed unused setAnswers parameter from the signature and the call below
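For context on the remaining `responseBody: ReadableStream<any>` parameter, here is a minimal, self-contained sketch of consuming such a stream into text. `readStreamToText` is an illustrative helper, not code from this PR; the actual handler additionally parses each chunk as NDJSON-style answer deltas:

```typescript
// Drain a ReadableStream of byte chunks into a single string.
// TextDecoder's { stream: true } option handles multi-byte characters
// that are split across chunk boundaries.
async function readStreamToText(stream: ReadableStream<Uint8Array>): Promise<string> {
    const reader = stream.getReader();
    const decoder = new TextDecoder();
    let text = "";
    for (;;) {
        const { done, value } = await reader.read();
        if (done) break;
        text += decoder.decode(value, { stream: true });
    }
    return text + decoder.decode(); // flush any buffered bytes
}
```

This shape is why `setAnswers` could be dropped from the signature: the consumer of the stream can return data and let the caller decide how to update state.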
@john0isaac Thank you for sharing that, super helpful. I just tried it out and it even works in Edge on Mac (where the browser Speech Recognition does not work yet, sadly). I do agree with @sowu880 that the Azure voices are much more fluid, and I also selected a default for this PR that has the broadest language support possible, since developers use this repo across many languages. So I think we should get this PR merged, and then could you send a PR to add a USE_SPEECH_OUTPUT_BROWSER option? That should be fairly compatible with the way I've modularized this PR, I think. Either the SpeechOutput component could take an additional I've asked @mattgotteiner to take a look at this PR now, since it's a large change and large changes can use multiple eyes.
@sowu880 the only advantage is that it's free, so that's the value you get from it, and of course it won't be as good as using a paid service. @pamelafox sure, I will create a PR once this is merged to add it as an optional low-cost feature.
infra/main.bicep
@@ -48,6 +48,11 @@ param azureOpenAiApiVersion string = ''

 param openAiServiceName string = ''
 param openAiResourceGroupName string = ''
+param speechResourceGroupName string = ''
+param speechResourceGroupLocation string = location
Recommend adding AZURE_SPEECH_LOCATION so that existing speech services can be used
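A sketch of what that suggestion could look like in main.bicep; the parameter name and default are illustrative assumptions, not the PR's final code:

```bicep
// Illustrative: lets an existing speech service's region be reused via an
// AZURE_SPEECH_LOCATION environment variable instead of always deriving it.
param speechServiceLocation string = '' // empty means fall back to the main location
```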
infra/main.parameters.json
"speechServiceName": { | ||
"value": "${AZURE_SPEECH_SERVICE}" | ||
}, | ||
"speechResourceGroupName": { |
Recommend adding speech location here as a parameter
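A hedged sketch of the suggested entry in main.parameters.json, following the pattern of the existing speechServiceName entry; the parameter name is an assumption and the environment variable name is taken from the earlier review comment:

```json
"speechServiceLocation": {
  "value": "${AZURE_SPEECH_LOCATION}"
}
```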
I've added parameters that allow for overriding location, resource group, service name, and SKU. Also renamed some parameters for greater consistency. All feedback has been addressed, I believe.
…-Samples#113) Co-authored-by: Sarah Widder <sawidder@microsoft.com>
Purpose
Enable speech input and output for browser interface.
Does this introduce a breaking change?
Pull Request Type
What kind of change does this Pull Request introduce?
How to Test