
Add speech recognizer and synthesis on browser interface #113

Merged
merged 51 commits into from
May 30, 2024

Conversation

sowu880
Contributor

@sowu880 sowu880 commented Apr 13, 2023

Purpose

Enable speech input and output for browser interface.

Does this introduce a breaking change?

[ ] Yes
[x] No

Pull Request Type

What kind of change does this Pull Request introduce?

[ ] Bugfix
[x] Feature
[ ] Code style update (formatting, local variables)
[ ] Refactoring (no functional changes, no api changes)
[ ] Documentation content changes
[ ] Other... Please describe:

How to Test

  • Get the code

```shell
git clone [repo-address]
cd [repo-name]
git checkout [branch-name]
npm install
```

@unmuntean

unmuntean commented Apr 19, 2023

Tried to implement this; I get a blank screen with the error "Uncaught TypeError: HF is not a constructor" at QuestionInput.tsx:16. The problematic lines are:

```typescript
const SpeechRecognition = (window as any).speechRecognition || (window as any).webkitSpeechRecognition;
const recognition = new SpeechRecognition();
```

EDIT: Mozilla and other browsers usually don't support webkit speech; I had to override the default browser settings.

@sowu880
Contributor Author

sowu880 commented May 4, 2023

> Tried to implement this; I get a blank screen with the error "Uncaught TypeError: HF is not a constructor" at QuestionInput.tsx:16. [...] Mozilla and other browsers usually don't support webkit speech; I had to override the default browser settings.

Fixed the bug: added a try/catch around the speech recognition constructor. The Web Speech API is supported in the browsers shown below; recognition cannot be used on Mozilla and other browsers, but it will no longer throw an exception.

(screenshot: Web Speech API browser support table)
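A guarded constructor along these lines (a hypothetical sketch, not the PR's exact code) avoids the crash on browsers without the API:

```typescript
// Sketch: guard the Web Speech API constructor so browsers without support
// (e.g. Firefox) degrade gracefully instead of crashing with
// "is not a constructor". `createRecognizer` is an assumed helper name.
function createRecognizer(win: any): any {
    const SpeechRecognitionCtor = win.SpeechRecognition || win.webkitSpeechRecognition;
    if (!SpeechRecognitionCtor) {
        return null; // speech input stays disabled, no exception thrown
    }
    try {
        return new SpeechRecognitionCtor();
    } catch {
        return null;
    }
}
```

The caller can then hide or disable the microphone button when `createRecognizer(window)` returns `null`.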

@sowu880
Contributor Author

sowu880 commented May 26, 2023

Hi, could you help review the PR? Thanks a lot.

@vrajroutu

vrajroutu commented Jun 11, 2023

@sowu880

This speech integration increases the time to process a request. A simple change: display the generated text as soon as it's ready, and let speech synthesis complete in the background without causing further delay.

(screenshot: response timing, 2023-06-10)

@sowu880
Contributor Author

sowu880 commented Sep 8, 2023

> integration of the speech is increasing the time of processing the request

Updated. Text now displays without waiting for speech generation.
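One way to decouple the two (a sketch with assumed helper names, not the PR's exact code) is to render the text immediately and kick off synthesis without awaiting it:

```typescript
// Sketch: render the answer text right away and run speech synthesis in the
// background so it never delays the response. `render` and `synthesizeSpeech`
// are hypothetical callbacks standing in for the real UI and speech helpers.
function showAnswer(
    text: string,
    render: (t: string) => void,
    synthesizeSpeech: (t: string) => Promise<void>
): void {
    render(text); // user sees the answer immediately
    // Fire and forget: don't await the speech, just surface failures.
    synthesizeSpeech(text).catch(err => console.error("speech synthesis failed:", err));
}
```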

@pamelafox
Collaborator

@sowu880 It seems this PR doesn't include the creation of the speech resource. Can that be included as an optional resource in the Bicep files? Also, instead of using a key, can it use the ManagedIdentity credential? We are trying to avoid the use of API keys for security reasons.

@pamelafox
Collaborator

@zedhaque I've made your suggested changes to split input/output and add a voice option. I also named the input INPUT_BROWSER and the output OUTPUT_AZURE, since I could imagine us adding INPUT_AZURE or OUTPUT_BROWSER in the future.

@zedhaque
Contributor

@pamelafox - Thank you very much for incorporating my suggestions 👯
I will give it a test run and report back if there are any issues. Many thanks :)

@john0isaac
Contributor

@pamelafox I deployed a version that depends solely on the Web Speech API for both speech recognition and synthesis; it's free.
You can test it here:
https://dfbsfb-lh4hrrtgs4a42-appservice.azurewebsites.net/

This is the PR where I added it:
khelanmodi/build-24-langchain-vcore#47

It's based on the same changes you have here for the speech recognition part but depends on the same tool (Web Speech API) for speech synthesis instead of the Azure Speech API.

You might ask why the synthesized voice sounds bad. This is the default en-US voice, called David, which is available in most browsers. You can use better voices from the list available here: https://mdn.github.io/dom-examples/web-speech-api/speak-easy-synthesis/
But each browser has its own set of available voices; open that URL in a different browser and the list changes. That's why I settled on the default voice, since it's available in most browsers, but I think with some extra work this could be customized or even added as a drop-down in the developer settings.
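Browser-native synthesis could be sketched roughly like this (assumed function names; the voice lookup is split out so the fallback behavior is explicit):

```typescript
// Sketch: free, browser-native speech synthesis via the Web Speech API.
// Voice names differ per browser, so we pick by name and fall back to the
// browser default (e.g. "David") when the preferred voice is unavailable.
type VoiceLike = { name: string };

function pickVoice(voices: VoiceLike[], preferred?: string): VoiceLike | undefined {
    return preferred ? voices.find(v => v.name === preferred) : undefined;
}

function speakWithBrowser(text: string, preferredName?: string): void {
    const synth = (globalThis as any).speechSynthesis;
    if (!synth) return; // not in a browser, or API unavailable: no-op
    const utterance = new (globalThis as any).SpeechSynthesisUtterance(text);
    const voice = pickVoice(synth.getVoices(), preferredName);
    if (voice) utterance.voice = voice; // otherwise keep the browser default
    synth.speak(utterance);
}
```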

@sowu880
Contributor Author

sowu880 commented May 27, 2024

@john0isaac @pamelafox @szhaomsft
It seems that requesting Microsoft voices through the Web Speech API is not a standard approach, and it doesn't expose the full list of our voices.

The reason we use the Azure Speech API is its voice quality and prosody, with more than 100 locales supported. Our speech team has recently released many conversational voices; try the new voices. Many of them can beat all competitors currently on the market. That's why we highly recommend using an Azure Speech resource: we have a large team supporting and maintaining these production voices.

My suggestion is to merge this "speak out" feature first, and then continuously upgrade it to meet other requirements.

```diff
 });
 };

-const handleAsyncRequest = async (question: string, answers: [string, ChatAppResponse][], setAnswers: Function, responseBody: ReadableStream<any>) => {
+const handleAsyncRequest = async (question: string, answers: [string, ChatAppResponse][], responseBody: ReadableStream<any>) => {
```
Collaborator

Removed unused setAnswers function from signature and call below

@pamelafox
Collaborator

@john0isaac Thank you for sharing that, super helpful. I just tried it out and it even works in Edge on Mac (where the browser Speech Recognition does not work yet, sadly). I do agree with @sowu880 that the Azure voices are much more fluid, and I also selected a default for this PR that has the broadest language support possible, since developers use this repo across many languages.

So I think we should get this PR merged, and then could you send a PR to add a USE_SPEECH_OUTPUT_BROWSER option? That should be fairly compatible with the way I've modularized this PR, I think. Either the SpeechOutput component could take an additional answer prop and an enableBrowserOutput bool, or there could be separate SpeechOutputBrowser and SpeechOutputAzure components.

I've asked @mattgotteiner to take a look at this PR now, since it's a large change and large changes can use multiple eyes.
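The component split discussed above might look roughly like this (a hypothetical sketch with plain functions standing in for the real React components; SpeechOutputBrowser and SpeechOutputAzure are assumed names from this thread, not merged code):

```typescript
// Sketch: dispatch between a free browser-based synthesizer and the
// higher-quality Azure one, based on a single prop.
const SpeechOutputBrowser = ({ answer }: { answer: string }) => `browser speech: ${answer}`;
const SpeechOutputAzure = ({ answer }: { answer: string }) => `azure speech: ${answer}`;

function SpeechOutput({ answer, useBrowserOutput }: { answer: string; useBrowserOutput: boolean }) {
    // Web Speech API (free) vs. Azure Speech (better prosody, 100+ locales).
    return useBrowserOutput ? SpeechOutputBrowser({ answer }) : SpeechOutputAzure({ answer });
}
```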

@john0isaac
Contributor

@sowu880 The only advantage is that it's free, so that's the value you get from it; of course it won't be as good as a paid service. I agree that using the Azure Speech API is better; I just wanted to demonstrate other options for implementing this.

@pamelafox Sure, I will create a PR once this is merged to add it as an optional low-cost feature.

infra/main.bicep Outdated
```diff
@@ -48,6 +48,11 @@ param azureOpenAiApiVersion string = ''

 param openAiServiceName string = ''
 param openAiResourceGroupName string = ''

+param speechResourceGroupName string = ''
+param speechResourceGroupLocation string = location
```
Collaborator

Recommend adding AZURE_SPEECH_LOCATION so that existing speech services can be used

```json
"speechServiceName": {
    "value": "${AZURE_SPEECH_SERVICE}"
},
"speechResourceGroupName": {
```
Collaborator

Recommend adding speech location here as a parameter

@pamelafox
Collaborator

I've added parameters that allow for overriding location, resource group, service name, and sku. Also renamed some parameters for greater consistency. All feedback has been addressed, I believe.

infra/core/ai/cognitiveservices.bicep
@pamelafox pamelafox merged commit 7ffcb3b into Azure-Samples:main May 30, 2024
14 of 15 checks passed
ratkinsoncinz pushed a commit to cinzlab/azure-search-openai-demo that referenced this pull request Oct 6, 2024