
Speech streaming_recognize() surface could be improved? #2709

Closed
daspecster opened this issue Nov 8, 2016 · 2 comments
Assignees
Labels
api: speech Issues related to the Speech-to-Text API. priority: p2 Moderately-important priority. Fix may not be included in next release.

Comments

@daspecster
Contributor

After merging #2680, there is still some discussion about the surface that we expose.

@daspecster daspecster added the api: speech Issues related to the Speech-to-Text API. label Nov 8, 2016
@daspecster
Contributor Author

@dhermes, to continue the conversation we were having on chat.

What should the goals be for this?
I would love to make a shortcut for accessing the transcript data more easily, but I think some of the streaming interim results make that difficult. The other data that's contextual to the responses should be easy to access as well.

Here's my proposal for a strategy.

  1. Make sync and async recognize return results with the transcripts available directly on the result, since there should be only one result (result.transcript, result.confidence). Alternatives can then live at result.alternatives[0].transcript, etc.
  2. Streaming recognize returns many different kinds of responses (see example below). I think we could wrap these responses and just yield them to the consumer. Then you could have things like result.audio_started and result.speech_started (this implies tracking some state, though). You could also do something similar to sync and async, where the top transcript is available directly on the result while the other data remains accessible, just deeper down.
  3. If we added back the result container object that I set up, we could offer the consumer a method to poll for the best overall guess at the response so far, which in the example below would mean appending the top result for each result_index.
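Point 1 could look something like this sketch. The class names here are purely illustrative, not the library's actual surface; the point is just that the top alternative's fields are mirrored onto the result itself.

```python
# Hypothetical sketch of the "shortcut" surface in point 1.
# ``Alternative`` and ``Result`` are illustrative names only.

class Alternative:
    """One recognition hypothesis: a transcript plus optional confidence."""

    def __init__(self, transcript, confidence=None):
        self.transcript = transcript
        self.confidence = confidence


class Result:
    """Wraps a ranked list of alternatives; the top one is exposed directly."""

    def __init__(self, alternatives):
        self.alternatives = alternatives

    @property
    def transcript(self):
        # Shortcut: delegate to the top-ranked alternative.
        return self.alternatives[0].transcript

    @property
    def confidence(self):
        return self.alternatives[0].confidence


result = Result([
    Alternative("to be or not to be", confidence=0.92),
    Alternative("to bee or not to bee"),
])
print(result.transcript)                  # "to be or not to be"
print(result.alternatives[1].transcript)  # "to bee or not to bee"
```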

Example at this point:

results { alternatives { transcript: " that is" } stability: 0.9 } results { alternatives { transcript: " the question" } stability: 0.01 } result_index: 1

you could call results.get_best_text() and get 'to be or not to be that is'.
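A minimal sketch of point 3's container, assuming each streaming response simply replaces the previous best guess for its result_index (all names here are hypothetical, not the library's API):

```python
# Hypothetical result container: keep the top transcript per
# result_index and join them in index order for the best-so-far text.

class ResultContainer:
    def __init__(self):
        self._best_by_index = {}  # result_index -> top transcript

    def add_result(self, result_index, transcript):
        # A newer response for the same result_index supersedes the
        # earlier interim guess.
        self._best_by_index[result_index] = transcript

    def get_best_text(self):
        return "".join(
            self._best_by_index[i] for i in sorted(self._best_by_index)
        )


container = ResultContainer()
container.add_result(0, "to be or not to be")  # is_final result
container.add_result(1, " that's")             # early interim guess
container.add_result(1, " that is")            # replaced by stability 0.9
print(container.get_best_text())  # "to be or not to be that is"
```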

Example of accessing data for the same response:

results { alternatives { transcript: " that is" } stability: 0.9 } results { alternatives { transcript: " the question" } stability: 0.01 } result_index: 1
>>> print(result.transcript)
 that is
>>> print(result.alternatives[0].transcript)
 the question

The docs' example set of streaming_recognize responses:

endpointer_type: START_OF_SPEECH
results { alternatives { transcript: "tube" } stability: 0.01 } result_index: 0
results { alternatives { transcript: "to be a" } stability: 0.01 } result_index: 0
results { alternatives { transcript: "to be" } stability: 0.9 } results { alternatives { transcript: " or not to be" } stability: 0.01 } result_index: 0
results { alternatives { transcript: "to be or not to be" confidence: 0.92 } alternatives { transcript: "to bee or not to bee" } is_final: true } result_index: 0
results { alternatives { transcript: " that's" } stability: 0.01 } result_index: 1
results { alternatives { transcript: " that is" } stability: 0.9 } results { alternatives { transcript: " the question" } stability: 0.01 } result_index: 1
endpointer_type: END_OF_SPEECH
results { alternatives { transcript: " that is the question" confidence: 0.98 } alternatives { transcript: " that was the question" } is_final: true } result_index: 1
endpointer_type: END_OF_AUDIO
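One way a consumer might fold the response stream above into a final transcript: keep only results marked is_final, ordered by result_index. The plain dicts below stand in for the real protobuf responses; this is a sketch of the consumption pattern, not the library's code.

```python
# Stand-ins for the protobuf responses from the docs example above,
# trimmed to the fields this sketch needs.
responses = [
    {"result_index": 0,
     "results": [{"alternatives": [{"transcript": "tube"}],
                  "is_final": False}]},
    {"result_index": 0,
     "results": [{"alternatives": [{"transcript": "to be or not to be",
                                    "confidence": 0.92}],
                  "is_final": True}]},
    {"result_index": 1,
     "results": [{"alternatives": [{"transcript": " that is the question",
                                    "confidence": 0.98}],
                  "is_final": True}]},
]

# Collect the top alternative of each finalized result, keyed by index.
finals = {}
for response in responses:
    for result in response["results"]:
        if result["is_final"]:
            finals[response["result_index"]] = (
                result["alternatives"][0]["transcript"]
            )

transcript = "".join(finals[i] for i in sorted(finals))
print(transcript)  # "to be or not to be that is the question"
```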

@lukesneeringer lukesneeringer added the priority: p2 Moderately-important priority. Fix may not be included in next release. label Apr 19, 2017
@lukesneeringer
Contributor

Since this ticket was filed, we have released a new Speech library which is effectively a complete rewrite. Since it is very likely to solve this issue, I am closing it. However, please feel free to reopen if my closure is premature.

Note: Make sure you are using google.cloud.speech.SpeechClient, as the old Client still exists as a deprecated artifact for a little longer.


3 participants