Commit fb10016

Merge pull request #2495 from daspecster/cleanup-speech
Updates from #2344 for speech API.
2 parents 8717bf2 + 04bf28d commit fb10016

8 files changed

+361
-175
lines changed

docs/index.rst

Lines changed: 1 addition & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -176,6 +176,7 @@
176176
speech-encoding
177177
speech-metadata
178178
speech-operation
179+
speech-sample
179180
speech-transcript
180181

181182
.. toctree::

docs/speech-sample.rst

Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
1+
Speech Sample
2+
=============
3+
4+
.. automodule:: google.cloud.speech.sample
5+
:members:
6+
:undoc-members:
7+
:show-inheritance:

docs/speech-usage.rst

Lines changed: 83 additions & 25 deletions
@@ -7,8 +7,8 @@ base.
77

88
.. warning::
99

10-
This is a Beta release of Google Speech API. This
11-
API is not intended for real-time usage in critical applications.
10+
This is a Beta release of Google Speech API. This
11+
API is not intended for real-time usage in critical applications.
1212

1313
.. _Google Speech: https://cloud.google.com/speech/docs/getting-started
1414

@@ -25,10 +25,10 @@ For an overview of authentication in ``google-cloud-python``, see
2525
Assuming your environment is set up as described in that document,
2626
create an instance of :class:`~google.cloud.speech.client.Client`.
2727

28-
.. code-block:: python
28+
.. code-block:: python
2929
30-
>>> from google.cloud import speech
31-
>>> client = speech.Client()
30+
>>> from google.cloud import speech
31+
>>> client = speech.Client()
3232
3333
3434
Asychronous Recognition
@@ -42,23 +42,27 @@ audio data of any duration up to 80 minutes.
4242
See: `Speech Asynchronous Recognize`_
4343

4444

45-
.. code-block:: python
46-
47-
>>> import time
48-
>>> operation = client.async_recognize(
49-
... None, 'gs://my-bucket/recording.flac',
50-
... 'FLAC', 16000, max_alternatives=2)
51-
>>> retry_count = 100
52-
>>> while retry_count > 0 and not operation.complete:
53-
... retry_count -= 1
54-
... time.sleep(10)
55-
... operation.poll() # API call
56-
>>> operation.complete
57-
True
58-
>>> operation.results[0].transcript
59-
'how old is the Brooklyn Bridge'
60-
>>> operation.results[0].confidence
61-
0.98267895
45+
.. code-block:: python
46+
47+
>>> import time
48+
>>> from google.cloud import speech
49+
>>> from google.cloud.speech.encoding import Encoding
50+
>>> client = speech.Client()
51+
>>> sample = client.sample(source_uri='gs://my-bucket/recording.flac',
52+
... encoding=Encoding.FLAC,
53+
... sample_rate=44100)
54+
>>> operation = client.async_recognize(sample, max_alternatives=2)
55+
>>> retry_count = 100
56+
>>> while retry_count > 0 and not operation.complete:
57+
... retry_count -= 1
58+
... time.sleep(10)
59+
... operation.poll() # API call
60+
>>> operation.complete
61+
True
62+
>>> operation.results[0].transcript
63+
'how old is the Brooklyn Bridge'
64+
>>> operation.results[0].confidence
65+
0.98267895
6266
6367
6468
Synchronous Recognition
@@ -67,11 +71,21 @@ Synchronous Recognition
6771
The :meth:`~google.cloud.speech.Client.sync_recognize` method converts speech
6872
data to text and returns alternative text transcriptons.
6973

70-
.. code-block:: python
74+
This example uses ``language_code='en-GB'`` to better recognize a dialect from
75+
Great Britian.
76+
77+
.. code-block:: python
7178
79+
>>> from google.cloud import speech
80+
>>> from google.cloud.speech.encoding import Encoding
81+
>>> client = speech.Client()
82+
>>> sample = client.sample(source_uri='gs://my-bucket/recording.flac',
83+
... encoding=Encoding.FLAC,
84+
... sample_rate=44100)
85+
>>> operation = client.async_recognize(sample, max_alternatives=2)
7286
>>> alternatives = client.sync_recognize(
73-
... None, 'gs://my-bucket/recording.flac',
74-
... 'FLAC', 16000, max_alternatives=2)
87+
... 'FLAC', 16000, source_uri='gs://my-bucket/recording.flac',
88+
... language_code='en-GB', max_alternatives=2)
7589
>>> for alternative in alternatives:
7690
... print('=' * 20)
7791
... print('transcript: ' + alternative['transcript'])
@@ -83,5 +97,49 @@ data to text and returns alternative text transcriptons.
8397
transcript: Hello, this is one test
8498
confidence: 0
8599
100+
Example of using the profanity filter.
101+
102+
.. code-block:: python
103+
104+
>>> from google.cloud import speech
105+
>>> from google.cloud.speech.encoding import Encoding
106+
>>> client = speech.Client()
107+
>>> sample = client.sample(source_uri='gs://my-bucket/recording.flac',
108+
... encoding=Encoding.FLAC,
109+
... sample_rate=44100)
110+
>>> alternatives = client.sync_recognize(sample, max_alternatives=1,
111+
... profanity_filter=True)
112+
>>> for alternative in alternatives:
113+
... print('=' * 20)
114+
... print('transcript: ' + alternative['transcript'])
115+
... print('confidence: ' + alternative['confidence'])
116+
====================
117+
transcript: Hello, this is a f****** test
118+
confidence: 0.81
119+
120+
Using speech context hints to get better results. This can be used to improve
121+
the accuracy for specific words and phrases. This can also be used to add new
122+
words to the vocabulary of the recognizer.
123+
124+
.. code-block:: python
125+
126+
>>> from google.cloud import speech
127+
>>> from google.cloud.speech.encoding import Encoding
128+
>>> client = speech.Client()
129+
>>> sample = client.sample(source_uri='gs://my-bucket/recording.flac',
130+
... encoding=Encoding.FLAC,
131+
... sample_rate=44100)
132+
>>> hints = ['hi', 'good afternoon']
133+
>>> alternatives = client.sync_recognize(sample, max_alternatives=2,
134+
... speech_context=hints)
135+
>>> for alternative in alternatives:
136+
... print('=' * 20)
137+
... print('transcript: ' + alternative['transcript'])
138+
... print('confidence: ' + alternative['confidence'])
139+
====================
140+
transcript: Hello, this is a test
141+
confidence: 0.81
142+
143+
86144
.. _sync_recognize: https://cloud.google.com/speech/reference/rest/v1beta1/speech/syncrecognize
87145
.. _Speech Asynchronous Recognize: https://cloud.google.com/speech/reference/rest/v1beta1/speech/asyncrecognize
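The asynchronous example in this file polls the operation in a bounded retry loop. As a rough sketch only, that loop could be factored into a small helper. `wait_for_operation` and `FakeOperation` are hypothetical names introduced here for illustration and are not part of this commit or of the library; the only things assumed about the real `Operation` are the `complete` attribute and `poll()` method that the documented example already uses.

```python
import time


def wait_for_operation(operation, max_polls=100, delay=10.0, sleep=time.sleep):
    """Poll ``operation`` until it completes or ``max_polls`` is used up.

    Hypothetical helper: mirrors the retry loop from the usage docs.
    Returns True if the operation completed, False otherwise.
    """
    polls_left = max_polls
    while polls_left > 0 and not operation.complete:
        polls_left -= 1
        sleep(delay)
        operation.poll()  # one API call per iteration with the real client
    return operation.complete


class FakeOperation(object):
    """Hypothetical stand-in that completes after a fixed number of polls."""

    def __init__(self, polls_needed):
        self.complete = False
        self._polls_needed = polls_needed

    def poll(self):
        self._polls_needed -= 1
        if self._polls_needed <= 0:
            self.complete = True


if __name__ == '__main__':
    # No real sleeping: inject a no-op sleep function for the demonstration.
    done = wait_for_operation(FakeOperation(3), max_polls=5, delay=0,
                              sleep=lambda seconds: None)
    print(done)
```

Injecting `sleep` keeps the helper testable without waiting; with the real client the default `time.sleep` and a 10 second delay would match the documented loop.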

speech/google/cloud/speech/__init__.py

Lines changed: 0 additions & 1 deletion
@@ -15,5 +15,4 @@
1515
"""Google Cloud Speech API wrapper."""
1616

1717
from google.cloud.speech.client import Client
18-
from google.cloud.speech.client import Encoding
1918
from google.cloud.speech.connection import Connection

speech/google/cloud/speech/client.py

Lines changed: 44 additions & 93 deletions
@@ -19,8 +19,8 @@
1919
from google.cloud._helpers import _to_bytes
2020
from google.cloud import client as client_module
2121
from google.cloud.speech.connection import Connection
22-
from google.cloud.speech.encoding import Encoding
2322
from google.cloud.speech.operation import Operation
23+
from google.cloud.speech.sample import Sample
2424

2525

2626
class Client(client_module.Client):
@@ -46,39 +46,18 @@ class Client(client_module.Client):
4646

4747
_connection_class = Connection
4848

49-
def async_recognize(self, content, source_uri, encoding, sample_rate,
50-
language_code=None, max_alternatives=None,
51-
profanity_filter=None, speech_context=None):
49+
def async_recognize(self, sample, language_code=None,
50+
max_alternatives=None, profanity_filter=None,
51+
speech_context=None):
5252
"""Asychronous Recognize request to Google Speech API.
5353
5454
.. _async_recognize: https://cloud.google.com/speech/reference/\
5555
rest/v1beta1/speech/asyncrecognize
5656
5757
See `async_recognize`_.
5858
59-
:type content: bytes
60-
:param content: Byte stream of audio.
61-
62-
:type source_uri: str
63-
:param source_uri: URI that points to a file that contains audio
64-
data bytes as specified in RecognitionConfig.
65-
Currently, only Google Cloud Storage URIs are
66-
supported, which must be specified in the following
67-
format: ``gs://bucket_name/object_name``.
68-
69-
:type encoding: str
70-
:param encoding: encoding of audio data sent in all RecognitionAudio
71-
messages, can be one of: :attr:`~.Encoding.LINEAR16`,
72-
:attr:`~.Encoding.FLAC`, :attr:`~.Encoding.MULAW`,
73-
:attr:`~.Encoding.AMR`, :attr:`~.Encoding.AMR_WB`
74-
75-
:type sample_rate: int
76-
:param sample_rate: Sample rate in Hertz of the audio data sent in all
77-
requests. Valid values are: 8000-48000. For best
78-
results, set the sampling rate of the audio source
79-
to 16000 Hz. If that's not possible, use the
80-
native sample rate of the audio source (instead of
81-
re-sampling).
59+
:type sample: :class:`~google.cloud.speech.sample.Sample`
60+
:param sample: Instance of ``Sample`` containing audio information.
8261
8362
:type language_code: str
8463
:param language_code: (Optional) The language of the supplied audio as
@@ -111,32 +90,25 @@ def async_recognize(self, content, source_uri, encoding, sample_rate,
11190
:returns: ``Operation`` for asynchronous request to Google Speech API.
11291
"""
11392

114-
data = _build_request_data(content, source_uri, encoding,
115-
sample_rate, language_code,
116-
max_alternatives, profanity_filter,
117-
speech_context)
93+
data = _build_request_data(sample, language_code, max_alternatives,
94+
profanity_filter, speech_context)
11895

11996
api_response = self.connection.api_request(
12097
method='POST', path='speech:asyncrecognize', data=data)
12198

12299
return Operation.from_api_repr(self, api_response)
123100

124-
def sync_recognize(self, content, source_uri, encoding, sample_rate,
125-
language_code=None, max_alternatives=None,
126-
profanity_filter=None, speech_context=None):
127-
"""Synchronous Speech Recognition.
128-
129-
.. _sync_recognize: https://cloud.google.com/speech/reference/\
130-
rest/v1beta1/speech/syncrecognize
131-
132-
See `sync_recognize`_.
101+
@staticmethod
102+
def sample(content=None, source_uri=None, encoding=None,
103+
sample_rate=None):
104+
"""Factory: construct Sample to use when making recognize requests.
133105
134106
:type content: bytes
135-
:param content: Byte stream of audio.
107+
:param content: (Optional) Byte stream of audio.
136108
137109
:type source_uri: str
138-
:param source_uri: URI that points to a file that contains audio
139-
data bytes as specified in RecognitionConfig.
110+
:param source_uri: (Optional) URI that points to a file that contains
111+
audio data bytes as specified in RecognitionConfig.
140112
Currently, only Google Cloud Storage URIs are
141113
supported, which must be specified in the following
142114
format: ``gs://bucket_name/object_name``.
@@ -155,6 +127,25 @@ def sync_recognize(self, content, source_uri, encoding, sample_rate,
155127
native sample rate of the audio source (instead of
156128
re-sampling).
157129
130+
:rtype: :class:`~google.cloud.speech.sample.Sample`
131+
:returns: Instance of ``Sample``.
132+
"""
133+
return Sample(content=content, source_uri=source_uri,
134+
encoding=encoding, sample_rate=sample_rate)
135+
136+
def sync_recognize(self, sample, language_code=None,
137+
max_alternatives=None, profanity_filter=None,
138+
speech_context=None):
139+
"""Synchronous Speech Recognition.
140+
141+
.. _sync_recognize: https://cloud.google.com/speech/reference/\
142+
rest/v1beta1/speech/syncrecognize
143+
144+
See `sync_recognize`_.
145+
146+
:type sample: :class:`~google.cloud.speech.sample.Sample`
147+
:param sample: Instance of ``Sample`` containing audio information.
148+
158149
:type language_code: str
159150
:param language_code: (Optional) The language of the supplied audio as
160151
BCP-47 language tag. Example: ``'en-GB'``.
@@ -192,10 +183,8 @@ def sync_recognize(self, content, source_uri, encoding, sample_rate,
192183
between 0 and 1.
193184
"""
194185

195-
data = _build_request_data(content, source_uri, encoding,
196-
sample_rate, language_code,
197-
max_alternatives, profanity_filter,
198-
speech_context)
186+
data = _build_request_data(sample, language_code, max_alternatives,
187+
profanity_filter, speech_context)
199188

200189
api_response = self.connection.api_request(
201190
method='POST', path='speech:syncrecognize', data=data)
@@ -206,34 +195,12 @@ def sync_recognize(self, content, source_uri, encoding, sample_rate,
206195
raise ValueError('result in api should have length 1')
207196

208197

209-
def _build_request_data(content, source_uri, encoding, sample_rate,
210-
language_code=None, max_alternatives=None,
198+
def _build_request_data(sample, language_code=None, max_alternatives=None,
211199
profanity_filter=None, speech_context=None):
212200
"""Builds the request data before making API request.
213201
214-
:type content: bytes
215-
:param content: Byte stream of audio.
216-
217-
:type source_uri: str
218-
:param source_uri: URI that points to a file that contains audio
219-
data bytes as specified in RecognitionConfig.
220-
Currently, only Google Cloud Storage URIs are
221-
supported, which must be specified in the following
222-
format: ``gs://bucket_name/object_name``.
223-
224-
:type encoding: str
225-
:param encoding: encoding of audio data sent in all RecognitionAudio
226-
messages, can be one of: :attr:`~.Encoding.LINEAR16`,
227-
:attr:`~.Encoding.FLAC`, :attr:`~.Encoding.MULAW`,
228-
:attr:`~.Encoding.AMR`, :attr:`~.Encoding.AMR_WB`
229-
230-
:type sample_rate: int
231-
:param sample_rate: Sample rate in Hertz of the audio data sent in all
232-
requests. Valid values are: 8000-48000. For best
233-
results, set the sampling rate of the audio source
234-
to 16000 Hz. If that's not possible, use the
235-
native sample rate of the audio source (instead of
236-
re-sampling).
202+
:type sample: :class:`~google.cloud.speech.sample.Sample`
203+
:param sample: Instance of ``Sample`` containing audio information.
237204
238205
:type language_code: str
239206
:param language_code: (Optional) The language of the supplied audio as
@@ -265,29 +232,13 @@ def _build_request_data(content, source_uri, encoding, sample_rate,
265232
:rtype: dict
266233
:returns: Dictionary with required data for Google Speech API.
267234
"""
268-
if content is None and source_uri is None:
269-
raise ValueError('content and source_uri cannot be both '
270-
'equal to None')
271-
272-
if content is not None and source_uri is not None:
273-
raise ValueError('content and source_uri cannot be both '
274-
'different from None')
275-
276-
if encoding is None:
277-
raise ValueError('encoding cannot be None')
278-
279-
encoding_value = getattr(Encoding, encoding)
280-
281-
if sample_rate is None:
282-
raise ValueError('sample_rate cannot be None')
283-
284-
if content is not None:
285-
audio = {'content': b64encode(_to_bytes(content))}
235+
if sample.content is not None:
236+
audio = {'content': b64encode(_to_bytes(sample.content))}
286237
else:
287-
audio = {'uri': source_uri}
238+
audio = {'uri': sample.source_uri}
288239

289-
config = {'encoding': encoding_value,
290-
'sampleRate': sample_rate}
240+
config = {'encoding': sample.encoding,
241+
'sampleRate': sample.sample_rate}
291242

292243
if language_code is not None:
293244
config['languageCode'] = language_code
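The hunk above removes the argument checks from `_build_request_data` and reads the audio details from a `Sample` instead. The `Sample` class itself is not shown in this part of the diff, so the following is only a self-contained sketch of how the pieces might fit together. The `Sample` stand-in, the placement of validation in its constructor, the `maxAlternatives`, `profanityFilter` and `speechContext` keys, and the `{'audio': ..., 'config': ...}` envelope are all assumptions made for illustration; only `encoding`, `sampleRate`, `languageCode`, `content` and `uri` appear in the diff.

```python
from base64 import b64encode


class Sample(object):
    """Hypothetical stand-in for google.cloud.speech.sample.Sample."""

    def __init__(self, content=None, source_uri=None, encoding=None,
                 sample_rate=None):
        # Assumption: the checks dropped from _build_request_data live here.
        if (content is None) == (source_uri is None):
            raise ValueError('Supply exactly one of content or source_uri')
        if encoding is None:
            raise ValueError('encoding cannot be None')
        if sample_rate is None:
            raise ValueError('sample_rate cannot be None')
        self.content = content
        self.source_uri = source_uri
        self.encoding = encoding
        self.sample_rate = sample_rate


def build_request_data(sample, language_code=None, max_alternatives=None,
                       profanity_filter=None, speech_context=None):
    """Sketch of the request builder; not the library's implementation."""
    if sample.content is not None:
        # Decoded to text here so the dict is JSON-serializable; the diff
        # passes the raw b64encode() result instead.
        audio = {'content': b64encode(sample.content).decode('ascii')}
    else:
        audio = {'uri': sample.source_uri}

    config = {'encoding': sample.encoding,
              'sampleRate': sample.sample_rate}
    if language_code is not None:
        config['languageCode'] = language_code
    # The keys below are assumed, not shown in the diff.
    if max_alternatives is not None:
        config['maxAlternatives'] = max_alternatives
    if profanity_filter is not None:
        config['profanityFilter'] = profanity_filter
    if speech_context is not None:
        config['speechContext'] = {'phrases': speech_context}

    return {'audio': audio, 'config': config}


if __name__ == '__main__':
    sample = Sample(source_uri='gs://my-bucket/recording.flac',
                    encoding='FLAC', sample_rate=44100)
    print(build_request_data(sample, language_code='en-GB',
                             max_alternatives=2))
```

Keeping the audio description in one object means both `async_recognize` and `sync_recognize` can share a single, shorter signature, which is the effect the diff has on `client.py`.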
