Docs to Transcribe Streaming Audio from Microphone and Performing Speech Recognition for Speech v2 API #11389

Open
rabiaedayilmaz opened this issue Apr 2, 2024 · 2 comments
Assignees
Labels
priority: p3 Desirable enhancement or fix. May not be included in next release. samples Issues that are directly related to samples. triage me I really want to be triaged. type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design.

Comments

@rabiaedayilmaz

I searched all over the internet, but all I could find were other people with the same problem. Speech v2 was released recently, and there are code samples for various tasks. The most relevant sample is streaming speech recognition on a local file.

Whenever I try to implement this for a microphone, as we did in speech_v1p1beta1, an error occurs. The latest error I am stuck on is:
Google Speech Error: 400 Audio chunk can be of a a maximum of 25600 bytes. Received audio of 253952 bytes instead.

I assume it occurs because I cannot define a chunk size for the incoming microphone audio and split it into chunks of that size.
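To make the size limit concrete, here is a minimal sketch of slicing a buffer into pieces no larger than the 25600 bytes quoted in the error message. The helper names `split_audio` and `chunk_seconds` are hypothetical, and the 16 kHz, 16-bit mono format used for the duration estimate is an assumption, since the thread does not state the audio format.

```python
from typing import Iterator

MAX_CHUNK_BYTES = 25600  # limit quoted in the error message above


def split_audio(data: bytes, max_bytes: int = MAX_CHUNK_BYTES) -> Iterator[bytes]:
    """Yield consecutive slices of `data`, each at most `max_bytes` long."""
    if max_bytes <= 0:
        raise ValueError("max_bytes must be positive")
    for start in range(0, len(data), max_bytes):
        yield data[start:start + max_bytes]


def chunk_seconds(max_bytes: int, sample_rate_hz: int,
                  bytes_per_sample: int = 2, channels: int = 1) -> float:
    """Duration of raw PCM audio that fits into `max_bytes` (format is assumed)."""
    return max_bytes / (sample_rate_hz * bytes_per_sample * channels)


# The rejected buffer from the error message was 253952 bytes long.
pieces = list(split_audio(bytes(253952)))
print(len(pieces), len(pieces[-1]))           # 10 23552
print(chunk_seconds(MAX_CHUNK_BYTES, 16000))  # 0.8
```

Under that assumed format, one full-size chunk holds 0.8 seconds of audio, so a microphone callback that buffers longer than that between reads would exceed the limit unless its output is sliced.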

The docs need sample code for streaming audio from a microphone and performing speech recognition with the Speech v2 API.

@rabiaedayilmaz rabiaedayilmaz added priority: p3 Desirable enhancement or fix. May not be included in next release. triage me I really want to be triaged. type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design. labels Apr 2, 2024
@product-auto-label product-auto-label bot added the samples Issues that are directly related to samples. label Apr 2, 2024
@SchulerSimon

I have the same issue with speech-to-text-v2. I'll try to provide a bit more context:

I have multiple IoT devices in different locations. Some work, some don't. I have no idea why, or what the difference is. Software and hardware are the same on all devices.

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/google/api_core/grpc_helpers.py", line 173, in error_remapped_callable
    return _StreamingResponseIterator(
  File "/usr/local/lib/python3.10/dist-packages/google/api_core/grpc_helpers.py", line 95, in __init__
    self._stored_first_result = next(self._wrapped)
  File "/usr/local/lib/python3.10/dist-packages/grpc/_channel.py", line 540, in __next__
    return self._next()
  File "/usr/local/lib/python3.10/dist-packages/grpc/_channel.py", line 966, in _next
    raise self
grpc._channel._MultiThreadedRendezvous: <_MultiThreadedRendezvous of RPC that terminated with:
        status = StatusCode.INVALID_ARGUMENT
        details = "Audio chunk can be of a a maximum of 25600 bytes. Received audio of 98964 bytes instead."
        debug_error_string = "UNKNOWN:Error received from peer ipv6:<REDACTED> {created_time:"2024-04-03T13:04:32.515940442+02:00", grpc_status:3, grpc_message:"Audio chunk can be of a a maximum of 25600 bytes. Received audio of 98964 bytes instead."}"
>
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "/usr/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "speech_2_text.py", line 153, in run
    self.responses = self.client.streaming_recognize(
  File "/usr/local/lib/python3.10/dist-packages/google/cloud/speech_v2/services/speech/client.py", line 1884, in streaming_recognize
    response = rpc(
  File "/usr/local/lib/python3.10/dist-packages/google/api_core/gapic_v1/method.py", line 131, in __call__
    return wrapped_func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/google/api_core/retry.py", line 372, in retry_wrapped_func
    return retry_target(
  File "/usr/local/lib/python3.10/dist-packages/google/api_core/retry.py", line 207, in retry_target
    result = target()
  File "/usr/local/lib/python3.10/dist-packages/google/api_core/grpc_helpers.py", line 177, in error_remapped_callable
    raise exceptions.from_grpc_error(exc) from exc
google.api_core.exceptions.InvalidArgument: 400 Audio chunk can be of a a maximum of 25600 bytes. Received audio of 98964 bytes instead.

Note: I removed the IPv6 address from the error message.

pip3 freeze | grep google:

google-api-core==2.15.0
google-auth==2.25.2
google-cloud-speech==2.25.1
google-cloud-texttospeech==2.15.0
googleapis-common-protos==1.62.0

I had the same problem with google-cloud-speech==2.23.0 as well.

As in the examples, I feed the audio data via

def generator(self):
        """acts as a blocking generator for buffered audio_data
        when no data is there, the generator blocks till there is new data

        this generator uses queue.Queue, thus it is thread-safe

        Yields:
            bytes: the buffered audio
        """
        while not self.closed:
            # use blocking get
            chunk = self._buff.get()
            # return when stop signal detected (None)
            if chunk is None:
                return
            data = [chunk]

            # consume the rest of the queue
            while True:
                try:
                    chunk = self._buff.get(block=False)
                    if chunk is None:
                        return
                    data.append(chunk)
                except queue.Empty:
                    break

            # yield result
            yield b"".join(data)
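Because the generator above joins everything that has accumulated in the queue, the size of each yielded value is unbounded. A minimal sketch of a capped variant follows. It is written as a free function named `capped_generator` (a hypothetical name) that takes the queue as an argument so it can run on its own, and unlike the original it flushes already-buffered audio when the `None` stop signal arrives instead of discarding it. That flush is a design choice of this sketch, not something taken from the official samples.

```python
import queue
from typing import Iterator, Optional

MAX_CHUNK_BYTES = 25600  # limit quoted in the error message


def capped_generator(buff: "queue.Queue[Optional[bytes]]",
                     max_bytes: int = MAX_CHUNK_BYTES) -> Iterator[bytes]:
    """Drain `buff` in bursts, but never yield more than `max_bytes` at once."""
    pending = bytearray()
    stopped = False
    while not stopped:
        chunk = buff.get()  # block until at least one item is available
        while True:
            if chunk is None:  # stop signal: finish after flushing `pending`
                stopped = True
                break
            pending.extend(chunk)
            try:
                chunk = buff.get(block=False)  # take whatever else is queued
            except queue.Empty:
                break
        # Emit the collected audio in slices that respect the size limit.
        while pending:
            yield bytes(pending[:max_bytes])
            del pending[:max_bytes]


# Usage: one oversized buffer (the size from the traceback) plus a small one.
q: "queue.Queue[Optional[bytes]]" = queue.Queue()
q.put(bytes(98964))
q.put(bytes(100))
q.put(None)
sizes = [len(c) for c in capped_generator(q)]
print(sizes)  # [25600, 25600, 25600, 22264]
```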

@SchulerSimon

The documentation here states that 25 KB is the maximum.

I attempted a fix:

            # yield result 
            bytes_chunk = b"".join(data)
            for chunk in [bytes_chunk[x:x+25600] for x in range(0, len(bytes_chunk), 25600)]:
                yield chunk

This does get rid of that exact error, but then we just get another one:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/google/api_core/grpc_helpers.py", line 173, in error_remapped_callable
    return _StreamingResponseIterator(
  File "/usr/local/lib/python3.10/dist-packages/google/api_core/grpc_helpers.py", line 95, in __init__
    self._stored_first_result = next(self._wrapped)
  File "/usr/local/lib/python3.10/dist-packages/grpc/_channel.py", line 540, in __next__
    return self._next()
  File "/usr/local/lib/python3.10/dist-packages/grpc/_channel.py", line 966, in _next
    raise self
grpc._channel._MultiThreadedRendezvous: <_MultiThreadedRendezvous of RPC that terminated with:
        status = StatusCode.CANCELLED
        details = "The operation was cancelled."
        debug_error_string = "UNKNOWN:Error received from peer ipv6:<REDACTED> {created_time:"2024-04-04T10:01:14.580325845+02:00", grpc_status:1, grpc_message:"The operation was cancelled."}"
>
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "/usr/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "speech_2_text.py", line 155, in run
    self.responses = self.client.streaming_recognize(
  File "/usr/local/lib/python3.10/dist-packages/google/cloud/speech_v2/services/speech/client.py", line 1884, in streaming_recognize
    response = rpc(
  File "/usr/local/lib/python3.10/dist-packages/google/api_core/gapic_v1/method.py", line 131, in __call__
    return wrapped_func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/google/api_core/retry.py", line 372, in retry_wrapped_func
    return retry_target(
  File "/usr/local/lib/python3.10/dist-packages/google/api_core/retry.py", line 207, in retry_target
    result = target()
  File "/usr/local/lib/python3.10/dist-packages/google/api_core/grpc_helpers.py", line 177, in error_remapped_callable
    raise exceptions.from_grpc_error(exc) from exc
google.api_core.exceptions.Cancelled: 499 The operation was cancelled.

Note: I removed the IPv6 address from the error message.
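The traceback does not say why the call was cancelled, so the following is not a diagnosis. It only sketches the request order that the v2 file-streaming sample follows, as far as I understand it: the first request carries the recognizer and streaming config, and every later request carries only audio. `Request` is a hypothetical stand-in for the library's `StreamingRecognizeRequest` so that the sketch runs without the client library. The recognizer path and config values are placeholders, and skipping empty chunks is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional


@dataclass
class Request:
    """Hypothetical stand-in for StreamingRecognizeRequest (not the real class)."""
    recognizer: Optional[str] = None
    streaming_config: Optional[dict] = None
    audio: Optional[bytes] = None


def request_stream(recognizer: str, streaming_config: dict,
                   audio_chunks: Iterable[bytes]) -> Iterator[Request]:
    """Yield one config-only request, then one audio-only request per chunk."""
    yield Request(recognizer=recognizer, streaming_config=streaming_config)
    for chunk in audio_chunks:
        if chunk:  # assumption: empty chunks carry nothing useful, so skip them
            yield Request(audio=chunk)


reqs = list(request_stream(
    "projects/PROJECT_ID/locations/global/recognizers/_",  # placeholder path
    {"language_codes": ["en-US"]},                          # placeholder config
    [b"\x00" * 3200, b"", b"\x00" * 3200],
))
print(len(reqs))  # 3
```

With the real client, the same generator shape would be passed as the `requests` argument of `streaming_recognize`, with audio chunks already capped at 25600 bytes.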
