Description
From googleapis/google-cloud-python#13405, the response to streaming_synthesize
is headerless LINEAR16 audio with a sample rate of 24000.
The code sample below prints the size of the audio content but does not include the header needed to actually play the audio.
python-docs-samples/texttospeech/snippets/streaming_tts_quickstart.py
Lines 46 to 48 in 5e8e178
This may not be the purpose of the code sample; however, having this extra information in the sample would help with debugging customer issues such as googleapis/google-cloud-python#13405.
I added code that writes the raw audio header by hand, but there is likely an easier way to achieve this. We should provide guidance on how folks should create the audio header.
import os

# This is a raw 44-byte header based on the spec at https://docs.fileformat.com/audio/wav/
# It describes PCM audio: 1 channel, 24000 Hz, 16 bits per sample. Both size fields start at 0.
header = b'RIFF\x00\x00\x00\x00WAVEfmt \x10\x00\x00\x00\x01\x00\x01\x00\xc0]\x00\x00\x80\xbb\x00\x00\x02\x00\x10\x00data\x00\x00\x00\x00'
total_length = 0
with open("output.wav", "wb") as out:
    out.write(header)
    for response in streaming_responses:
        # Accumulate the length of the audio content
        total_length += len(response.audio_content)
        out.write(response.audio_content)
    # Position 40-43: size of the data section (32-bit little-endian integer)
    out.seek(40)
    out.write(total_length.to_bytes(4, "little"))
file_size = os.path.getsize("output.wav")
with open("output.wav", "r+b") as out:
    # Position 4-7: size of the overall file minus 8 bytes (32-bit little-endian integer).
    # Typically, you fill this in after creation.
    out.seek(4)
    out.write((file_size - 8).to_bytes(4, "little"))
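One possibly easier route is Python's standard-library wave module, which writes the header and fills in both size fields itself when the file is closed. The following is only a minimal sketch. It assumes mono 16-bit audio at 24000 Hz, matching the hand-written header above, and write_wav and chunks are hypothetical names used for illustration, not part of the existing sample.

```python
import wave

SAMPLE_RATE_HZ = 24000   # sample rate reported for streaming_synthesize output
SAMPLE_WIDTH_BYTES = 2   # LINEAR16 is 16-bit PCM
CHANNELS = 1             # assumption: mono, as in the hand-written header


def write_wav(path, chunks):
    """Write headerless PCM chunks to a WAV file and return the audio byte count.

    Hypothetical helper: `chunks` is any iterable of bytes objects.
    """
    total_length = 0
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(CHANNELS)
        wav_file.setsampwidth(SAMPLE_WIDTH_BYTES)
        wav_file.setframerate(SAMPLE_RATE_HZ)
        for chunk in chunks:
            # writeframes keeps the header sizes consistent with the data written
            wav_file.writeframes(chunk)
            total_length += len(chunk)
    return total_length
```

With the responses from the sample, this could be called as write_wav("output.wav", (response.audio_content for response in streaming_responses)).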
Activity
glasnt commented on Jan 21, 2025
This part of this WIP PR might be similar to what you need here (possibly)
https://github.com/GoogleCloudPlatform/python-docs-samples/pull/13053/files#diff-5d664c635b2f6262b57f11d8b4d2016da17a18a41a8f57efd60d69b39c37365dR254-R272