Closed
Description
Is it possible for the Google Speech API to transcribe a long audio file and return the transcription with a timestamp for each word or each sentence?
If not, any ideas on how to do that? For example, I have thought about slicing the long audio into fixed-length short segments with some overlap, transcribing those short segments, and then removing the text duplicated in the overlaps. Another possible approach is to split the long audio into meaningful sentences using some segmentation technique, transcribe each one, and then combine the results. By the way, the audio I want to transcribe is a 110-minute sports video commentary. Transcribing the whole audio into one long, messy paragraph is not what I want. Thanks to anyone willing to help.
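To make the first idea (fixed-length slices with overlap) more concrete, here is a rough sketch of the bookkeeping only, not a tested solution. It assumes each slice has already been transcribed by some recognizer that returns `(word, start, end)` tuples with times relative to the start of that slice. The names `chunk_windows` and `merge_chunks` are made up for illustration, and the 50-second slice with a 10-second overlap is an arbitrary choice. Duplicates are removed by letting each slice "own" the words whose midpoint falls before the middle of its overlap with the next slice.

```python
from typing import List, Tuple

# (text, start_sec, end_sec); times are relative to the start of the slice
Word = Tuple[str, float, float]
Window = Tuple[float, float]


def chunk_windows(total_sec: float, chunk_sec: float = 50.0,
                  overlap_sec: float = 10.0) -> List[Window]:
    """Return (start, end) windows covering [0, total_sec] with overlap."""
    if not 0 <= overlap_sec < chunk_sec:
        raise ValueError("overlap_sec must be in [0, chunk_sec)")
    step = chunk_sec - overlap_sec
    windows: List[Window] = []
    start = 0.0
    while start < total_sec:
        end = min(start + chunk_sec, total_sec)
        windows.append((start, end))
        if end >= total_sec:
            break
        start += step
    return windows


def merge_chunks(windows: List[Window],
                 chunk_words: List[List[Word]]) -> List[Word]:
    """Shift per-slice word times to absolute times and drop overlap duplicates.

    Each slice keeps only the words whose midpoint lies between the middle
    of the overlap with the previous slice and the middle of the overlap
    with the next slice.
    """
    merged: List[Word] = []
    last = len(windows) - 1
    for i, ((w_start, w_end), words) in enumerate(zip(windows, chunk_words)):
        lo = 0.0 if i == 0 else (w_start + windows[i - 1][1]) / 2
        hi = float("inf") if i == last else (windows[i + 1][0] + w_end) / 2
        for text, s, e in words:
            abs_s, abs_e = s + w_start, e + w_start
            if lo <= (abs_s + abs_e) / 2 < hi:
                merged.append((text, abs_s, abs_e))
    return merged


if __name__ == "__main__":
    # A 110-minute recording split into 50 s slices with 10 s overlap.
    print(len(chunk_windows(110 * 60)))
```

Words cut in half at a slice boundary may still be recognized differently in the two slices, so the midpoint rule is only a heuristic; the overlap needs to be long enough that every word is fully contained in at least one slice.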