---
title: OpenAI Whisper
description: Whisper is a general-purpose speech recognition model. It is trained on a large dataset of diverse audio and is also a multi-task model that can perform multilingual speech recognition as well as speech translation and language identification.
---
<TextArea full align="center">
The Whisper models are trained for speech recognition and translation tasks. They can transcribe speech audio into text in the language in which it is spoken (ASR) and translate it into English (speech translation). Whisper was trained on 680,000 hours of multilingual and multitask supervised data collected from the web.

Whisper is an encoder-decoder model. Input audio is split into 30-second chunks, converted into a log-Mel spectrogram, and passed to an encoder. A decoder is trained to predict the corresponding text caption, intermixed with special tokens that direct the single model to perform tasks such as language identification, phrase-level timestamps, multilingual speech transcription, and to-English speech translation.
</TextArea>
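
As a rough sketch of this pipeline, the snippet below uses the open-source `openai-whisper` Python package (assumed installed via `pip install openai-whisper`); the `"base"` checkpoint and `audio.mp3` path are placeholder choices, not requirements.

```python
import whisper

# Load a pretrained checkpoint ("base" is a small placeholder choice).
model = whisper.load_model("base")

# Load the audio and pad/trim it to the 30-second window Whisper expects.
audio = whisper.load_audio("audio.mp3")
audio = whisper.pad_or_trim(audio)

# Convert the waveform into a log-Mel spectrogram, the encoder's input.
mel = whisper.log_mel_spectrogram(audio).to(model.device)

# The same model performs language identification via special tokens.
_, probs = model.detect_language(mel)
print(f"Detected language: {max(probs, key=probs.get)}")

# Decode the spectrogram into a text transcription.
result = whisper.decode(model, mel, whisper.DecodingOptions())
print(result.text)
```

For most use cases, the higher-level `model.transcribe("audio.mp3")` call handles chunking, language detection, and decoding in one step and returns the full transcript under the `"text"` key.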