A simple web application that converts audio to subtitles using OpenAI's Whisper model.
A desktop application that transcribes audio from files, microphone input or YouTube videos with the option to translate the content and create subtitles.
This repository contains a Python script that downloads the audio from a YouTube video, transcribes it into text, detects the language, and automatically saves the transcription to a .txt file.
Modern Desktop Application offering a suite of tools for audio/video text recognition and a variety of other useful utilities.
Simple Python audio transcriber using OpenAI's Whisper speech recognition model
State‑of‑the‑art speech recognition model for English, delivering accurate transcription across diverse audio scenarios. <metadata> gpu: T4 | collections: ["CTranslate2"] </metadata>
Develop a Python application that allows you to extract valuable insights, engage in meaningful conversations, and explore video content in a whole new way.
An efficient desktop application for transcribing audio files into text using Vosk speech recognition.
A GUI showcase of using Whisper to transcribe and analyze YouTube videos.
A distilled model that is 49% smaller and 6.3× faster while maintaining nearly the same accuracy, especially on long-form transcription. <metadata> gpu: T4 | collections: ["HF Transformers"] </metadata>
An easy way to generate SRT subtitles from a video in Windows.
Instagram Reels Transcription App is a web-based application built using Streamlit that allows users to transcribe Instagram Reels into text using the AssemblyAI API. The app downloads Instagram Reels, converts them into audio, and transcribes the audio with speaker labels and timestamps.
Fine-tuning the Whisper-Small model on a Hinglish audio dataset.
Streamlit app to transcribe audio to text using OpenAI's Whisper library.
Parakeet MLX is a next-generation automatic speech recognition (ASR) engine optimized for Apple Silicon (M1/M2/M3), leveraging Apple’s MLX framework for ultra-fast, low-latency transcription. It offers real-time streaming and advanced audio processing, including noise reduction and silence detection.
A turbocharged variant of Whisper large‑v3 for English speech recognition, optimized for lower latency. <metadata> gpu: T4 | collections: ["HF Transformers","Complex Outputs"] </metadata>
Implemented some of the models and techniques learned in NLP to help build systems that help in daily life.
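Several of the projects listed above turn transcription output into subtitles. As a rough illustration of that last step only, the sketch below renders timed segments as SRT text. It is not taken from any listed repository: the function names and the `(start, end, text)` tuple shape (times in seconds) are assumptions made for this example.

```python
def format_srt_timestamp(seconds: float) -> str:
    """Convert a time in seconds to the SRT form HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments):
    """Render hypothetical (start, end, text) tuples as one SRT string."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        # Each SRT cue: a 1-based index, a time range, then the caption text.
        blocks.append(
            f"{index}\n"
            f"{format_srt_timestamp(start)} --> {format_srt_timestamp(end)}\n"
            f"{text.strip()}\n"
        )
    # Cues are separated by a blank line.
    return "\n".join(blocks)
```

In practice the segments would come from whichever speech recognition tool is in use; writing the returned string to a file with an `.srt` extension gives a subtitle file that most players accept.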