@illyism/transcribe

Transcribe audio/video files to SRT subtitles in one command. Optimized for large files, long movies, and video editing workflows.

Quick Start

# 1. Try it instantly (no install needed)
npx @illyism/transcribe video.mp4

# 2. Set your OpenAI API key (one-time setup)
export OPENAI_API_KEY=sk-...

# 3. Transcribe anything
npx @illyism/transcribe video.mp4
npx @illyism/transcribe https://www.youtube.com/watch?v=VIDEO_ID

That's it! Create an API key at platform.openai.com/api-keys and start transcribing.

Why Use This Instead of Whisper CLI?

OpenAI's Whisper can be used in several ways. This tool aims to be the simplest and most convenient of them:

| Feature | @illyism/transcribe | Official Whisper CLI | Local Whisper (whisper.cpp) |
| --- | --- | --- | --- |
| Setup | Zero setup with `npx`/`bunx` | Install Python package | Download models (~1-5GB) |
| Video Support | ✅ Automatic with FFmpeg | ❌ Audio only | ❌ Audio only |
| YouTube Support | ✅ Built-in | ❌ Manual download | ❌ Manual download |
| SRT Output | ✅ Built-in | ❌ Manual formatting | ✅ Available |
| Processing | ☁️ Cloud (fast) | ☁️ Cloud (fast) | 💻 Local (slower) |
| Cost | $0.006/min | $0.006/min | Free (after setup) |
| Internet Required | ✅ Yes | ✅ Yes | ❌ No |
| Best For | Quick tasks, videos, YouTube | API integration | Privacy, offline use |

Key Advantages

🎬 Handles videos directly - No need to manually extract audio
🎥 YouTube support - Transcribe YouTube videos with just the URL
📝 SRT format ready - Generates subtitles automatically
🚀 Zero installation - Just run npx @illyism/transcribe video.mp4
🔧 Simple config - One-time API key setup
🌐 Cross-platform - Works on macOS, Linux, Windows

Perfect for: Content creators, podcasters, and developers who need quick, accurate transcriptions with minimal setup.

Real-World Use Case

Got a 30-60 minute video that's 2-4GB? Tools like Descript upload the entire video file, which is slow at that size and can cost more.

This tool:

🎬 Extracts only the audio locally (takes seconds with FFmpeg)
☁️ Uploads only ~20-40MB of audio to Whisper
📝 Generates SRT subtitles

Result: 10-100x faster than uploading multi-GB video files. Same quality, fraction of the time and bandwidth.

Features

🎬 Video & Audio Support: Works with MP4, MP3, WAV, M4A, WebM, OGG, MOV, AVI, and MKV
🎥 YouTube Support: Download and transcribe YouTube videos directly
🎯 High Accuracy: Powered by OpenAI's Whisper API
⚡ Smart Optimization: Automatic 1.2x speed processing + mono/16kHz extraction (optimized for dialogue)
📝 SRT Format: Generates standard SRT subtitle files with precise timestamps
🎞️ Long Movies: Automatic chunking for feature-length content (45+ minutes)
🎬 Editor-Friendly: Timecode offset, custom output paths, chunk size control
🔧 Simple Setup: Easy configuration via environment variable or config file
🌍 Multi-language: Automatically detects language
🚀 Lightning Fast: Optimized for 2-4GB+ video files

Installation & Setup

Option 1: Use Instantly (No Install)

npx @illyism/transcribe video.mp4

Option 2: Install Globally

npm install -g @illyism/transcribe
# or: bun install -g @illyism/transcribe

Prerequisites

📦 Install FFmpeg (required)

# macOS
brew install ffmpeg

# Ubuntu/Debian
sudo apt-get install ffmpeg

# Windows
choco install ffmpeg

🎥 Install yt-dlp (optional, for YouTube)

# macOS
brew install yt-dlp

# Ubuntu/Debian
sudo apt install yt-dlp

# Windows
winget install yt-dlp

# Or with pip
pip install yt-dlp

🔑 Get OpenAI API Key (required)

Go to platform.openai.com/api-keys
Create a new API key
Copy it and set it up below ⬇️

API Key Setup (30 seconds)

One-time setup - Choose your preferred method:

Method 1: Config File (Recommended)

mkdir -p ~/.transcribe && echo '{"apiKey": "sk-YOUR_KEY"}' > ~/.transcribe/config.json

Method 2: Environment Variable

export OPENAI_API_KEY=sk-YOUR_KEY

Don't have a key? Create one at platform.openai.com/api-keys (takes about a minute).
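As a rough illustration of how the two setup methods could be combined, the sketch below resolves a key from the environment first and then from the contents of `~/.transcribe/config.json`. The function name `resolveApiKey` and the precedence order are assumptions for illustration, not the tool's documented behavior.

```typescript
// Sketch only: the lookup order (env var, then config file) is an assumption.
type Env = Record<string, string | undefined>;

function resolveApiKey(configJson: string | undefined, env: Env): string {
  // 1. Use the environment variable when it is set and non-empty.
  const fromEnv = env["OPENAI_API_KEY"];
  if (fromEnv !== undefined && fromEnv.trim() !== "") return fromEnv.trim();

  // 2. Otherwise read the "apiKey" field from the config file contents.
  if (configJson !== undefined) {
    try {
      const parsed: unknown = JSON.parse(configJson);
      if (typeof parsed === "object" && parsed !== null) {
        const key = (parsed as { apiKey?: unknown }).apiKey;
        if (typeof key === "string" && key.trim() !== "") return key.trim();
      }
    } catch {
      // Malformed JSON: fall through to the error below.
    }
  }
  throw new Error("OPENAI_API_KEY not found");
}
```

A caller would pass the file contents (or `undefined` if the file does not exist) together with `process.env`.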

Usage Examples

# Local video file
transcribe video.mp4

# YouTube video
transcribe https://www.youtube.com/watch?v=VIDEO_ID

# Audio file
transcribe podcast.mp3

# Disable optimization (use original audio)
transcribe video.mp4 --raw

Output: an SRT file next to the input, for example video.srt for video.mp4.
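The naming rule can be sketched as a small path helper. `resolveSrtPath` is a hypothetical name, and treating any `--output` value that ends in `.srt` as a file (and anything else as a directory) is an assumption about how the CLI decides between the two; only POSIX-style separators are handled here.

```typescript
// Hypothetical helper; the real CLI may resolve paths differently.
function resolveSrtPath(inputPath: string, output?: string): string {
  // Split the input into directory and file name.
  const slash = inputPath.lastIndexOf("/");
  const dir = slash >= 0 ? inputPath.slice(0, slash + 1) : "";
  const file = inputPath.slice(slash + 1);

  // Replace the last extension with ".srt" (or append it if there is none).
  const dot = file.lastIndexOf(".");
  const srtName = (dot > 0 ? file.slice(0, dot) : file) + ".srt";

  if (output === undefined) return dir + srtName;     // default: next to the input
  if (/\.srt$/i.test(output)) return output;          // explicit file path
  return output.replace(/\/+$/, "") + "/" + srtName;  // treat as a directory
}
```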

Editor-Friendly Features

Perfect for video editing workflows:

# Custom output path (file or directory)
transcribe movie.mkv --output ./subtitles
transcribe movie.mkv --output ./subtitles/movie.srt

# Timecode offset (for editorial timelines)
transcribe movie.mkv --offset 01:00:00.000  # Start at 1 hour
transcribe movie.mkv --offset 3600         # Same, in seconds

# Force chunking for very long movies
transcribe long_movie.mkv --chunk-minutes 15
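The `--offset` option accepts either a timecode or plain seconds. A minimal sketch of parsing both forms into seconds follows; `parseOffsetSeconds` is a hypothetical name, and the exact set of formats the CLI accepts may be wider or narrower than this.

```typescript
// Sketch: convert "HH:MM:SS.mmm" or plain seconds into a number of seconds.
function parseOffsetSeconds(value: string): number {
  const text = value.trim();

  // Plain seconds, e.g. "3600" or "12.5".
  if (/^\d+(\.\d+)?$/.test(text)) return Number(text);

  // Timecode, e.g. "01:00:00.000" (hours:minutes:seconds, optional fraction).
  const match = /^(\d+):([0-5]\d):([0-5]\d(?:\.\d+)?)$/.exec(text);
  if (!match) throw new Error(`Invalid offset: ${value}`);
  return Number(match[1]) * 3600 + Number(match[2]) * 60 + Number(match[3]);
}
```

With this, `01:00:00.000` and `3600` both resolve to the same one-hour offset, which is then added to every subtitle timestamp.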

Why chunking? Movies 45+ minutes are automatically split into ~20-minute chunks for reliability. Each chunk is transcribed separately, then merged seamlessly with correct timestamps.

What Happens Automatically

By default, the tool optimizes large files:

2.7GB video → Extract audio (mono, 16kHz) → Speed up 1.2x → Chunk if >45min → Upload chunks → Transcribe → Merge & adjust timestamps

For long movies (45+ minutes):

Automatically splits into ~20-minute chunks
Transcribes each chunk separately
Merges results with correct timestamps
Handles 2+ hour movies reliably
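
The split-and-merge behavior described above can be sketched as two small functions: one that plans chunk boundaries, and one that shifts each chunk's subtitle times by the chunk's start. The 45-minute threshold and 20-minute chunk size come from this README; the names `planChunks` and `mergeCues` and the exact boundary handling are assumptions.

```typescript
interface Chunk { index: number; startSec: number; endSec: number }
interface Cue { startSec: number; endSec: number; text: string }

// Plan chunk boundaries. Files at or below the threshold stay in one piece.
function planChunks(durationSec: number, chunkMinutes = 20, thresholdMinutes = 45): Chunk[] {
  if (durationSec <= thresholdMinutes * 60) {
    return [{ index: 0, startSec: 0, endSec: durationSec }];
  }
  const size = chunkMinutes * 60;
  const chunks: Chunk[] = [];
  for (let start = 0, i = 0; start < durationSec; start += size, i++) {
    chunks.push({ index: i, startSec: start, endSec: Math.min(start + size, durationSec) });
  }
  return chunks;
}

// Each chunk's cues start at 0, so add the chunk start to place them on the full timeline.
function mergeCues(chunks: Chunk[], cuesPerChunk: Cue[][]): Cue[] {
  const merged: Cue[] = [];
  chunks.forEach((chunk, i) => {
    for (const cue of cuesPerChunk[i] ?? []) {
      merged.push({
        text: cue.text,
        startSec: cue.startSec + chunk.startSec,
        endSec: cue.endSec + chunk.startSec,
      });
    }
  });
  return merged;
}
```

For a 2-hour movie this plan yields six 20-minute chunks, and a cue at 00:01 in the second chunk lands at 20:01 in the merged file.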

Result:

⚡ 99.5% smaller uploads (2.7GB → 12.8MB)
🚀 10-100x faster than uploading full video
🎯 ~98% accuracy maintained
💰 Same cost ($0.006/min)
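
The upload-size figure can be checked with a line of arithmetic. The sketch below uses decimal megabytes (2.7GB taken as 2700MB), which is an assumption about how the sizes were measured.

```typescript
// Percentage by which the upload shrinks relative to the original file.
function uploadReductionPercent(originalMb: number, uploadMb: number): number {
  return (1 - uploadMb / originalMb) * 100;
}

// 2.7GB video versus 12.8MB of extracted audio.
const reduction = uploadReductionPercent(2700, 12.8).toFixed(1); // "99.5"
```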

Want the original audio? Add the --raw flag.

Use as a Library

npm install @illyism/transcribe

import { transcribe } from '@illyism/transcribe'

const result = await transcribe({
  inputPath: 'video.mp4',
  apiKey: process.env.OPENAI_API_KEY,
  optimize: true // default, set false to disable
})

console.log(result.srtPath)  // Path to generated SRT file
console.log(result.text)     // Full transcription text

Full API reference

interface TranscribeOptions {
  inputPath: string        // Path to video/audio file
  apiKey?: string         // OpenAI API key (or use env var)
  outputPath?: string     // Custom output path (optional)
  optimize?: boolean      // Enable optimization (default: true)
}

interface TranscribeResult {
  srtPath: string         // Path to generated SRT file
  text: string           // Full transcription text
  language: string       // Detected language
  duration: number       // Duration in seconds
}

Details

📋 Supported Formats

Video: MP4, WebM, MOV, AVI, MKV
Audio: MP3, WAV, M4A, OGG, Opus
YouTube: All videos, Shorts, youtu.be links

💰 Cost

OpenAI Whisper API: $0.006 per minute

Examples:

5 min: $0.03
30 min: $0.18
2 hours: $0.72
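
The examples above follow directly from the per-minute rate. A minimal estimator, assuming the $0.006 rate quoted here stays current (`estimateCostUsd` is a hypothetical helper, not part of the package):

```typescript
const USD_PER_MINUTE = 0.006; // Whisper API rate quoted in this README

// Estimated cost in US dollars, formatted to two decimal places.
function estimateCostUsd(durationMinutes: number): string {
  if (durationMinutes < 0) throw new Error("duration must be non-negative");
  return (durationMinutes * USD_PER_MINUTE).toFixed(2);
}

estimateCostUsd(5);   // "0.03"
estimateCostUsd(30);  // "0.18"
estimateCostUsd(120); // "0.72"
```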

⚙️ How It Works

Extract audio from video (mono, 16kHz - optimized for speech)
Optimize: 1.2x speed + compression if >24MB
Auto-chunk if >45 minutes (for reliability)
Upload chunks to Whisper API (or single file)
Generate SRT with timestamps
Merge chunks (if needed) and adjust timestamps to match original
Apply timecode offset (if specified)
Clean up temp files
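
The extraction and timestamp-adjustment steps can be sketched as follows. The FFmpeg options used (`-vn`, `-ac`, `-ar`, and the `atempo` audio filter) are standard FFmpeg flags, but whether the tool invokes FFmpeg with exactly these arguments is an assumption, and both function names are hypothetical.

```typescript
// Build FFmpeg arguments for speech-oriented audio extraction.
function buildExtractArgs(input: string, output: string, speed = 1.2): string[] {
  return [
    "-i", input,
    "-vn",                          // drop the video stream
    "-ac", "1",                     // mono
    "-ar", "16000",                 // 16 kHz sample rate
    "-filter:a", `atempo=${speed}`, // speed up audio without changing pitch
    output,
  ];
}

// Timestamps measured on sped-up audio are compressed by `speed`;
// multiplying maps them back onto the original timeline.
function toOriginalTime(transcribedSec: number, speed = 1.2): number {
  return transcribedSec * speed;
}
```

The argument array could be passed to `ffmpeg` via a process spawner. A cue reported at 10 seconds in the 1.2x audio corresponds to 12 seconds in the source video, which is why the rescaling step is needed before the SRT is written.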

📄 SRT Output Example

1
00:00:00,000 --> 00:00:03,420
Hey and thank you for getting the SEO roast.

2
00:00:03,420 --> 00:00:06,840
I'll take a look at your website and see what things we can improve.
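
For reference, the SRT layout shown above (index line, `HH:MM:SS,mmm --> HH:MM:SS,mmm` line, text, blank line) can be produced with a short formatter. This is an illustrative sketch with hypothetical names, not the package's own code.

```typescript
interface SrtCue { startSec: number; endSec: number; text: string }

// Left-pad a non-negative integer with zeros.
function zeroPad(n: number, width: number): string {
  let text = String(n);
  while (text.length < width) text = "0" + text;
  return text;
}

// Format seconds as an SRT timestamp, e.g. 3.42 -> "00:00:03,420".
function formatSrtTime(seconds: number): string {
  const totalMs = Math.round(seconds * 1000);
  const ms = totalMs % 1000;
  const totalSec = Math.floor(totalMs / 1000);
  const s = totalSec % 60;
  const m = Math.floor(totalSec / 60) % 60;
  const h = Math.floor(totalSec / 3600);
  return `${zeroPad(h, 2)}:${zeroPad(m, 2)}:${zeroPad(s, 2)},${zeroPad(ms, 3)}`;
}

// Join cues into SRT text: 1-based index, time range, text, blank line between cues.
function toSrt(cues: SrtCue[]): string {
  return cues
    .map((cue, i) => `${i + 1}\n${formatSrtTime(cue.startSec)} --> ${formatSrtTime(cue.endSec)}\n${cue.text}\n`)
    .join("\n");
}
```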

Troubleshooting

"OPENAI_API_KEY not found"

Set up your API key using one of the methods in API Key Setup.

"FFmpeg not found"

Install FFmpeg:

brew install ffmpeg  # macOS
sudo apt install ffmpeg  # Ubuntu
choco install ffmpeg  # Windows

"yt-dlp not found" (YouTube only)

Install yt-dlp:

brew install yt-dlp  # macOS
sudo apt install yt-dlp  # Ubuntu
pip install yt-dlp  # Any platform

File not found error

Use absolute paths:

transcribe /full/path/to/video.mp4

API errors (502, timeout, etc.)

The OpenAI API may be temporarily unavailable. Wait 30 seconds and try again.

"Could not parse multipart form" error

If you're using the Bun runtime, switch to Node.js:

# Use Node.js instead of Bun
node dist/cli.js video.mp4

# Or install globally and use the transcribe command
npm install -g @illyism/transcribe
transcribe video.mp4

The CLI works best with Node.js 18+ due to OpenAI SDK compatibility.

Contributing

Pull requests welcome! See the GitHub repo (Illyism/transcribe-cli).
