A Node.js-based tool for downloading transcripts from Udemy courses. The script uses Puppeteer to navigate Udemy's UI and extract the transcript of each lecture in a course.
- Downloads transcripts from any Udemy course you have access to
- Creates individual transcript files for each lecture
- Generates a combined transcript file with all lectures
- Optionally downloads .srt files with timestamps for each lecture
- Scrapes and saves course content structure
- Supports email-based authentication with verification code
- Handles Cloudflare security challenges
- Runs in headless mode for better performance
- Node.js (v14 or newer)
- NPM
- A Udemy account with access to the course you want to download transcripts from
- Clone this repository:
  git clone https://github.com/TOA-Anakin/udemy-transcript-downloader.git
  cd udemy-transcript-downloader
- Install dependencies:
  npm install
- Create a .env file in the root directory with your Udemy email:
  UDEMY_EMAIL=your-email@example.com
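For illustration, the UDEMY_EMAIL entry could be read along the following lines. This is a dependency-free sketch, not the project's actual loader (which may well rely on a package such as dotenv); the helper names parseEnv, requireEmail and loadEmail are hypothetical.

```javascript
// Hypothetical sketch: read UDEMY_EMAIL from .env-style text.
// parseEnv, requireEmail and loadEmail are illustrative names only.
const fs = require('fs');
const path = require('path');

// Turn "KEY=value" lines into an object, skipping blanks and # comments.
function parseEnv(text) {
  const env = {};
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (!line || line.startsWith('#')) continue;
    const eq = line.indexOf('=');
    if (eq === -1) continue;
    const key = line.slice(0, eq).trim();
    // Strip one pair of matching surrounding quotes, if present.
    const value = line.slice(eq + 1).trim().replace(/^(['"])(.*)\1$/, '$2');
    env[key] = value;
  }
  return env;
}

// Fail early with a readable message when the email is absent or malformed.
function requireEmail(env) {
  const email = env.UDEMY_EMAIL;
  if (!email || !email.includes('@')) {
    throw new Error('UDEMY_EMAIL is missing or invalid in .env');
  }
  return email;
}

// Read <dir>/.env and return the configured email.
function loadEmail(dir = process.cwd()) {
  const text = fs.readFileSync(path.join(dir, '.env'), 'utf8');
  return requireEmail(parseEnv(text));
}
```

Calling loadEmail() from the project root would return the configured address, or throw if the file or the entry is missing.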
Run the script with the URL of the Udemy course as an argument:
npm start "https://www.udemy.com/course/your-course-url/"
Or use the direct Node.js command:
node src/index.js "https://www.udemy.com/course/your-course-url/"
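As a rough sketch of how the course URL argument might be checked before the browser starts: the function name parseCourseUrl and the exact validation rules below are assumptions made for illustration, not code taken from the project.

```javascript
// Hypothetical sketch: validate the course URL passed on the command line.
// parseCourseUrl is an illustrative name, not a function from the project.
function parseCourseUrl(arg) {
  if (!arg) {
    throw new Error('Usage: npm start "https://www.udemy.com/course/your-course-url/"');
  }
  let url;
  try {
    url = new URL(arg);
  } catch {
    throw new Error(`Not a valid URL: ${arg}`);
  }
  const isUdemy = url.hostname === 'udemy.com' || url.hostname.endsWith('.udemy.com');
  // Expect a path of the form /course/<slug>/...
  const match = url.pathname.match(/^\/course\/([^/]+)/);
  if (!isUdemy || !match) {
    throw new Error(`Not a Udemy course URL: ${arg}`);
  }
  // Normalize to the course landing page, dropping any extra path segments.
  return { slug: match[1], url: `${url.origin}/course/${match[1]}/` };
}

// In a real entry point the argument would come from process.argv[2].
console.log(parseCourseUrl('https://www.udemy.com/course/your-course-url/'));
```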
The script will:
- Ask if you want to download .srt files (with timestamps) for each lecture
- Ask how many tabs to use for downloading transcripts (default is 5)
  - A higher number can speed things up, but requires a good PC (enough CPU and RAM)
- Open a headless browser and navigate to Udemy login
- Fill in your email from the .env file
- Ask you to enter the 6-digit verification code from your email
- Navigate to the course page
- Scrape course content structure
- Enter the course player
- Go through each lecture and download available transcripts
- Save individual transcript files in the output directory
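The "number of tabs" setting in the steps above can be pictured as a small worker pool. The sketch below is only an illustration of that idea: runWithTabs and processLecture are hypothetical names, and the real script presumably drives Puppeteer pages where this example uses a fake worker.

```javascript
// Hypothetical sketch of the "N tabs" idea: at most tabCount lectures are
// processed at once. processLecture stands in for whatever the script does
// inside a browser tab; it is an assumption, not the project's API.
async function runWithTabs(lectures, tabCount, processLecture) {
  const results = new Array(lectures.length);
  let next = 0; // index of the next lecture that no tab has claimed yet

  // Each "tab" keeps claiming lectures until none are left.
  async function tab(tabId) {
    while (next < lectures.length) {
      const i = next++;
      try {
        results[i] = await processLecture(lectures[i], tabId);
      } catch (err) {
        // One failed lecture should not stop the others.
        results[i] = { error: String(err) };
      }
    }
  }

  const count = Math.max(1, Math.min(tabCount, lectures.length));
  await Promise.all(Array.from({ length: count }, (_, id) => tab(id)));
  return results; // same order as the input lectures
}

// Usage with a fake worker that just waits a little.
const fakeWorker = (title) =>
  new Promise((resolve) => setTimeout(() => resolve(`${title}.txt`), 10));

runWithTabs(['Intro', 'Setup', 'Basics'], 2, fakeWorker).then(console.log);
```

A fixed pool like this keeps memory use bounded by the tab count, which matches the advice to pick a lower number on slower machines.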
All output files are saved to the output directory:
- CONTENTS.txt - Course structure with sections and lectures
- [Lecture Name].txt - Individual transcript files for each lecture
- [Lecture Name].srt - Individual transcript files with timestamps in SubRip format (optional)
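To show what the SubRip output looks like, here is a minimal sketch that converts a list of cues into .srt text. The cue shape { start, end, text } with times in seconds is an assumption; how the script actually represents transcript cues is not documented here.

```javascript
// Hypothetical sketch: turn transcript cues into SubRip (.srt) text.
// The cue shape { start, end, text } (times in seconds) is an assumption.
function formatSrtTime(seconds) {
  const totalMs = Math.max(0, Math.round(seconds * 1000));
  const pad = (n, width = 2) => String(n).padStart(width, '0');
  const h = Math.floor(totalMs / 3600000);
  const m = Math.floor((totalMs % 3600000) / 60000);
  const s = Math.floor((totalMs % 60000) / 1000);
  const ms = totalMs % 1000;
  // SubRip uses a comma before the milliseconds: HH:MM:SS,mmm
  return `${pad(h)}:${pad(m)}:${pad(s)},${pad(ms, 3)}`;
}

// Each block is: a 1-based counter, the time range, then the cue text.
// Blocks are separated by a blank line.
function toSrt(cues) {
  return cues
    .map((cue, i) => [
      i + 1,
      `${formatSrtTime(cue.start)} --> ${formatSrtTime(cue.end)}`,
      cue.text.trim(),
    ].join('\n'))
    .join('\n\n') + '\n';
}

console.log(toSrt([
  { start: 0, end: 2.5, text: 'Welcome to the course.' },
  { start: 2.5, end: 6, text: 'Let us get started.' },
]));
```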
- Verification Code Issues: Make sure to enter the verification code quickly after receiving it in your email
- Browser Crashing: If you experience issues with headless mode, you can modify the script to use headless: false for debugging
- Missing Transcripts: Not all lectures may have transcripts. The script will create empty files for lectures without transcripts.
- SRT Errors: If .srt generation fails for a lecture, try increasing timeouts or re-running the script with fewer browser tabs open.
- Slow Transcript Downloads: The script can download transcripts in parallel using multiple browser tabs. If your PC is slow or has limited memory, stick to a lower number of tabs (e.g. 1–3). If you have a powerful machine, you can safely use 5 or more tabs for faster processing.
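If you do modify the script to raise timeouts, one possible approach is to retry a flaky step with a longer time budget on each attempt. The helper below is a sketch under that assumption; withRetry and its options are hypothetical and not part of the project.

```javascript
// Hypothetical sketch: retry a flaky step with a longer timeout each attempt.
// withRetry and its options are illustrative, not part of the project.
async function withRetry(step, { attempts = 3, timeoutMs = 5000, factor = 2 } = {}) {
  let lastError;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    // The time budget grows geometrically: timeoutMs, timeoutMs * factor, ...
    const limit = timeoutMs * factor ** (attempt - 1);
    let timer;
    const timeout = new Promise((_, reject) => {
      timer = setTimeout(() => reject(new Error(`Timed out after ${limit} ms`)), limit);
    });
    try {
      // Whichever settles first wins: the step itself or the timeout.
      // Note that a timed-out step is abandoned, not cancelled.
      return await Promise.race([step(limit, attempt), timeout]);
    } catch (err) {
      lastError = err;
    } finally {
      clearTimeout(timer);
    }
  }
  throw lastError;
}

// Usage with a fake step that fails once before succeeding.
let calls = 0;
withRetry(async (limit) => {
  calls++;
  if (calls < 2) throw new Error('transcript panel not ready');
  return `ok with a ${limit} ms budget`;
}, { timeoutMs: 100 }).then(console.log);
```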
MIT
This tool is for personal use only. Please respect Udemy's terms of service.