FEATURE: Add V_MPEG2 track support in MKV demuxer for CC extraction #2152

Merged
cfsmp3 merged 3 commits into CCExtractor:master from Varadraj75:feat/mkv-mpeg2-cc-extraction on Feb 28, 2026

Conversation

@Varadraj75
Contributor

In raising this pull request, I confirm the following (please check boxes):

  • I have read and understood the contributors guide.
  • I have checked that another pull request for this purpose does not exist.
  • I have considered, and confirmed that this submission will be valuable to others.
  • I accept that this submission may not be used, and the pull request closed at the will of the maintainer.
  • I give this submission freely, and claim no ownership to its content.
  • I have mentioned this change in the changelog.

My familiarity with the project is as follows (check one):

  • I have never used CCExtractor.
  • I have used CCExtractor just a couple of times.
  • I absolutely love CCExtractor, but have not contributed previously.
  • I am an active contributor to CCExtractor.

Summary

CCExtractor's MKV demuxer only recognized V_MPEG4/ISO/AVC and V_MPEGH/ISO/HEVC
tracks, silently skipping V_MPEG2 tracks. MKV files with MPEG-2 video — common
in DVD-sourced content — produced no output at all.

Reported in #2149 by a user trying to extract CC3 captions from a Fairly OddParents DVD.

Changes

  • Add mpeg2_codec_id = "V_MPEG2" to matroska.h
  • Add mpeg2_track_number field to matroska_ctx struct
  • Detect V_MPEG2 track during track entry parsing alongside AVC/HEVC
  • Add process_mpeg2_frame_mkv() reusing the existing process_m2v()
    infrastructure (same path used by mp4.c and general_loop.c)
  • Dispatch MPEG2 frames in parse_simple_block()
  • Initialize, track and report mpeg2_track_number alongside AVC/HEVC
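
The detection step in the list above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: only `mpeg2_codec_id = "V_MPEG2"` and the `mpeg2_track_number` field name come from the description; the `demo_` struct and functions, the AVC/HEVC constant names, and the `-1` "not found" sentinel are assumptions made for the example.

```c
#include <assert.h>
#include <string.h>

/* Codec IDs as stored in the Matroska CodecID element.
 * Only mpeg2_codec_id is named in the PR; the other two names are assumed. */
static const char *avc_codec_id = "V_MPEG4/ISO/AVC";
static const char *hevc_codec_id = "V_MPEGH/ISO/HEVC";
static const char *mpeg2_codec_id = "V_MPEG2";

/* Hypothetical stand-in for the track-number fields of matroska_ctx. */
struct demo_mkv_ctx
{
	long avc_track_number; /* -1 means "no such track seen" (assumed sentinel) */
	long hevc_track_number;
	long mpeg2_track_number;
};

/* Mark every supported video track as "not found yet". */
void demo_ctx_init(struct demo_mkv_ctx *ctx)
{
	assert(ctx != NULL);
	ctx->avc_track_number = -1;
	ctx->hevc_track_number = -1;
	ctx->mpeg2_track_number = -1;
}

/* Remember the track number when the codec ID is one we can extract
 * captions from. Returns 1 if the track was recognized, 0 otherwise. */
int demo_register_track(struct demo_mkv_ctx *ctx, const char *codec_id, long track_number)
{
	assert(ctx != NULL && codec_id != NULL);
	if (strcmp(codec_id, avc_codec_id) == 0)
		ctx->avc_track_number = track_number;
	else if (strcmp(codec_id, hevc_codec_id) == 0)
		ctx->hevc_track_number = track_number;
	else if (strcmp(codec_id, mpeg2_codec_id) == 0)
		ctx->mpeg2_track_number = track_number;
	else
		return 0;
	return 1;
}
```

With this shape, a file whose only video track is `V_MPEG2` ends up with `mpeg2_track_number` set and the other two fields left at the sentinel, which is the condition the reporting code would need to check before printing "Found MPEG2 track."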

Testing

Tested with the sample from #2149 (cc3.mkv, V_MPEG2, captions on CC3/Field 2):

# Before: no output
ccextractor cc3.mkv --output-field 2   →  "Found no AVC/HEVC track. No captions found."

# After: captions extracted correctly
ccextractor cc3.mkv --output-field 2   →  "Found MPEG2 track." + full SRT output

Closes #2149

MKV files with MPEG-2 video (common in DVD sources) were silently skipped.
Add V_MPEG2 track detection and processing using the existing process_m2v()
infrastructure, matching how mp4.c handles MPEG-2 streams.

Fixes CCExtractor#2149
Copilot AI review requested due to automatic review settings February 28, 2026 16:15
@Varadraj75 changed the title from "[FEATURE] Add V_MPEG2 track support in MKV demuxer for CC extraction" to "FEATURE: Add V_MPEG2 track support in MKV demuxer for CC extraction" on Feb 28, 2026

Copilot AI left a comment

Pull request overview

Adds Matroska (MKV) demuxer support for V_MPEG2 video tracks so CCExtractor can extract EIA-608/708 captions from MPEG-2-in-MKV content (e.g., DVD-sourced rips), addressing the “no output” scenario reported in #2149.

Changes:

  • Add "V_MPEG2" codec-id recognition and track-number tracking in the Matroska parser.
  • Dispatch MPEG-2 SimpleBlock frames through a new process_mpeg2_frame_mkv() that reuses the existing MPEG-2 elementary stream processing path (process_m2v()).
  • Update Matroska loop reporting to include MPEG-2 track detection.
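
As a rough sketch of the dispatch decision described above: the `is_avc` / `is_hevc` / `is_mpeg2` comparisons mirror the excerpt quoted later in this review, while the `demo_` types and the enum of handlers are hypothetical names introduced only for illustration. The real `parse_simple_block()` reads from the file and calls the frame processors directly.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the track-number fields of matroska_ctx. */
struct demo_tracks
{
	long avc;
	long hevc;
	long mpeg2;
};

/* Which processing path a SimpleBlock should take (illustrative only). */
enum demo_handler
{
	DEMO_SKIP = 0,
	DEMO_AVC,
	DEMO_HEVC,
	DEMO_MPEG2 /* would lead to process_mpeg2_frame_mkv() in the PR */
};

/* Compare the block's track number against each known video track and
 * pick a handler; blocks from any other track are skipped. */
enum demo_handler demo_pick_handler(const struct demo_tracks *t, long track)
{
	assert(t != NULL);
	int is_avc = (track == t->avc);
	int is_hevc = (track == t->hevc);
	int is_mpeg2 = (track == t->mpeg2);

	if (!is_avc && !is_hevc && !is_mpeg2)
		return DEMO_SKIP;
	if (is_avc)
		return DEMO_AVC;
	if (is_hevc)
		return DEMO_HEVC;
	return DEMO_MPEG2;
}
```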

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| src/lib_ccx/matroska.h | Adds the MPEG-2 codec ID constant, a new mpeg2_track_number field, and a new frame-processing function prototype. |
| src/lib_ccx/matroska.c | Detects MPEG-2 tracks, dispatches MPEG-2 frames for processing, initializes and reports MPEG-2 track presence, and adjusts return behavior. |
Comments suppressed due to low confidence (2)

src/lib_ccx/matroska.c:699

  • In the non-video-track fast-path, skip_bytes(file, len - 1) assumes the TrackNumber VINT is always 1 byte. Track numbers in Matroska are variable-length VINTs, so this can seek to the wrong position and desynchronize parsing. Compute the remaining bytes to skip based on the block start (pos) and current file position instead of subtracting 1.
	int is_mpeg2 = (track == mkv_ctx->mpeg2_track_number);
	if (!is_avc && !is_hevc && !is_mpeg2)
	{
		// Skip everything except AVC/HEVC tracks
		skip_bytes(file, len - 1); // 1 byte for track
		return;

src/lib_ccx/matroska.c:699

  • The comment says "Skip everything except AVC/HEVC tracks" but this block now also allows MPEG-2. Please update the comment to match the behavior so future changes don’t reintroduce accidental skips.
	{
		// Skip everything except AVC/HEVC tracks
		skip_bytes(file, len - 1); // 1 byte for track
		return;
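
To illustrate the first suppressed comment: an EBML VINT encodes its own length in the number of leading zero bits of its first byte, so the TrackNumber can occupy more than one byte. The sketch below decodes a VINT from a memory buffer and computes how many bytes of the block remain after it. It is not the fix applied in CCExtractor; all `demo_` names are hypothetical, and the real parser works on a `FILE *` through `skip_bytes()` rather than on a buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Length in bytes of an EBML VINT, derived from its first byte:
 * one plus the number of leading zero bits. Returns 0 if the first
 * byte is 0x00, which is not a valid start for a VINT of up to 8 bytes. */
size_t demo_vint_length(uint8_t first_byte)
{
	size_t len = 1;
	uint8_t mask = 0x80;
	if (first_byte == 0)
		return 0;
	while ((first_byte & mask) == 0)
	{
		mask >>= 1;
		len++;
	}
	return len;
}

/* Decode the VINT at buf into *value (length marker bit removed).
 * Returns the number of bytes consumed, or 0 on error. */
size_t demo_read_vint(const uint8_t *buf, size_t buf_len, uint64_t *value)
{
	assert(buf != NULL && value != NULL);
	if (buf_len == 0)
		return 0;
	size_t len = demo_vint_length(buf[0]);
	if (len == 0 || len > buf_len)
		return 0;
	uint64_t v = buf[0] & (uint8_t)(0xFF >> len); /* strip the marker bit */
	for (size_t i = 1; i < len; i++)
		v = (v << 8) | buf[i];
	*value = v;
	return len;
}

/* Bytes still to skip in a block of block_len bytes once `consumed`
 * bytes have been read, instead of assuming consumed == 1. */
size_t demo_remaining_in_block(size_t block_len, size_t consumed)
{
	return consumed <= block_len ? block_len - consumed : 0;
}
```

For a one-byte track number such as `0x81` (track 1) the result matches `len - 1`; for a two-byte encoding such as `0x40 0x81` (track 129) the number of bytes to skip is `len - 2`, which is the case the hard-coded subtraction would get wrong.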

@ccextractor-bot
Collaborator

CCExtractor CI platform finished running the test files on linux. Below is a summary of the test results, when compared to test for commit 733ed89...:
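| Report Name | Tests Passed |
| --- | --- |
| Broken | 13/13 |
| CEA-708 | 14/14 |
| DVB | 7/7 |
| DVD | 3/3 |
| DVR-MS | 2/2 |
| General | 27/27 |
| Hardsubx | 1/1 |
| Hauppage | 3/3 |
| MP4 | 3/3 |
| NoCC | 10/10 |
| Options | 86/86 |
| Teletext | 21/21 |
| WTV | 13/13 |
| XDS | 34/34 |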

Congratulations: Merging this PR would fix the following tests:

  • ccextractor --autoprogram --out=srt --latin1 --quant 0 85271be4d2..., Last passed: Never
  • ccextractor --autoprogram --out=ttxt --latin1 --ucla dab1c1bd65..., Last passed: Never
  • ccextractor --out=srt --latin1 --autoprogram 29e5ffd34b..., Last passed: Never
  • ccextractor --out=spupng c83f765c66..., Last passed: Never
  • ccextractor --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9..., Last passed: Never
  • ccextractor --startcreditsnotbefore 1 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9..., Last passed: Never
  • ccextractor --startcreditsnotafter 2 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9..., Last passed: Never
  • ccextractor --startcreditsforatleast 1 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9..., Last passed: Never
  • ccextractor --startcreditsforatmost 2 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9..., Last passed: Never

All tests passed completely.

Check the result page for more info.

@ccextractor-bot
Collaborator

CCExtractor CI platform finished running the test files on windows. Below is a summary of the test results, when compared to test for commit 733ed89...:
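| Report Name | Tests Passed |
| --- | --- |
| Broken | 13/13 |
| CEA-708 | 14/14 |
| DVB | 7/7 |
| DVD | 3/3 |
| DVR-MS | 2/2 |
| General | 27/27 |
| Hardsubx | 1/1 |
| Hauppage | 3/3 |
| MP4 | 3/3 |
| NoCC | 10/10 |
| Options | 86/86 |
| Teletext | 21/21 |
| WTV | 13/13 |
| XDS | 34/34 |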

Congratulations: Merging this PR would fix the following tests:

  • ccextractor --autoprogram --out=srt --latin1 --quant 0 85271be4d2..., Last passed: Never
  • ccextractor --autoprogram --out=ttxt --latin1 --ucla dab1c1bd65..., Last passed: Never
  • ccextractor --out=srt --latin1 --autoprogram 29e5ffd34b..., Last passed: Never
  • ccextractor --out=spupng c83f765c66..., Last passed: Never
  • ccextractor --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9..., Last passed: Never
  • ccextractor --startcreditsnotbefore 1 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9..., Last passed: Never
  • ccextractor --startcreditsnotafter 2 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9..., Last passed: Never
  • ccextractor --startcreditsforatleast 1 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9..., Last passed: Never
  • ccextractor --startcreditsforatmost 2 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9..., Last passed: Never

All tests passed completely.

Check the result page for more info.

@cfsmp3 merged commit 5de265d into CCExtractor:master on Feb 28, 2026
24 checks passed

Development

Successfully merging this pull request may close these issues.

Unable to extract CC3 closed captions

4 participants