feat: optional WebVTT timestamp subtitle track in exported MP4s #4766
connortechnology wants to merge 2 commits into
Adds a new boolean config option ZM_OPT_EXPORT_TIMESTAMP_TRACK (default off, category web). When enabled, the multi-event MP4 download path in web/includes/download_functions.php walks the sorted event list, accumulates one WebVTT cue per second mapping the concat-relative time to the source event's wall-clock StartDateTime, writes timestamps.vtt next to event_files.txt, and muxes it into the merged output as a mov_text subtitle stream alongside the existing -c copy of video and audio. Also stamps an MP4-level creation_time set to the earliest event start (UTC ISO-8601). Off behavior is byte-identical to before: same `ffmpeg -f concat -safe 0 -i event_files.txt -c copy <out>.mp4`.

Helpers formatVttTimestamp / buildVttContent / writeVttFile added at the bottom of download_functions.php and unit-smoke-tested via php -r. Falls back to (EndDateTime - StartDateTime) when an event's Length is null/0; if neither is usable, the cues for that event are skipped with a debug log so the export still completes.

The subtitle stream is intended as a machine-readable record of capture time per second, replacing the need to OCR the burned-in OSD timestamp. mov_text is rendered by VLC and Safari and ignored by browsers' <video> element without an explicit <track>; tools can extract cues with `ffmpeg -map 0:s -f srt -` or read them via `ffprobe -show_streams`.

refs #4761

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
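The cue accumulation described above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the function name `sketchBuildCues` and the plain-array event shape (`start`, `length`, `end`) are assumptions standing in for ZoneMinder's Event objects, and the debug log on skipped events is reduced to a comment.

```php
<?php
// Hypothetical event shape (assumption): 'start' and 'end' are Unix seconds,
// 'length' is a duration in seconds. The real code reads ZoneMinder Event objects.
function sketchBuildCues(array $events): array {
  $cues = [];
  $concatOffset = 0.0; // running position inside the concatenated output
  foreach ($events as $event) {
    $duration = (float)($event['length'] ?? 0);
    if ($duration <= 0 && isset($event['end'])) {
      // Fallback from the description: EndDateTime - StartDateTime
      $duration = (float)($event['end'] - $event['start']);
    }
    if ($duration <= 0) {
      // Neither value is usable: skip this event's cues (the PR logs a debug message here)
      continue;
    }
    $whole = (int)floor($duration);
    for ($s = 0; $s < $whole; $s++) {
      $cues[] = [
        'start' => $concatOffset + $s,
        'end'   => $concatOffset + $s + 1,
        'text'  => date('Y-m-d H:i:s', $event['start'] + $s),
      ];
    }
    if ($duration - $whole > 0.001) {
      // Trailing partial second gets a shorter final cue
      $cues[] = [
        'start' => $concatOffset + $whole,
        'end'   => $concatOffset + $duration,
        'text'  => date('Y-m-d H:i:s', $event['start'] + $whole),
      ];
    }
    $concatOffset += $duration;
  }
  return $cues;
}
```

Each cue's `start`/`end` are relative to the merged file, while `text` carries the wall-clock time of the source event, which is what lets a consumer map any playback position back to capture time.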
Pull request overview
Adds an opt-in mechanism for merged MP4 exports to include a machine-readable wall-clock timestamp subtitle track, enabling timestamp recovery without OCR.
Changes:
- Adds ZM_OPT_EXPORT_TIMESTAMP_TRACK (web category) configuration option (default off).
- Extends merged MP4 export path to generate a per-second WebVTT cue list and mux it into the output as a mov_text subtitle stream, plus sets MP4 creation_time metadata.
- Adds helper functions to format and write the WebVTT file used during muxing.
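As a rough illustration of the muxing step summarized in the list above, the command could be assembled along these lines. Only the off-path command (`ffmpeg -f concat -safe 0 -i event_files.txt -c copy <out>.mp4`) is quoted from the PR description; the enabled-path arguments (`-map 0 -map 1 -c:s mov_text -metadata creation_time=...`) and the function name `sketchMergeCommand` are assumptions about how such a command might look, not the PR's actual invocation.

```php
<?php
// Hypothetical helper (assumption): builds the ffmpeg command line for the merged export.
function sketchMergeCommand(string $listFile, string $outFile, ?string $vttFile = null, ?int $earliestStart = null): string {
  $cmd = 'ffmpeg -f concat -safe 0 -i '.escapeshellarg($listFile);
  if ($vttFile === null) {
    // Option off: same stream-copy command as before the change
    return $cmd.' -c copy '.escapeshellarg($outFile);
  }
  // Option on: add the WebVTT file as a second input and map both inputs
  $cmd .= ' -i '.escapeshellarg($vttFile);
  // Copy video/audio unchanged; convert only the subtitle stream to mov_text
  $cmd .= ' -map 0 -map 1 -c copy -c:s mov_text';
  if ($earliestStart !== null) {
    // MP4-level creation_time as UTC ISO-8601
    $cmd .= ' -metadata creation_time='.escapeshellarg(gmdate('Y-m-d\TH:i:s\Z', $earliestStart));
  }
  return $cmd.' '.escapeshellarg($outFile);
}
```

Passing every path through `escapeshellarg()` keeps the sketch safe for file names containing spaces or shell metacharacters.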
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| web/includes/download_functions.php | Generates WebVTT cues during merged export, writes timestamps.vtt, and conditionally muxes it into the output MP4 with ffmpeg. |
| scripts/ZoneMinder/lib/ZoneMinder/ConfigData.pm.in | Registers new boolean config option controlling whether timestamp subtitle track is embedded. |
    if ($maxTimeSecs == -1 or $maxTimeSecs < $event->StartDateTimeSecs()) {
      $maxTimeSecs = $event->EndDateTimeSecs();
      $maxTime = $event->EndDateTime();
$maxTimeSecs is updated based on comparing against the event start time ($event->StartDateTimeSecs()), but then assigned the event end time. If events overlap (or if an earlier event ends after a later event starts), this can prevent $maxTimeSecs/$maxTime from ever updating to the true latest end time, yielding an incorrect filename range (and any future uses of $maxTimeSecs). Compare against $event->EndDateTimeSecs() instead (and ensure EndDateTimeSecs is non-null) so the max reflects the latest end timestamp.
The bug is real, but it pre-exists on master (git show master:web/includes/download_functions.php shows the same < $event->StartDateTimeSecs() comparison at the equivalent location) and isn't touched by this PR. Keeping the change set surgical here — filing as a separate follow-up issue with a test against an overlapping / non-monotonic-end event set.
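For the follow-up issue, the difference between the two comparisons can be shown with a small sketch. The function names and the plain-array event shape (`start`, `end` as Unix seconds) are hypothetical; `sketchLatestEndBuggy` mirrors the condition quoted in the review comment, and `sketchLatestEndFixed` is one possible correction, not a committed fix.

```php
<?php
// Mirrors the existing condition: compares the running max against the event START.
function sketchLatestEndBuggy(array $events): int {
  $maxTimeSecs = -1;
  foreach ($events as $event) {
    if ($maxTimeSecs == -1 or $maxTimeSecs < $event['start']) {
      $maxTimeSecs = $event['end'];
    }
  }
  return $maxTimeSecs;
}

// Possible correction (assumption): compare against the event END and skip missing ends.
function sketchLatestEndFixed(array $events): int {
  $maxTimeSecs = -1;
  foreach ($events as $event) {
    if (!isset($event['end'])) continue; // no usable end time
    if ($maxTimeSecs == -1 or $maxTimeSecs < $event['end']) {
      $maxTimeSecs = $event['end'];
    }
  }
  return $maxTimeSecs;
}

// Overlapping events: the second starts before the first ends but finishes later.
$overlapping = [
  ['start' => 100, 'end' => 300],
  ['start' => 200, 'end' => 600],
];
```

With this input the start-based comparison never advances past the first event's end, which is the overlapping case the follow-up test would need to cover.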
    if ($seconds < 0) $seconds = 0;
    $h = (int)floor($seconds / 3600);
    $m = (int)floor(($seconds - $h * 3600) / 60);
    $s = $seconds - $h * 3600 - $m * 60;
    $whole = (int)floor($s);
    $ms = (int)round(($s - $whole) * 1000);
    if ($ms >= 1000) { $ms = 0; $whole += 1; }
    return sprintf('%02d:%02d:%02d.%03d', $h, $m, $whole, $ms);
formatVttTimestamp() can emit invalid VTT timestamps with seconds == 60 (or minutes/hours overflow) due to rounding: e.g. if $seconds is 59.9996, $ms rounds to 1000, then $whole is incremented to 60 without normalizing into minutes/hours. This is plausible here because cue times are floats ($concatOffset + $duration) and can accumulate floating point error. Consider converting to integer milliseconds first (e.g. max(0, (int)round($seconds*1000))) and then deriving h/m/s/ms via integer division/modulo so carry is handled correctly.
Suggested change:

    - if ($seconds < 0) $seconds = 0;
    - $h = (int)floor($seconds / 3600);
    - $m = (int)floor(($seconds - $h * 3600) / 60);
    - $s = $seconds - $h * 3600 - $m * 60;
    - $whole = (int)floor($s);
    - $ms = (int)round(($s - $whole) * 1000);
    - if ($ms >= 1000) { $ms = 0; $whole += 1; }
    - return sprintf('%02d:%02d:%02d.%03d', $h, $m, $whole, $ms);
    + $totalMs = max(0, (int)round($seconds * 1000));
    + $h = (int)floor($totalMs / 3600000);
    + $remainder = $totalMs % 3600000;
    + $m = (int)floor($remainder / 60000);
    + $remainder = $remainder % 60000;
    + $s = (int)floor($remainder / 1000);
    + $ms = $remainder % 1000;
    + return sprintf('%02d:%02d:%02d.%03d', $h, $m, $s, $ms);
Fixed in 0171f51. Reproduced: formatVttTimestamp(59.9996) returned 00:00:60.000 (invalid). Switched to integer-millisecond arithmetic with floor/mod for h/m/s/ms so carry propagates beyond seconds. Verified: 59.9996 → 00:01:00.000, 3599.9996 → 01:00:00.000, -0.5 → 00:00:00.000.
formatVttTimestamp() could emit invalid VTT timestamps such as 00:00:60.000 when the input second value (a float accumulated from $concatOffset + $duration) was very close to a minute/hour boundary and rounded ms up to 1000. The local +1 carry only handled seconds, not minutes/hours. Convert to integer milliseconds first and derive h/m/s/ms via integer division/modulo so carry propagates correctly. Verified 59.9996 -> 00:01:00.000, 3599.9996 -> 01:00:00.000, negative inputs clamp to zero. refs #4761
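The verification quoted above can be reproduced with a standalone sketch of the integer-millisecond approach. The name `sketchFormatVttTimestamp` is hypothetical and `intdiv()` is used in place of `floor()`, so this is an equivalent illustration rather than the committed function.

```php
<?php
// Sketch of the integer-millisecond formatter; intdiv() replaces floor() (assumption).
function sketchFormatVttTimestamp(float $seconds): string {
  // Round once to whole milliseconds and clamp negatives to zero,
  // so any carry is already folded into a single integer.
  $totalMs = max(0, (int)round($seconds * 1000));
  $h  = intdiv($totalMs, 3600000);
  $m  = intdiv($totalMs % 3600000, 60000);
  $s  = intdiv($totalMs % 60000, 1000);
  $ms = $totalMs % 1000;
  return sprintf('%02d:%02d:%02d.%03d', $h, $m, $s, $ms);
}

// The three inputs named in the fix
foreach ([59.9996, 3599.9996, -0.5] as $input) {
  echo $input, ' -> ', sketchFormatVttTimestamp($input), "\n";
}
```

Because rounding happens before the value is split into fields, a result of exactly 60000 ms becomes one minute and zero seconds instead of a seconds field of 60.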
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 6 comments.
        'text' => date('Y-m-d H:i:s', $eventStart + $s),
      ];
    }
    if ($duration - $whole > 0.001) {
      $vttCues[] = [
        'start' => $concatOffset + $whole,
        'end' => $concatOffset + $duration,
        'text' => date('Y-m-d H:i:s', $eventStart + $whole),
The cue text timestamps are generated with date(...) (local timezone and no offset), while creation_time is written in UTC (gmdate(...)). This makes the exported timestamps ambiguous and potentially inconsistent across systems/timezone changes. Consider including an explicit timezone in cue text (e.g., UTC with gmdate or local with offset) so consumers can interpret timestamps unambiguously.
Suggested change:

    -     'text' => date('Y-m-d H:i:s', $eventStart + $s),
    -   ];
    - }
    - if ($duration - $whole > 0.001) {
    -   $vttCues[] = [
    -     'start' => $concatOffset + $whole,
    -     'end' => $concatOffset + $duration,
    -     'text' => date('Y-m-d H:i:s', $eventStart + $whole),
    +     'text' => gmdate('Y-m-d H:i:s \U\T\C', $eventStart + $s),
    +   ];
    + }
    + if ($duration - $whole > 0.001) {
    +   $vttCues[] = [
    +     'start' => $concatOffset + $whole,
    +     'end' => $concatOffset + $duration,
    +     'text' => gmdate('Y-m-d H:i:s \U\T\C', $eventStart + $whole),
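The two unambiguous options mentioned in the comment (UTC with a label, or local time with an offset) can be compared in a short sketch. The helper name `sketchCueText` and its `$utc` flag are hypothetical; `date()`, `gmdate()` and the `P` offset format character are standard PHP.

```php
<?php
// Hypothetical helper (assumption): formats cue text so the timezone is explicit either way.
function sketchCueText(int $unixSeconds, bool $utc = true): string {
  if ($utc) {
    // UTC with a literal " UTC" suffix, as in the suggested change
    return gmdate('Y-m-d H:i:s \U\T\C', $unixSeconds);
  }
  // Local time with a numeric offset such as -05:00 ("P" format character)
  return date('Y-m-d H:i:sP', $unixSeconds);
}

// Simulate a server configured for a non-UTC timezone
date_default_timezone_set('America/New_York');
echo date('Y-m-d H:i:s', 0), "\n";   // ambiguous: no zone information in the string
echo sketchCueText(0, true), "\n";   // explicit UTC
echo sketchCueText(0, false), "\n";  // explicit local offset
```

Either form lets a consumer convert the cue text back to an absolute instant without knowing how the exporting server was configured.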
    $vttCues = [];
    $concatOffset = 0.0;
When enabled, this builds an in-memory $vttCues array with one entry per second and then concatenates the entire WebVTT into a single string. For long exports, this can cause significant memory growth and slow string concatenation. A more scalable approach would be to stream cues directly to a file handle (write header once, then fwrite per cue) or generate/write cues per event rather than accumulating them all in arrays/strings.
    for ($s = 0; $s < $whole; $s++) {
      $vttCues[] = [
        'start' => $concatOffset + $s,
        'end' => $concatOffset + $s + 1,
        'text' => date('Y-m-d H:i:s', $eventStart + $s),
      ];
    }
When enabled, this builds an in-memory $vttCues array with one entry per second and then concatenates the entire WebVTT into a single string. For long exports, this can cause significant memory growth and slow string concatenation. A more scalable approach would be to stream cues directly to a file handle (write header once, then fwrite per cue) or generate/write cues per event rather than accumulating them all in arrays/strings.
    function buildVttContent($cues) {
      $out = "WEBVTT\n\n";
      foreach ($cues as $cue) {
        if ($cue['end'] <= $cue['start']) continue;
        $out .= formatVttTimestamp($cue['start']).' --> '.formatVttTimestamp($cue['end'])."\n";
        $out .= $cue['text']."\n\n";
      }
      return $out;
    }

    function writeVttFile($path, $cues) {
      $content = buildVttContent($cues);
      return @file_put_contents($path, $content);
When enabled, this builds an in-memory $vttCues array with one entry per second and then concatenates the entire WebVTT into a single string. For long exports, this can cause significant memory growth and slow string concatenation. A more scalable approach would be to stream cues directly to a file handle (write header once, then fwrite per cue) or generate/write cues per event rather than accumulating them all in arrays/strings.
Suggested change:

    - function buildVttContent($cues) {
    -   $out = "WEBVTT\n\n";
    -   foreach ($cues as $cue) {
    -     if ($cue['end'] <= $cue['start']) continue;
    -     $out .= formatVttTimestamp($cue['start']).' --> '.formatVttTimestamp($cue['end'])."\n";
    -     $out .= $cue['text']."\n\n";
    -   }
    -   return $out;
    - }
    - function writeVttFile($path, $cues) {
    -   $content = buildVttContent($cues);
    -   return @file_put_contents($path, $content);
    + function writeVttStream($stream, $cues) {
    +   $bytesWritten = fwrite($stream, "WEBVTT\n\n");
    +   if ($bytesWritten === false) {
    +     return false;
    +   }
    +   foreach ($cues as $cue) {
    +     if ($cue['end'] <= $cue['start']) continue;
    +     $cueText = formatVttTimestamp($cue['start']).' --> '.formatVttTimestamp($cue['end'])."\n";
    +     $cueText .= $cue['text']."\n\n";
    +     $result = fwrite($stream, $cueText);
    +     if ($result === false) {
    +       return false;
    +     }
    +     $bytesWritten += $result;
    +   }
    +   return $bytesWritten;
    + }
    + function buildVttContent($cues) {
    +   $stream = fopen('php://temp/maxmemory:1048576', 'w+b');
    +   if ($stream === false) {
    +     return false;
    +   }
    +   $result = writeVttStream($stream, $cues);
    +   if ($result === false) {
    +     fclose($stream);
    +     return false;
    +   }
    +   rewind($stream);
    +   $content = stream_get_contents($stream);
    +   fclose($stream);
    +   return $content;
    + }
    + function writeVttFile($path, $cues) {
    +   $stream = @fopen($path, 'wb');
    +   if ($stream === false) {
    +     return false;
    +   }
    +   $result = writeVttStream($stream, $cues);
    +   fclose($stream);
    +   return $result;
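The suggestion above streams the output but still receives a fully built `$cues` array. The other half of the comment (generating cues per event rather than accumulating them) could be sketched with a generator, as below. All names here (`sketchFmt`, `sketchCueGenerator`, `sketchStreamVtt`) and the array event shape are hypothetical, and the trailing partial-second cue is omitted for brevity.

```php
<?php
// Compact integer-millisecond timestamp formatter (hypothetical helper).
function sketchFmt(float $seconds): string {
  $ms = max(0, (int)round($seconds * 1000));
  return sprintf('%02d:%02d:%02d.%03d',
    intdiv($ms, 3600000), intdiv($ms % 3600000, 60000), intdiv($ms % 60000, 1000), $ms % 1000);
}

// Yields one cue at a time, so memory use does not grow with export length.
// Assumed event shape: 'start' as Unix seconds, 'length' in seconds.
function sketchCueGenerator(array $events): Generator {
  $offset = 0.0;
  foreach ($events as $event) {
    $whole = (int)floor($event['length']);
    for ($s = 0; $s < $whole; $s++) {
      yield [
        'start' => $offset + $s,
        'end'   => $offset + $s + 1,
        'text'  => gmdate('Y-m-d H:i:s \U\T\C', $event['start'] + $s),
      ];
    }
    $offset += $event['length'];
  }
}

// Writes the header once, then one fwrite() per cue; accepts any iterable of cues.
function sketchStreamVtt($stream, iterable $cues): bool {
  if (fwrite($stream, "WEBVTT\n\n") === false) return false;
  foreach ($cues as $cue) {
    if ($cue['end'] <= $cue['start']) continue;
    $block = sketchFmt($cue['start']).' --> '.sketchFmt($cue['end'])."\n".$cue['text']."\n\n";
    if (fwrite($stream, $block) === false) return false;
  }
  return true;
}
```

Because `sketchStreamVtt()` takes an `iterable`, it works with either an array or the generator, so a caller could switch to lazy generation without changing the writer.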
    function writeVttFile($path, $cues) {
      $content = buildVttContent($cues);
      return @file_put_contents($path, $content);
Using the error suppression operator (@file_put_contents) can make failures harder to diagnose (it hides warnings like permission or disk-full issues). Prefer calling file_put_contents without @ and, on failure, logging the underlying error (e.g., via error_get_last()) so operations/debugging have actionable detail.
Suggested change:

    - return @file_put_contents($path, $content);
    + $result = file_put_contents($path, $content);
    + if ($result === false) {
    +   $lastError = error_get_last();
    +   if ($lastError and isset($lastError['message'])) {
    +     ZM\Error("Failed to write VTT file '$path': ".$lastError['message']);
    +   } else {
    +     ZM\Error("Failed to write VTT file '$path'");
    +   }
    + }
    + return $result;