Optimize MP4 forensic replay export (~12x faster, idle skip) #510

Open
jmvermeulen wants to merge 3 commits into GoSecure:main from jmvermeulen:feature/forensic-replay-improvements

@jmvermeulen

Summary

  • Reduce per-PDU frame encoding overhead by batching renders and advancing PTS for idle gaps
  • Replace per-frame QImage.scaled() with 1px padding for odd H264 dimensions
  • Switch to ultrafast h264 preset, 10 FPS, and seekable non-fragmented MP4
  • Add --idle-skip N option to compress periods without user input, producing shorter review videos

Problem

Converting large .pyrdp replay files to MP4 was extremely slow. The bottleneck was encoding an H264 frame on every single PDU — even when multiple PDUs fell within the same frame window, and even during long idle gaps where the screen didn't change.

Changes

Commit 1: Frame batching + PTS advance + ultrafast preset

  • onFinishRender() now sets a dirty flag instead of immediately encoding
  • onPDUReceived() only encodes at 100ms frame boundaries when dirty=True
  • Idle gaps advance PTS without encoding (player holds last frame)
  • FPS reduced from 30 to 10 (forensic captures don't need 30fps)
  • H264 preset changed to ultrafast
  • movflags changed from frag_keyframe+empty_moov to faststart for proper seeking
  • GOP size set to 5s for reliable timeline navigation
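
The batching and PTS-advance behavior described above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: `FrameBatcher`, its snake_case method names, the `encode` callback, and the `finish()` flush are hypothetical, and the real converter hands frames to an H264 encoder rather than a callback. PTS is counted in frames at 10 FPS, so one unit is one 100 ms window.

```python
from typing import Callable, Optional

FPS = 10
FRAME_MS = 1000 // FPS  # 100 ms frame window


class FrameBatcher:
    """Hypothetical helper: encode at most one frame per 100 ms window."""

    def __init__(self, encode: Callable[[int], None]) -> None:
        self.encode = encode              # called with the PTS (in frames) to encode at
        self.dirty = False                # True when the surface changed since the last encode
        self.start_ms: Optional[int] = None
        self.window = 0                   # frame window of the most recent PDU

    def on_finish_render(self) -> None:
        # Rendering no longer encodes; it only marks the surface as changed.
        self.dirty = True

    def on_pdu_received(self, timestamp_ms: int) -> None:
        if self.start_ms is None:
            self.start_ms = timestamp_ms
        window = (timestamp_ms - self.start_ms) // FRAME_MS
        if self.dirty and window > self.window:
            # A frame boundary was crossed: flush the changes batched so far.
            self.encode(self.window)
            self.dirty = False
        # Jumping to a much later window encodes nothing for the gap;
        # the player simply holds the last frame until the next PTS.
        self.window = window

    def finish(self) -> None:
        # Flush whatever is still pending at the end of the replay.
        if self.dirty:
            self.encode(self.window)
            self.dirty = False


encoded = []
batcher = FrameBatcher(encoded.append)
for t in (0, 30, 50, 120, 5000):  # PDU timestamps in ms, each one triggers a render
    batcher.on_pdu_received(t)
    batcher.on_finish_render()
batcher.finish()
print(encoded)  # [0, 1, 50]
```

In this example five PDUs produce three encoded frames: the first three PDUs share one window, and the gap between 120 ms and 5000 ms costs no encodes at all.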

Commit 2: Pad instead of scale for odd dimensions

  • H264 requires even dimensions. The old code called QImage.scaled() every frame — profiling showed this cost ~6ms/frame (11% of encode time)
  • Now creates an even-sized surface and draws the screen into it via drawImage, avoiding the expensive rescale
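
To illustrate why padding is cheaper than scaling, here is a small sketch that works on a raw row-major pixel buffer instead of a Qt surface. The PR itself draws into an even-sized surface with drawImage; the `pad_to_even` and `even` helpers below are hypothetical stand-ins that show the same idea: copy each row unchanged and leave one extra row or column of filler, with no resampling.

```python
from typing import Tuple


def even(n: int) -> int:
    """Round up to the next even number."""
    return n + (n & 1)


def pad_to_even(pixels: bytes, width: int, height: int, bpp: int = 4) -> Tuple[bytes, int, int]:
    """Copy a row-major image into an even-sized buffer without rescaling.

    Hypothetical helper; bpp is bytes per pixel.
    """
    out_w, out_h = even(width), even(height)
    if (out_w, out_h) == (width, height):
        return pixels, width, height      # already even, nothing to do

    src_stride = width * bpp
    dst_stride = out_w * bpp
    out = bytearray(dst_stride * out_h)   # zero-filled: the padding stays blank
    for y in range(height):
        # Each source row is copied as-is to the start of the wider row.
        row = pixels[y * src_stride:(y + 1) * src_stride]
        out[y * dst_stride:y * dst_stride + src_stride] = row
    return bytes(out), out_w, out_h


# A 3x1 single-byte-per-pixel image becomes 4x2.
padded, w, h = pad_to_even(b"\x01\x02\x03", 3, 1, bpp=1)
print(w, h, padded.hex())  # 4 2 0102030000000000
```

The original pixels keep their exact values and positions, which is also why the cursor position check in the test plan is unaffected by the padding.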

Commit 3: Idle skip (--idle-skip N)

  • Adds --idle-skip N CLI option to compress periods of user inactivity longer than N seconds
  • Idle detection is based on absence of FAST_PATH_INPUT PDUs (keyboard/mouse), not screen changes — so cursor blinks and screen refreshes don't reset the idle timer
  • During idle: PTS is frozen and screen updates are discarded, except for one frame encoded every 10s to capture state changes
  • When input resumes: a 1-second pause is inserted and normal encoding continues
  • True gaps (no PDUs at all) are also compressed via the existing PTS advance path
  • Disabled by default (0 = off); --idle-skip 30 is a good starting point for forensic review
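
The timing rules above can be sketched as a mapping from capture time to video time. This is a simplified model under stated assumptions, not the PR's implementation: `IdleSkipClock` and `video_time` are hypothetical names, the idle timer is assumed to start at the first PDU, and the periodic frame encoded every 10s during idle is left out.

```python
from typing import Optional


class IdleSkipClock:
    """Hypothetical mapping from capture time to video time with idle compression."""

    RESUME_PAUSE_MS = 1000  # pause kept when input resumes, per the description above

    def __init__(self, idle_skip_s: int) -> None:
        self.idle_ms = idle_skip_s * 1000        # 0 disables idle skipping
        self.last_input_ms: Optional[int] = None  # capture time of the last input PDU
        self.skipped_ms = 0                      # capture time removed from the video so far

    def video_time(self, capture_ms: int, is_input: bool) -> int:
        if self.last_input_ms is None:
            # Assumption: the idle timer starts at the first PDU.
            self.last_input_ms = capture_ms
        idle_start = self.last_input_ms + self.idle_ms
        idle = self.idle_ms > 0 and capture_ms > idle_start

        if is_input:
            if idle:
                # Input resumed: drop the idle stretch but keep a short pause.
                self.skipped_ms += max(0, capture_ms - idle_start - self.RESUME_PAUSE_MS)
            self.last_input_ms = capture_ms
        elif idle:
            # Output-only traffic during idle does not advance the video clock.
            return idle_start - self.skipped_ms
        return capture_ms - self.skipped_ms


clock = IdleSkipClock(30)
events = [(0, True), (10_000, False), (40_000, False),
          (100_000, False), (120_000, True), (121_000, False)]
times = [clock.video_time(t, is_input) for t, is_input in events]
print(times)  # [0, 10000, 30000, 30000, 31000, 32000]
```

With a 30-second threshold, the screen updates at 40s and 100s do not reset the idle timer, and the 90 seconds without input collapse into a 1-second pause before the input at 120s.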

Profiling methodology

Benchmarked against clean main on a real multi-hour forensic capture (20K PDUs, wide-format resolution).

PDU type analysis revealed 96.3% GDI drawing orders, 0% bitmap updates — the bottleneck was not bitmap decompression but frame encoding.

cProfile breakdown (500 PDUs in the heaviest section):

| Component | Before | After | Improvement |
|---|---|---|---|
| writeFrame | 23.3s (87%) | 11.5s (79%) | -51% |
| QImage.scaled / pad | 2.9s (11%) | 0.97s (7%) | -67% |
| GDI rendering | 0.2s (1%) | 0.2s (1%) | |

End-to-end throughput (2000 PDUs, vs clean main):

| Version | PDU/s | vs main |
|---|---|---|
| main (30fps, per-PDU encode, default preset) | 15 | |
| Optimized (batching + ultrafast + 10fps + padding) | 185 | +1133% |
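
For readers reconciling the "+1133%" figure with the "~12x" in the title, both follow from the same two throughput numbers. The `speedup` helper below is only an illustrative name for the arithmetic.

```python
from typing import Tuple


def speedup(before: float, after: float) -> Tuple[float, float]:
    """Return (speedup factor, percentage increase) for a throughput change."""
    factor = after / before
    percent = (factor - 1) * 100
    return factor, percent


factor, percent = speedup(15, 185)  # PDU/s on main vs. optimized
print(f"{factor:.1f}x, +{percent:.0f}%")  # 12.3x, +1133%
```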

Frame encode statistics (full file):

  • 18% fewer actual frame encodes via batching
  • 89% of video frames skipped via PTS advance (idle gaps)
  • Output video duration matches capture duration

Test plan

  • Benchmark against clean main version (12x faster)
  • Profile per-PDU encoding cost with cProfile
  • Verify frame batching reduces actual encodes
  • Verify PTS timing produces correct video duration
  • Verify output MP4 is seekable (tested faststart, frag_keyframe, frag+faststart — only faststart works correctly)
  • Verify mouse cursor renders at correct position
  • Test with odd-dimension capture (padding correctness)
  • Converting without --idle-skip produces output identical to the previous behavior
  • Converting with --idle-skip produces a shorter video that skips input-idle periods
  • Verify idle detection ignores cursor blinks (FAST_PATH_OUTPUT) and triggers on keyboard/mouse absence (FAST_PATH_INPUT)

- Reduce FPS 30→10 for forensic captures (sufficient for review)
- Batch multiple renders per frame window instead of encoding per-PDU
- Advance PTS to skip idle gaps without encoding duplicate still frames
- Switch to h264 ultrafast preset for faster encoding
- Use faststart movflag for seekable output
- Set GOP size to 5s for reliable timeline seeking

H264 requires even dimensions. Previously the full surface was scaled
every frame to add 1px — profiling showed this cost ~6ms per frame
(11% of encode time). Now pads the output surface by 1px via drawImage
instead, avoiding the expensive rescale.

Detect idle periods by absence of FAST_PATH_INPUT PDUs (keyboard/mouse)
instead of screen changes. Cursor blinks and screen refreshes no longer
prevent idle detection. During idle, PTS is frozen with one frame
encoded every 10s for state capture. True PDU gaps are also compressed.

@jmvermeulen

Tested on a 4-hour recording, which was almost impossible to convert without these fixes.
