Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[vpdq] Refactor video hashing #1590

Merged
merged 1 commit into from
May 23, 2024
Merged

Conversation

ianwal
Copy link
Contributor

@ianwal ianwal commented May 20, 2024

Summary

Refactor video hashing. The combined class of hashing and decoding frames was complicated and difficult to understand. This PR splits it up into smaller classes, which also allows different frame types to be used in the future with minimal changes.

  • Moved some FFmpeg custom types into separate files for better readability.

  • Hashing is split into two new classes: VpdqHasher and FFmpegHasher.

    • VpdqHasher takes care of all hashing operations and just runs the generic hashing algorithm as frames are added.
      • A template is used as the frame type in VpdqHasher as opposed to an abstract class with derived frame types to avoid needing to dynamically allocate every frame. A generic frame is added that works without FFmpeg as an example, but is not used for anything currently. Using the direct FFmpeg data pointer is significantly faster than copying the data into the generic frame buffer.
    • New class FFmpegHasher that feeds FFmpeg frames to the VpdqHasher.
  • Changed FFmpegHasher::processPacket to take in the data by reference instead of the full unique_ptr.

    • This is recommended by Herb Sutter GotW 91 if ownership is not being transferred.
  • Minor change to matchTwoHashBrute to pass data by const& instead of copying for better performance.

I did not notice a performance impact from these changes across multiple runs of regtest.

No public API changes other than matchTwoHashBrute.

Most of these changes are just moving common functions into grouped common modules. No major logic or anything is rewritten. I would try to split up this PR but most of it is easier to understand in the context of the full changes.

Test Plan

regtest passes successfully.

@Dcallies
Copy link
Contributor

Thanks again @ianwal!

Will try and review this later today!

Just a gut check on your end, do you think the existing regtests are reasonably comprehensive, or should we consider expanding the suite in the future? If so, what kind of changes would make you confident for refactors like this one?

@Dcallies Dcallies self-assigned this May 20, 2024
@ianwal
Copy link
Contributor Author

ianwal commented May 22, 2024

Just a gut check on your end, do you think the existing regtests are reasonably comprehensive, or should we consider expanding the suite in the future? If so, what kind of changes would make you confident for refactors like this one?

I trust the existing regtest for testing the "all good" cases, but more sample files with different codecs and containers and some bad data would be useful in general. Also unit testing and static analysis couldn't hurt. If this was a directly consumable CPP library, which would be very cool but somewhat impractical due to the pains of CPP package management, then I would definitely want some unit tests for public APIs across supported platforms.

For this PR specifically, this is almost entirely just moving stuff around and organizing so I think the regtest is good enough. And with how good the sanitizers are I would be surprised if there were any concurrency issues in the good cases.

@Dcallies
Copy link
Contributor

Dcallies commented May 23, 2024

Sorry, I left you hanging! Will attempt to complete review tomorrow.

Copy link
Contributor

@Dcallies Dcallies left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, thanks for the explanation, and looks good to go.

@Dcallies Dcallies merged commit 7b2e51c into facebook:main May 23, 2024
9 checks passed
@ianwal ianwal deleted the vpdq-queue-refactor branch May 24, 2024 04:34
@ianwal ianwal mentioned this pull request Oct 3, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants