
Conversation

@Akshat-Tripathi

Currently the reference implementation saves the model output directly. This is an 81×720×1280×3 fp32 numpy array, so each video takes 81 × 720 × 1280 × 3 × 4 = 895,795,200 bytes, or roughly 896 MB.

With 247 videos, storing the results of a single accuracy run would require about 221 GB.
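The figures above are easy to verify. A quick sanity check of the arithmetic (shapes and video count taken from the description; `4` is the size of an fp32 element):

```python
# Sanity-check the storage figures quoted in the PR description.
frames, height, width, channels = 81, 720, 1280, 3
bytes_per_fp32 = 4
num_videos = 247

per_video = frames * height * width * channels * bytes_per_fp32
print(per_video)                      # 895795200 bytes, ~896 MB per video

total = num_videos * per_video
print(round(total / 1e9))             # ~221 GB for a full accuracy run
```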

However, if we first encode each video to mp4 and save only the mp4 bytes, the required storage shrinks drastically, to roughly 900 MB in total.

To ensure fairness, I propose that all submitters use the same implementation to save their videos, namely `diffusers.utils.export_to_video`.
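A minimal sketch of what that shared save path could look like. The wrapper name, the `fps=16` value, and the assumption that model output is a float array in [0, 1] are all illustrative, not part of this PR; `export_to_video` accepts a list of per-frame float arrays and handles the uint8 conversion and mp4 encoding itself:

```python
import numpy as np


def save_video_mp4(frames: np.ndarray, path: str, fps: int = 16) -> str:
    """Encode a (T, H, W, C) float array in [0, 1] to an mp4 file.

    Hypothetical helper around diffusers.utils.export_to_video; the
    benchmark would need to pin the model's actual frame rate rather
    than the fps=16 assumed here.
    """
    # Imported lazily so this module loads even without diffusers installed.
    from diffusers.utils import export_to_video

    # export_to_video takes a list of HxWxC float frames in [0, 1]
    # and returns the path it wrote to.
    return export_to_video(list(frames), path, fps=fps)
```

Submitters would then call `save_video_mp4(model_output, "video_0.mp4")` instead of dumping the raw array, reducing each result from ~896 MB to a few megabytes.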

@Akshat-Tripathi Akshat-Tripathi requested a review from a team as a code owner January 21, 2026 11:20

github-actions bot commented Jan 21, 2026

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅

@Akshat-Tripathi

recheck

@Akshat-Tripathi

recheck

@pgmpablo157321 pgmpablo157321 force-pushed the t2v_storage_optimisation branch from de74cef to 8351d5e Compare January 26, 2026 16:44
@pgmpablo157321 pgmpablo157321 force-pushed the t2v_storage_optimisation branch from 18fdc70 to f2d04b6 Compare January 26, 2026 17:56
