[WIP] Video dataset functionalities #1
base: video-reader
Conversation
def __getitem__(self, idx):
    video, audio, info, video_idx = self.video_clips.get_clip(idx)
    label = self.samples[video_idx][1]
Transforms still need to be added here.
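A minimal sketch of what that could look like once transforms are wired in; the `transform` attribute, its call signature, and the return tuple here are assumptions, not part of this diff:

```python
def __getitem__(self, idx):
    # Fetch the clip: get_clip returns the video frames, the audio frames,
    # the stream metadata, and the index of the source video.
    video, audio, info, video_idx = self.video_clips.get_clip(idx)
    label = self.samples[video_idx][1]

    # Hypothetical hook: apply a user-supplied transform to the video tensor.
    if self.transform is not None:
        video = self.transform(video)

    return video, audio, label
```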
from torchvision.io import read_video_timestamps, read_video

def unfold(tensor, size, step, dilation):
This uses stride tricks to compute all possible clips in the video, with a configurable step between clips and a dilation (the step between frames within a clip).
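For illustration, a self-contained sketch of such an `unfold` built on `torch.as_strided` (a strided view over the frame indices, no copy); this mirrors the idea described above, though the exact implementation in the diff may differ:

```python
import torch

def unfold(tensor, size, step, dilation=1):
    """
    Return all consecutive windows of `size` elements from a 1D tensor,
    with `step` elements between window starts and `dilation` elements
    between frames inside a window. The result is a strided view of
    shape (num_windows, size); no data is copied.
    """
    assert tensor.dim() == 1
    o_stride = tensor.stride(0)
    numel = tensor.numel()
    new_stride = (step * o_stride, dilation * o_stride)
    # Each window spans dilation * (size - 1) + 1 elements; count how
    # many such windows fit when starts are `step` apart.
    new_size = ((numel - (dilation * (size - 1) + 1)) // step + 1, size)
    if new_size[0] < 1:
        new_size = (0, size)
    return torch.as_strided(tensor, new_size, new_stride)
```

For example, `unfold(torch.arange(10), size=3, step=2, dilation=1)` yields the windows `[0, 1, 2]`, `[2, 3, 4]`, `[4, 5, 6]`, `[6, 7, 8]`.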
* adding base files
* setup modification to actually build the thing
* video api constructor registration
* FAIL metadata
* FAIL update for QS
* revert
* debugging with Victor
* metadata registration works
* API build next
* test
* Merge change
* formatting parameters to avoid the segfault
* next now works on a video
* make size of the output tensor format dependent
* Make next work on audio stream only as well
* refactoring the _setCurrentStream param
* Fixing the last frame return and sensor
* todo docs
* Formatting
* cleanup and comments
* introducing new tests for the API
* cleanup
* Comment out unnecessary format (will add following FFMPEG fix)
* Reformat parsing function
* removing the seek bug `get_decoder_params`
* Removing unnecessary code/variables
* enforce RGB24 as a reading format (will crash before ffmpeg fix)
* permute the dimensions to return (RGB x H x W)
* Changing the return type to std::tuple<torch::Tensor, double> as opposed to tensor list
* Adjusting tests for the new return type
* remove unnecessary jitter
* clangangangang
* Metadata return changes (#1)
* remove implicit calls to set a current stream (#2)
* Adding new tests to check the accuracy of the seek
* cleanup debugging statements
* Addressing PR comments
* addressing Francisco's comments
* CLANG build formatting
* Updated testing to test against pyav for the video tensor reads
* Formatting
* remove pyav from pip deps and add it to conda build
* add pyav and ffmpeg to conda builds
* Formatting?
* Setting up linter once and for all hopefully
* Testing pyav
* Fix to 8.0.0
* Try 6.2.0
* See what happens with av from pip
* Remove FFMPEG blocker
* What is going on?
* More tests
* Forgot something
* unblocker
* Check if cache is messing up with things
* Now try with different ffmpeg
* Do not install av
* Test with ffmpeg 4.2
* clean up video tests
* cleaning up the tests a bit to better test partial reading
* arrgh linter
* Forgot the av test
* forgot av test
* checkout build files from master
* revert circleci
* addressing Francisco's comments
* Ignore ffmpeg in travis

Co-authored-by: Francisco Massa <fvsmassa@gmail.com>
Co-authored-by: Edgar Andrés Margffoy Tuay <andfoy@gmail.com>
This is a WIP prototype for now.
It's here to give some more context on how the functionality in pytorch#1039 will be used for reading video data.
It still requires documentation and tests.
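For context, a rough sketch of how the reading functions from pytorch#1039 are meant to be used; the file path is a placeholder, the clip is assumed to have at least 17 frames, and the exact signatures and return layout may differ from what finally lands:

```python
from torchvision.io import read_video, read_video_timestamps

# Presentation timestamps of every frame in the video stream, plus the fps.
pts, fps = read_video_timestamps("video.mp4")

# Decode only the frames between two timestamps: returns the video frames,
# the audio frames, and a dict with stream metadata (e.g. fps).
video, audio, info = read_video("video.mp4", start_pts=pts[0], end_pts=pts[16])
print(video.shape)  # e.g. (17, H, W, 3) uint8 tensor, one entry per frame
```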
cc @bjuncek @stephenyan1231 for an initial review