[WIP] Video dataset functionalities #1

fmassa · 2019-06-28T16:58:23Z

This is a WIP prototype for now.

It's here to give more context on how the functionality in pytorch#1039 will be used to read video data.

It still requires documentation and tests.

cc @bjuncek @stephenyan1231 for an initial review

fmassa · 2019-06-28T16:59:04Z

torchvision/datasets/kinetics.py

+    def __getitem__(self, idx):
+        video, audio, info, video_idx = self.video_clips.get_clip(idx)
+        label = self.samples[video_idx][1]
+


still need to add transforms

fmassa · 2019-06-28T16:59:52Z

torchvision/datasets/video_utils.py

+from torchvision.io import read_video_timestamps, read_video
+
+
+def unfold(tensor, size, step, dilation):


this uses stride tricks to compute all possible clips in the video, with an optional step between the starts of consecutive clips and a dilation (the spacing between frames within a clip)
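A minimal sketch of the stride trick described above, written with NumPy's `as_strided` for illustration (PyTorch offers the analogous `torch.as_strided`). It assumes a 1-D input, for example per-frame timestamps. The parameter names mirror the `unfold` signature in the diff, but the body is an illustration, not the code under review.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided


def unfold(array, size, step, dilation=1):
    """Return every window of `size` elements from a 1-D array.

    Consecutive windows start `step` elements apart, and elements inside a
    window are `dilation` elements apart. The result is a read-only strided
    view, so no data is copied.
    """
    assert array.ndim == 1
    elem_stride = array.strides[0]
    # Number of input elements covered by one window.
    span = dilation * (size - 1) + 1
    # Windows that fit entirely inside the array (zero if none fit).
    num_windows = max((array.shape[0] - span) // step + 1, 0)
    return as_strided(
        array,
        shape=(num_windows, size),
        strides=(step * elem_stride, dilation * elem_stride),
        writeable=False,
    )


windows = unfold(np.arange(10), size=3, step=4, dilation=2)
```

Here `windows` holds `[0, 2, 4]` and `[4, 6, 8]`: each clip spans five frames, and clips start four frames apart. Returning a read-only view avoids materializing overlapping clips in memory.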


[WIP] Add initial prototype · fe17eb8

fmassa commented Jun 28, 2019


fmassa added 5 commits July 1, 2019 09:37

bugfix · 0417a69
Start adding some tests · 72d8c0b
Add more tests · 925a353
Misc improvements · f7fc7c5
More improvements · 593f59d
