[WIP] Video dataset functionalities #1

Open. Wants to merge 6 commits into base: video-reader

fmassa (Owner) commented Jun 28, 2019

This is a WIP prototype for now.

It's here to give more context on how the functionality in pytorch#1039 will be used for reading video data.

It still requires documentation and tests.

cc @bjuncek @stephenyan1231 for an initial review

def __getitem__(self, idx):
    video, audio, info, video_idx = self.video_clips.get_clip(idx)
    label = self.samples[video_idx][1]

Owner Author

Transforms still need to be added.

from torchvision.io import read_video_timestamps, read_video


def unfold(tensor, size, step, dilation):

Owner Author

This uses stride tricks to compute all possible clips in the video, with an optional step between clips and a dilation (the step between frames within a clip).
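
To make the clip layout concrete, here is a minimal pure-Python sketch of the index pattern such a strided view would expose. It is not the code from this PR: `clip_frame_indices` and its `num_frames` argument are hypothetical names used only for illustration, and it builds explicit index lists, whereas a strided view (for example via `torch.as_strided`) would presumably give the same layout without copying. It assumes clip `i`, position `j` maps to frame `i * step + j * dilation`.

```python
def clip_frame_indices(num_frames, size, step, dilation=1):
    """Return the frame indices of every clip as a list of lists.

    Hypothetical helper for illustration only; it mirrors the index pattern
    a strided view would expose: clip i, position j -> i * step + j * dilation.
    """
    if min(size, step, dilation) < 1:
        raise ValueError("size, step and dilation must be positive")
    # Number of frames one clip spans, from its first to its last frame.
    span = dilation * (size - 1) + 1
    # How many clip start positions fit; zero when the video is too short.
    num_clips = max(0, (num_frames - span) // step + 1)
    return [
        [start * step + j * dilation for j in range(size)]
        for start in range(num_clips)
    ]


print(clip_frame_indices(10, size=3, step=2))
# [[0, 1, 2], [2, 3, 4], [4, 5, 6], [6, 7, 8]]
print(clip_frame_indices(10, size=3, step=2, dilation=2))
# [[0, 2, 4], [2, 4, 6], [4, 6, 8]]
```

With `step` smaller than the clip span, consecutive clips overlap, which is why a view over the original frame indices is attractive compared with materializing every clip.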

fmassa added a commit that referenced this pull request Oct 8, 2020
* adding base files

* setup modification to actually build the thing

* video api constructor registration

* FAIL metadata

* FAIL update for QS

* revert

* debugging with Victor

* metadata registration works

* API build next

* test

* Merge change

* formatting parameters to avoid the segfault

* next now works on a video

* make size of the output tensor format dependent

* Make next work on audio stream only as well

* refactoring the _setCurrentStream param

* Fixing the last frame return and sensor

* todo docs

* Formatting

* cleanup and comments

* introducing new tests for the API

* cleanup

* Comment out unnecessary format (will add following FFMPEG fix)

* Reformat parsing function

* removing the seek bug `get_decoder_params`

* Removing unnecessary code/variables

* enforce RGB24 as a reading format (will crash before ffmpeg fix)

* permute the dimensions to return (RGB x H x W)

* Changing the return type to std::tuple<torch::Tensor, double> as opposed to tensor list

* Adjusting tests for the new return type

* remove unnecessary jitter

* clangangangang

* Metadata return changes (#1)

* remove implicit calls to set a current stream (#2)

* Adding new tests to check the accuracy of the seek

* cleanup debugging statements

* Addressing PR comments

* addressing Francisco's comments

* CLANG build formatting

* Updated testing to test against pyav for the video tensor reads

* Formatting

* remove pyav from pip deps and add it to conda build

* add pyav and ffmpeg to conda builds

* Formatting?

* Setting up linter once and for all hopefully

* Testing pyav

* Fix to 8.0.0

* Try 6.2.0

* See what happens with av from pip

* Remove FFMPEG blocker

* What is going on?

* More tests

* Forgot something

* unblocker

* Check if cache is messing up with things

* Now try with different ffmpeg

* Do not install av

* Test with ffmpeg 4.2

* clean up video tests

* cleaning up the tests a bit to better test partial reading

* arrgh linter

* Forgot the av test

* forgot av test

* checkout build files from master

* revert circleci

* addressing Francisco's comments

* Ignore ffmpeg in travis

Co-authored-by: Francisco Massa <fvsmassa@gmail.com>
Co-authored-by: Edgar Andrés Margffoy Tuay <andfoy@gmail.com>