
[WIP] UCF101 prototype with utilities for video loading #4838

Open. Wants to merge 52 commits into main.
Conversation

@bjuncek (Contributor) commented Nov 2, 2021

A simple PyAV-based set of utilities, with a proof-of-concept implementation for the UCF101 dataset.

cc @pmeier @bjuncek

@facebook-github-bot commented Nov 2, 2021

💊 CI failures summary and remediations

As of commit f1a69e0 (more details on the Dr. CI page):


💚 💚 Looks good so far! There are no failures yet. 💚 💚


This comment was automatically generated by Dr. CI.

@pmeier (Collaborator) left a comment

Thanks a lot @bjuncek. I have some comments inline about the general infrastructure. I can't really comment on the validity of the video utility datapipes that you added, because I have too little experience with videos. I'll leave that up to other reviewers.

Review comments (all outdated and resolved) were left on:

  • main.py (1 comment)
  • torchvision/prototype/datasets/_builtin/ucf101.py (4 comments)
  • torchvision/prototype/datasets/decoder.py (2 comments)
  • torchvision/prototype/datasets/video_utils.py (3 comments)
@pmeier requested a review from fmassa on November 3, 2021, 07:31
@bjuncek (Contributor, PR author) commented Dec 1, 2021

Ok, so I've done a pass on this to fix the decoder inconsistency we've been discussing offline.

I don't understand datapipes well enough to see why popping from a dict would fail, or why I'd need to annotate variables in a datapipe. Everything since 8f57ee6 is unrelated to the functionality of the UCF101 dataset and can be reverted if you want to do a clean pass and then merge into this.
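For context, the dict handling in question can be sketched without torchdata: a datapipe stage typically yields per-sample dicts, and popping a key mutates the incoming sample in place, which can surprise downstream stages. A minimal, self-contained sketch (a plain generator stands in for an IterDataPipe; the field names are hypothetical, not the PR's actual schema):

```python
from typing import Any, Dict, Iterable, Iterator

def relabel(samples: Iterable[Dict[str, Any]]) -> Iterator[Dict[str, Any]]:
    # Stand-in for a datapipe stage: consume one field, pass the rest through.
    for sample in samples:
        target = sample.pop("target")  # pop mutates the incoming dict
        yield {**sample, "label": target}

out = list(relabel([{"path": "v_ApplyEyeMakeup_g01_c01.avi", "target": 0}]))
```

If the upstream stage reuses the same dict objects, the `pop` here is visible to it; copying the dict first (`dict(sample)`) avoids that class of failure.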

Comment on lines +115 to +178
def __iter__(self) -> Iterator[Dict[str, Any]]:
    for video_d in self.datapipe:
        buffer = video_d["file"]
        with av.open(buffer, metadata_errors="ignore") as container:
            stream = container.streams.video[0]
            time_base = stream.time_base

            # duration is given in time_base units as int
            duration = stream.duration

            # get video_stream timestamps
            # with a tolerance for pyav imprecision
            _ptss = torch.arange(duration - 7)
            _ptss = self._unfold(_ptss)
            # shuffle the clips
            perm = torch.randperm(_ptss.size(0))
            idx = perm[: self.num_clips_per_video]
            samples = _ptss[idx]

            for clip_pts in samples:
                start_pts = clip_pts[0].item()
                end_pts = clip_pts[-1].item()
                # video_timebase is the default time_base
                pts_unit = "pts"
                start_pts, end_pts, pts_unit = _video_opt._convert_to_sec(start_pts, end_pts, "pts", time_base)
                video_frames = video._read_from_stream(
                    container,
                    float(start_pts),
                    float(end_pts),
                    pts_unit,
                    stream,
                    {"video": 0},
                )

                vframes_list = [frame.to_ndarray(format="rgb24") for frame in video_frames]

                if vframes_list:
                    vframes = torch.as_tensor(np.stack(vframes_list))
                    # account for rounding errors in conversion
                    # FIXME: fix this in the code
                    vframes = vframes[: self.num_frames_per_clip, ...]
                else:
                    vframes = torch.empty((0, 1, 1, 3), dtype=torch.uint8)
                    print("WARNING: no frames decoded for clip")

                # [N,H,W,C] to [N,C,H,W]
                vframes = vframes.permute(0, 3, 1, 2)
                assert vframes.size(0) == self.num_frames_per_clip

                # TODO: support sampling rates (FPS change)
                # TODO: optimization (read all and select)

                yield {
                    "clip": vframes,
                    "pts": clip_pts,
                    "range": (start_pts, end_pts),
                    "video_meta": {
                        "time_base": float(stream.time_base),
                        "guessed_fps": float(stream.guessed_rate),
                    },
                    "path": video_d["path"],
                    "target": video_d["target"],
                }
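For reference, the pts-to-seconds conversion that `_convert_to_sec` performs above boils down to multiplying integer pts ticks by the stream's rational `time_base`. A minimal sketch using only the standard library (the helper name is ours, not torchvision's):

```python
from fractions import Fraction

def pts_to_seconds(pts: int, time_base: Fraction) -> float:
    # pts are integer ticks counted in units of time_base seconds
    return float(pts * time_base)

# A common video time base is 1/90000: 90000 ticks == 1 second.
one_second = pts_to_seconds(90000, Fraction(1, 90000))
half_second = pts_to_seconds(45000, Fraction(1, 90000))
```

Keeping the arithmetic in `Fraction` until the final cast avoids the rounding errors the FIXME in the snippet above compensates for.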
A Member commented:

Why not just do the following:

  • sample m start positions
  • for every start position, read k frames
  • yield the k frames at once, m times

The PR author (Contributor) replied:

Unless I'm missing something, this is exactly what I do:

  1. sample starting positions (line 132)
  2. for every start position (line 134) read k frames (line 140)
  3. yield the frames as a sample (line 168)

Are you suggesting to take the yield outside of the loop? If so, is there any benefit to this?
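Both descriptions amount to the same sampling scheme: pick m random start positions, then take k consecutive positions from each. A self-contained sketch using only the standard library (it mirrors the `torch.randperm`-and-slice logic in the snippet above; the function name and seeding are ours):

```python
import random
from typing import List

def sample_clips(num_pts: int, frames_per_clip: int, num_clips: int, seed: int = 0) -> List[List[int]]:
    # Sample num_clips distinct start positions such that a full clip fits
    # inside the stream, then materialize the frames_per_clip consecutive
    # positions for each sampled start.
    rng = random.Random(seed)
    last_valid_start = num_pts - frames_per_clip
    starts = rng.sample(range(last_valid_start + 1), num_clips)
    return [list(range(s, s + frames_per_clip)) for s in starts]

clips = sample_clips(num_pts=100, frames_per_clip=16, num_clips=3)
```

Whether the yield sits inside or outside the per-clip loop only changes whether the consumer sees one clip at a time or all m at once; the frames read are identical.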

import numpy as np
import torch
from torchdata.datapipes.iter import IterDataPipe
from torchvision.io import video, _video_opt
A Member commented:

I'm not sure if I would use _video_opt in here.

The PR author (Contributor) replied:

Sure.
Any particular reason why not?

@pmeier (Collaborator) commented Dec 16, 2021

@bjuncek I've cleaned up the PR with recent changes and also removed all the decoder changes. We will handle those separately from this PR, following #5075.

4 participants