[WIP] UCF101 prototype with utilities for video loading #4838
bjuncek wants to merge 52 commits into pytorch:main from
Conversation
💊 CI failures summary and remediations: as of commit f1a69e0 (more details on the Dr. CI page): 💚 Looks good so far! There are no failures yet. 💚 This comment was automatically generated by Dr. CI. Please report bugs/suggestions to the (internal) Dr. CI Users group.
pmeier left a comment:
Thanks a lot @bjuncek. I have some comments inline about the general infrastructure. I can't really comment on the validity of the video utility datapipes that you added, because I have too little experience with videos. I'll leave that up to other reviewers.
Co-authored-by: Philip Meier <github.pmeier@posteo.de>
…k/vision into bkorbar/prototypes/ucf101
Ok, so I've tried doing a pass on this, trying to fix the decoder inconsistency we've been talking about offline. I don't understand datapipes well enough to understand why a pop from a dict would fail, or why I'd need to annotate variables in a datapipe.
```python
def __iter__(self) -> Iterator[Dict[str, Any]]:
    for video_d in self.datapipe:
        buffer = video_d["file"]
        with av.open(buffer, metadata_errors="ignore") as container:
            stream = container.streams.video[0]
            time_base = stream.time_base

            # duration is given in time_base units as int
            duration = stream.duration

            # get video stream timestamps,
            # with a tolerance for pyav imprecision
            _ptss = torch.arange(duration - 7)
            _ptss = self._unfold(_ptss)
            # shuffle the clips
            perm = torch.randperm(_ptss.size(0))
            idx = perm[: self.num_clips_per_video]
            samples = _ptss[idx]

            for clip_pts in samples:
                start_pts = clip_pts[0].item()
                end_pts = clip_pts[-1].item()
                # video_timebase is the default time_base
                pts_unit = "pts"
                start_pts, end_pts, pts_unit = _video_opt._convert_to_sec(start_pts, end_pts, "pts", time_base)
                video_frames = video._read_from_stream(
                    container,
                    float(start_pts),
                    float(end_pts),
                    pts_unit,
                    stream,
                    {"video": 0},
                )

                vframes_list = [frame.to_ndarray(format="rgb24") for frame in video_frames]

                if vframes_list:
                    vframes = torch.as_tensor(np.stack(vframes_list))
                    # account for rounding errors in conversion
                    # FIXME: fix this in the code
                    vframes = vframes[: self.num_frames_per_clip, ...]
                else:
                    vframes = torch.empty((0, 1, 1, 3), dtype=torch.uint8)
                    print("FAIL")

                # [N,H,W,C] to [N,C,H,W]
                vframes = vframes.permute(0, 3, 1, 2)
                assert vframes.size(0) == self.num_frames_per_clip

                # TODO: support sampling rates (FPS change)
                # TODO: optimization (read all and select)

                yield {
                    "clip": vframes,
                    "pts": clip_pts,
                    "range": (start_pts, end_pts),
                    "video_meta": {
                        "time_base": float(stream.time_base),
                        "guessed_fps": float(stream.guessed_rate),
                    },
                    "path": video_d["path"],
                    "target": video_d["target"],
                }
```
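For context, the clip sampling in this loop (the `_unfold` call) can be sketched with `Tensor.unfold`. This is a minimal standalone sketch, not the actual implementation: the step of 1 and the helper's name and signature are assumptions based on the surrounding code.

```python
import torch


def sample_clip_pts(duration: int, num_frames_per_clip: int, num_clips_per_video: int) -> torch.Tensor:
    # all candidate presentation timestamps, with the 7-unit pyav tolerance from the diff
    ptss = torch.arange(duration - 7)
    # sliding windows of length num_frames_per_clip with step 1:
    # shape (num_windows, num_frames_per_clip)
    windows = ptss.unfold(0, num_frames_per_clip, 1)
    # pick a random subset of windows as the clips for this video
    perm = torch.randperm(windows.size(0))
    return windows[perm[:num_clips_per_video]]


clips = sample_clip_pts(duration=100, num_frames_per_clip=16, num_clips_per_video=4)
print(clips.shape)  # torch.Size([4, 16])
```

Each row is one clip's pts window, from which the loop takes `clip_pts[0]` and `clip_pts[-1]` as the decode range.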
Why not just do the following:
- sample m start positions
- for every start position, read k frames
- yield the k frames at once, m times
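A minimal sketch of that suggested structure, with the per-frame decode replaced by a placeholder (the function name and parameters are illustrative, not the actual torchvision API):

```python
import random
from typing import Iterator, List


def iter_clips(num_frames: int, m: int, k: int) -> Iterator[List[int]]:
    """Sample m start positions, then yield k frames from each start."""
    starts = random.sample(range(num_frames - k + 1), m)
    for start in starts:
        # placeholder for the actual per-frame decode from the container
        frames = list(range(start, start + k))
        yield frames  # yield the k frames at once, m times


clips = list(iter_clips(num_frames=30, m=3, k=5))
```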
Unless I'm missing something, this is exactly what I do:
- sample starting positions (line 132)
- for every start position (line 134), read k frames (line 140)
- yield the frames as a sample (line 168)

Are you suggesting to take the yield outside of the loop? If so, is there any benefit to this?
```python
import numpy as np
import torch
from torchdata.datapipes.iter import IterDataPipe
from torchvision.io import video, _video_opt
```
I'm not sure if I would use _video_opt in here.
Sure.
Any particular reason why not?
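For reference, the conversion that `_video_opt._convert_to_sec` is used for here amounts to scaling pts-unit timestamps by the stream's `time_base` (a `Fraction`). A standalone sketch of that arithmetic, with hypothetical names:

```python
from fractions import Fraction
from typing import Tuple


def pts_to_sec(start_pts: int, end_pts: int, time_base: Fraction) -> Tuple[float, float, str]:
    """Convert pts-unit timestamps to seconds by scaling with the stream time_base."""
    return float(start_pts * time_base), float(end_pts * time_base), "sec"


# e.g. with a common 1/90000 time_base: 900 pts -> 0.01 s, 1800 pts -> 0.02 s
start, end, unit = pts_to_sec(900, 1800, Fraction(1, 90000))
```

Whether the datapipe should depend on the private `_video_opt` module for this, or inline the scaling, is the open question in this thread.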
A simple pyav-based set of utilities with a POC implementation for the UCF101 dataset.

cc @pmeier @bjuncek