Hi,
I was looking through the Transforms API docs and noticed an inconsistency with video input shapes across different transforms. Here are a few examples to illustrate my confusion:
AugMix takes an input video tensor of shape (T, C, H, W)
CutMix takes in a batch of videos of shape (B, C, T, H, W)
Div255 takes in an input video tensor of shape (C, T, H, W)
Is there a reason why the 'channels' and 'temporal' dimensions are sometimes transposed? I looked for an answer to this question but couldn't find one, so I hope this is the right place to ask.
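To make the question concrete: as far as I can tell, the layouts above differ only in axis order, so moving between them is a permutation. Below is a minimal sketch in plain Python that computes that axis order from the layout letters. The helper names `layout_permutation` and `permute_shape` are my own and are not part of the Transforms API, and the example shape (8 frames, 3 channels, 224x224) is made up for illustration.

```python
from typing import Sequence, Tuple


def layout_permutation(src: str, dst: str) -> Tuple[int, ...]:
    """Return the axis order that rearranges layout `src` into layout `dst`.

    Layouts are strings of single-letter axis names, e.g. "TCHW" or "CTHW".
    (Hypothetical helper, not part of any library.)
    """
    if len(set(src)) != len(src) or sorted(src) != sorted(dst):
        raise ValueError(f"incompatible layouts: {src!r} -> {dst!r}")
    # Output axis i is taken from the input axis that carries the same letter.
    return tuple(src.index(axis) for axis in dst)


def permute_shape(shape: Sequence[int], perm: Sequence[int]) -> Tuple[int, ...]:
    """Apply an axis order to a shape tuple, mirroring what a permute would do."""
    return tuple(shape[i] for i in perm)


# AugMix-style layout (T, C, H, W) -> Div255-style layout (C, T, H, W)
tchw_to_cthw = layout_permutation("TCHW", "CTHW")
print(tchw_to_cthw)                                    # (1, 0, 2, 3)
print(permute_shape((8, 3, 224, 224), tchw_to_cthw))   # (3, 8, 224, 224)
```

Assuming the videos are PyTorch tensors, the resulting tuple could be passed to `torch.Tensor.permute`, for example `video.permute(*tchw_to_cthw)`. This works as a workaround, but it doesn't explain why the expected layouts differ in the first place.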
Thanks! 🙂