This repository has been archived by the owner on Oct 31, 2023. It is now read-only.
Hi. Thanks for the nice work.
I have some questions about the RGB/BGR channel orders used by the SSv2 and Kinetics loaders in this repo. To put it directly, I think RGB/BGR channel orders are mishandled in the current codebase.
Specifically, the SSv2 data loader reads frames in BGR order (using OpenCV), but at some steps it applies functions that assume RGB input (e.g., ToPILImage). Its final output is BGR, which is compatible with the pretrained ViT-B, since ViT-B assumes BGR input.
On the other hand, the Kinetics data loader reads frames in RGB order (using PyAV), but at some steps it applies functions that assume BGR input (e.g., color_jitter). Its final output is RGB, which is incompatible with the pretrained ViT-B's BGR assumption.
Below, I point out the problems in the order in which the data loaders actually process input files.
1. Kinetics loader (https://github.com/facebookresearch/Motionformer/blob/main/slowfast/datasets/kinetics.py)
1-1. The Kinetics loader reads mp4 videos with the PyAV backend, using the VideoFrame.to_rgb method.
reference:
https://github.com/facebookresearch/Motionformer/blob/main/slowfast/datasets/kinetics.py#L236-L246
https://github.com/facebookresearch/Motionformer/blob/main/slowfast/datasets/decoder.py#L269-L280
VideoFrame.to_rgb decodes mp4 frames in RGB order.
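To make the Kinetics decode step concrete, here is a minimal sketch of what it yields and how the frames could be flipped to BGR before frames_augmentation ever sees them. It assumes PyAV is installed and uses `to_ndarray(format="rgb24")` as a stand-in for the loader's `to_rgb` call (both give an RGB layout). `decode_rgb` and `rgb_to_bgr` are hypothetical helpers, not functions from the repo.

```python
import numpy as np


def decode_rgb(path):
    """Decode every frame of `path` into a (T, H, W, 3) uint8 array in RGB order.

    Requires PyAV. format="rgb24" produces the same RGB layout as VideoFrame.to_rgb().
    """
    import av  # imported lazily so rgb_to_bgr below works without PyAV installed

    with av.open(path) as container:
        frames = [f.to_ndarray(format="rgb24") for f in container.decode(video=0)]
    return np.stack(frames)


def rgb_to_bgr(frames):
    """Reverse the last (channel) axis: RGB -> BGR. Applying it twice restores the input."""
    return np.ascontiguousarray(frames[..., ::-1])


# A pure-red clip in RGB order: red sits in channel 0.
red = np.zeros((1, 2, 2, 3), dtype=np.uint8)
red[..., 0] = 255
bgr = rgb_to_bgr(red)  # red now sits in channel 2, as a BGR-assuming pipeline expects
```

Flipping once, right after decoding, would mean both the BGR-assuming augmentation and the model see the same order. Applying the flip only at the very end would still leave the augmentation operating on RGB frames.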
1-2. frames_augmentation is applied, which assumes BGR input.
Specifically, contrast jitter relies on a BGR-to-grayscale transform, which is sensitive to channel order. Given RGB frames, it computes the wrong grayscale image, so the augmentation is applied incorrectly.
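The size of the error can be shown with a single pixel. This is a minimal sketch. It assumes the repo's grayscale helper uses the usual ITU-R 601 luma weights (0.299 R + 0.587 G + 0.114 B) and reads channel 2 as red, as a BGR-assuming helper would. It also assumes a (T, 3, H, W) layout. `grayscale_assuming_bgr` is a hypothetical stand-in, not the repo's function.

```python
import numpy as np


def grayscale_assuming_bgr(images):
    """Luma for images shaped (T, 3, H, W) whose channel axis is ordered B, G, R."""
    b, g, r = images[:, 0], images[:, 1], images[:, 2]
    return 0.299 * r + 0.587 * g + 0.114 * b


# One pure-red pixel, stored in both channel orders.
red_rgb = np.zeros((1, 3, 1, 1), dtype=np.float32)
red_rgb[:, 0] = 255.0
red_bgr = red_rgb[:, ::-1]  # reverse the channel axis

print(grayscale_assuming_bgr(red_bgr).item())  # ~76.2: correct luma for red
print(grayscale_assuming_bgr(red_rgb).item())  # ~29.1: red is weighted as if it were blue
```

Assuming contrast jitter blends each frame toward the mean of this grayscale image (the common way to implement it), RGB input shifts that blend target. The jitter then no longer behaves as intended.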
1-3. The final output is RGB.
Since ViT-B assumes BGR input, performance may be sub-optimal (although we fine-tune on the video datasets anyway).
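A small regression check is one hedged way to catch this kind of mismatch. Push a synthetic pure-red clip through the loader and check which channel carries the signal. The sketch assumes a channel-first (C, T, H, W) output inspected before any per-channel normalization. `dominant_channel` and `looks_bgr` are hypothetical names, and the "loader output" here is simulated.

```python
import numpy as np


def dominant_channel(clip, channel_dim):
    """Index of the channel with the largest mean value along a non-negative `channel_dim`."""
    other_axes = tuple(a for a in range(clip.ndim) if a != channel_dim)
    return int(np.argmax(clip.mean(axis=other_axes)))


def looks_bgr(clip_from_red_input, channel_dim=0):
    """For a clip decoded from a pure-red video, BGR output puts red in channel 2."""
    return dominant_channel(clip_from_red_input, channel_dim) == 2


# Simulated (C, T, H, W) loader output from a pure-red video, left in RGB order.
clip = np.zeros((3, 4, 8, 8), dtype=np.float32)
clip[0] = 1.0
print(looks_bgr(clip))  # False: red is in channel 0, so the clip is RGB
```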
2. SSv2 loader (https://github.com/facebookresearch/Motionformer/blob/main/slowfast/datasets/ssv2.py)
As above, I go through the problems in the order in which the SSv2 loader actually processes input files.
2-1. The SSv2 loader reads JPEG frames using the cv2.imdecode method.
reference:
https://github.com/facebookresearch/Motionformer/blob/main/slowfast/datasets/ssv2.py#L246-L251
https://github.com/facebookresearch/Motionformer/blob/main/slowfast/datasets/utils.py#L41-L52
cv2.imdecode decodes JPEG files in BGR order.
2-2. Frames are converted to PIL images using torchvision.transforms.ToPILImage.
As stated in the torchvision documentation, ToPILImage assumes that 3-channel input is RGB, so the current (BGR) channel order could lead to incorrect color augmentations. Fortunately, I believe the current RandAug profile does not include channel-order-sensitive augmentations.
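The sketch below illustrates why the RGB assumption matters. It uses Pillow's `Image.fromarray`, which, like ToPILImage for a 3-channel uint8 array, produces an "RGB"-mode image. It then applies Pillow's `convert("L")` grayscale conversion, which `ImageEnhance.Color` and `ImageEnhance.Contrast` build on. `to_pil_from_bgr` and `pil_to_bgr` are hypothetical helpers showing one possible fix: flip to RGB before building the PIL image, then flip back afterwards.

```python
import numpy as np
from PIL import Image


def to_pil_from_bgr(frame_bgr):
    """Build an RGB-mode PIL image from an HxWx3 uint8 BGR frame by flipping channels first."""
    return Image.fromarray(np.ascontiguousarray(frame_bgr[..., ::-1]))


def pil_to_bgr(img):
    """Return an HxWx3 uint8 BGR array from an RGB-mode PIL image."""
    return np.ascontiguousarray(np.asarray(img)[..., ::-1])


# A pure-red frame as cv2.imdecode would return it (BGR: red in channel 2).
red_bgr = np.zeros((2, 2, 3), dtype=np.uint8)
red_bgr[..., 2] = 255

naive = Image.fromarray(red_bgr)   # BGR passed straight in: Pillow reads the pixel as blue
fixed = to_pil_from_bgr(red_bgr)   # channels flipped first: Pillow reads the pixel as red

print(np.asarray(naive.convert("L"))[0, 0])  # luma of blue, about 29
print(np.asarray(fixed.convert("L"))[0, 0])  # luma of red, about 76
```

Flipping back with `pil_to_bgr` after the PIL-based augmentations would keep the loader's final output in BGR.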
2-3. The final output is BGR.
ViT-B also assumes BGR input, so outputting BGR frames is not a problem.