Strange RGB / BGR settings in ssv2 & kinetics data loader #14

kami93 · 2022-02-22T09:05:56Z

Hi. Thanks for the nice work.

I have some questions regarding the RGB / BGR conventions used by the ssv2 and kinetics loaders in this repo. To put it directly, I think the channel order is mishandled in the current codebase.

Specifically, the SSv2 data loader initially reads frames in BGR order (using OpenCV), but it sometimes applies functions that assume RGB input (e.g., ToPILImage). The final output is BGR, which is compatible with the pretrained ViT-B that assumes BGR input.

The kinetics data loader, on the other hand, initially reads frames in RGB order (using PyAV), but it sometimes applies functions that assume BGR input (e.g., color_jitter). The final output is RGB, which is incompatible with the pretrained ViT-B that assumes BGR input.

I will point out the problems in the order in which the data loaders actually process input files.

1. Kinetics loader (https://github.com/facebookresearch/Motionformer/blob/main/slowfast/datasets/kinetics.py)

1-1. Kinetics loader reads mp4 videos with PyAV backend, using VideoFrame.to_rgb method

reference:
https://github.com/facebookresearch/Motionformer/blob/main/slowfast/datasets/kinetics.py#L236-L246
https://github.com/facebookresearch/Motionformer/blob/main/slowfast/datasets/decoder.py#L269-L280

VideoFrame.to_rgb reads mp4 frames in RGB standard

1-2. frames_augmentation is applied, which assumes BGR standards.

Specifically, contrast jitter relies on a "BGR to Grayscale" transform, which is sensitive to channel order. Since the frames are actually RGB at this point, the augmentation is applied incorrectly.
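To illustrate why this matters, here is a minimal sketch, not the repository's code. It assumes the standard ITU-R BT.601 luma weights (0.299 for R, 0.587 for G, 0.114 for B), and the helper names `gray_assuming_bgr` and `gray_assuming_rgb` are hypothetical. Feeding an RGB pixel to a function that indexes channels as BGR swaps the red and blue weights:

```python
# BT.601 luma weights; assumed here for illustration.
W_R, W_G, W_B = 0.299, 0.587, 0.114


def gray_assuming_bgr(pixel):
    """Grayscale value when the pixel is interpreted as (B, G, R)."""
    b, g, r = pixel
    return W_R * r + W_G * g + W_B * b


def gray_assuming_rgb(pixel):
    """Grayscale value when the pixel is interpreted as (R, G, B)."""
    r, g, b = pixel
    return W_R * r + W_G * g + W_B * b


# A pure red pixel stored in RGB order, as an RGB decoder would produce it.
red_rgb = (255, 0, 0)

correct = gray_assuming_rgb(red_rgb)  # red gets the red weight
wrong = gray_assuming_bgr(red_rgb)    # red gets the blue weight instead
print(correct, wrong)
```

A contrast jitter of this kind typically blends each frame toward its mean grayscale value, so the wrong weights shift the blend target for any frame whose red and blue content differ.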

1-3. The final output is RGB.

Since the pretrained ViT-B assumes BGR input, performance may be sub-optimal (though the model is subsequently fine-tuned on video datasets).

2. SSv2 loader (https://github.com/facebookresearch/Motionformer/blob/main/slowfast/datasets/ssv2.py)

As above, I will go through the problems in the order in which the SSv2 loader actually processes input files.

2-1. SSv2 loader reads jpeg frames using cv2.imdecode method.

reference:
https://github.com/facebookresearch/Motionformer/blob/main/slowfast/datasets/ssv2.py#L246-L251
https://github.com/facebookresearch/Motionformer/blob/main/slowfast/datasets/utils.py#L41-L52

cv2.imdecode reads jpeg files in BGR standard

2-2. Frames are converted to PIL images using torchvision.transforms.ToPILImage method.

As stated in torchvision's documentation, torchvision.transforms.ToPILImage assumes RGB for 3-channel input, so the BGR channel order at this point could lead to incorrect color augmentations. Fortunately, as far as I can tell, the current RandAug profile does not include channel-order-sensitive augmentations.
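One possible fix, sketched under assumptions rather than taken from the repo, is to reverse the channel axis before handing frames to an RGB-expecting function, then reverse again afterwards if the model expects BGR. The helper `swap_rb` below is hypothetical and works on plain nested lists (H x W x 3) to stay dependency-free; with NumPy arrays the equivalent is `frame[..., ::-1]`, and OpenCV offers `cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)`.

```python
def swap_rb(frame):
    """Return a copy of an H x W x 3 frame with the channel order reversed.

    The same function converts BGR -> RGB and RGB -> BGR, since reversing
    the channel order is its own inverse.
    """
    return [[pixel[::-1] for pixel in row] for row in frame]


# One-row, two-pixel frame in BGR order: pure blue, then pure red.
frame_bgr = [[(255, 0, 0), (0, 0, 255)]]

frame_rgb = swap_rb(frame_bgr)  # safe to pass to RGB-expecting transforms
restored = swap_rb(frame_rgb)   # back to BGR for a BGR-pretrained model
```

The same reversal would apply in the opposite direction to the kinetics loader, whose frames start out as RGB.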

2-3. The final output is BGR

ViT-B also follows BGR standards, hence there is no problem outputting BGR standard frames.
