
Specify channel dim for transforms.Normalize #6816

Open
x4Cx58x54 opened this issue Oct 23, 2022 · 2 comments

Comments


x4Cx58x54 commented Oct 23, 2022

🚀 The feature

Specify a channel dimension for transforms.Normalize, transforms.functional.normalize, and transforms.functional_tensor.normalize, so that transforms.Normalize can apply the given mean and std along a user-specified channel dimension.

A solution is to add a new dim_channel argument to the classes and functions above:

# in transforms.functional_tensor.normalize
broadcast_ch_shape = [1 for _ in range(tensor.ndim)]
broadcast_ch_shape[dim_channel] = -1
if mean.ndim == 1:
    mean = mean.view(*broadcast_ch_shape)
if std.ndim == 1:
    std = std.view(*broadcast_ch_shape)
return tensor.sub_(mean).div_(std)
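Fleshed out, the snippet above could look like the following minimal, runnable sketch. Note that dim_channel is the proposed argument, not an existing torchvision API, and the function here is a hypothetical stand-in for transforms.functional_tensor.normalize:

```python
import torch

def normalize(tensor, mean, std, dim_channel=-3, inplace=False):
    # Hypothetical sketch of the proposed API: dim_channel selects which
    # dimension the per-channel mean/std are broadcast along.
    if not inplace:
        tensor = tensor.clone()
    mean = torch.as_tensor(mean, dtype=tensor.dtype, device=tensor.device)
    std = torch.as_tensor(std, dtype=tensor.dtype, device=tensor.device)
    # Build a shape like [1, ..., -1, ..., 1] with -1 at the channel dim,
    # so a 1-D stats tensor broadcasts correctly against the input.
    broadcast_ch_shape = [1] * tensor.ndim
    broadcast_ch_shape[dim_channel] = -1
    if mean.ndim == 1:
        mean = mean.view(*broadcast_ch_shape)
    if std.ndim == 1:
        std = std.view(*broadcast_ch_shape)
    return tensor.sub_(mean).div_(std)

# Example: a [C, T, H, W] video, normalized along dim 0 instead of dim -3.
video = torch.rand(3, 8, 32, 32)
out = normalize(video, mean=[0.5, 0.5, 0.5], std=[0.25, 0.25, 0.25],
                dim_channel=0)
```

With this, a channels-first video layout such as [C, T, H, W] can be normalized without permuting the tensor first.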

Motivation, pitch

Recent torchvision releases deprecated transforms._transforms_video and extended many transforms to process [..., H, W] shaped tensors. For video transforming this is a great improvement, but transforms.Normalize was not lucky enough to be among these transforms. This means users must either resort to other transforms such as pytorchvideo.transforms.Normalize or normalize each frame separately. The requested feature would relieve this pain and make video transform pipelines neater.

Alternatives

No response

Additional context

No response

cc @vfdev-5 @datumbox

@datumbox
Contributor

@x4Cx58x54 Thanks a lot for the proposal.

We need a bit more time to decide how we want to handle this. Right now we are in the middle of revamping the Transforms API to offer native support not only for Images but also Videos, Bounding Boxes, Masks, Labels etc. We plan to publish a blog post with the announcement soon, but you can see some examples at #6753.

To make the long story short, the new Transforms API "stores" the videos in a [..., T, C, H, W] format. This allows us to very efficiently transform the video frames by reusing existing image kernels. We also offer transforms to permute/transpose the dimensions. The new API uses Tensor Subclassing to store meta-data along the standard tensor (things like colour space for example).
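Under the [..., T, C, H, W] convention described above, per-channel stats broadcast against dim -3 with no extra argument, which is what lets the image kernels handle video batches unchanged. A minimal sketch of that broadcasting (plain torch, not the actual torchvision kernel):

```python
import torch

# Per-channel stats reshaped to [C, 1, 1] broadcast against the last
# three dims of any [..., C, H, W] tensor, including video batches.
mean = torch.tensor([0.5, 0.5, 0.5]).view(-1, 1, 1)
std = torch.tensor([0.25, 0.25, 0.25]).view(-1, 1, 1)

video = torch.rand(2, 8, 3, 16, 16)  # [B, T, C, H, W]
normalized = (video - mean) / std    # broadcasts over B, T, H, W
```

The same expression works for a single image [C, H, W] or a batch of videos, which is why keeping channels at dim -3 avoids the need for a dim_channel parameter in the new design.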

Offering an extra parameter on the normalize kernel is possible but conflicts with the existing design. Having said that, in some limited cases we have offered such a parameter to assist user migration. For example:

def uniform_temporal_subsample_video(video: torch.Tensor, num_samples: int, temporal_dim: int = -4) -> torch.Tensor:

Given the above, shall we wait for the blogpost to be published (happy to give you a ping) and give you some time to review the design? After that, it would be great to get your input on whether the new API covers your needs or if you think we need enhancements. Let me know what you think. Thanks!

@x4Cx58x54
Author

@datumbox Thanks for your reply. I would be greatly obliged if you could give me a ping!
