40 changes: 33 additions & 7 deletions docs/source/transforms.rst
@@ -4,15 +4,34 @@ torchvision.transforms
 .. currentmodule:: torchvision.transforms

 Transforms are common image transformations. They can be chained together using :class:`Compose`.
-Additionally, there is the :mod:`torchvision.transforms.functional` module.
-Functional transforms give fine-grained control over the transformations.
+Most transform classes have a function equivalent: :ref:`functional
+transforms <functional_transforms>` give fine-grained control over the
+transformations.
 This is useful if you have to build a more complex transformation pipeline
 (e.g. in the case of segmentation tasks).

-All transformations accept PIL Image, Tensor Image or batch of Tensor Images as input. Tensor Image is a tensor with
-``(C, H, W)`` shape, where ``C`` is a number of channels, ``H`` and ``W`` are image height and width. Batch of
-Tensor Images is a tensor of ``(B, C, H, W)`` shape, where ``B`` is a number of images in the batch. Deterministic or
-random transformations applied on the batch of Tensor Images identically transform all the images of the batch.
+Most transformations accept both `PIL <https://pillow.readthedocs.io>`_
+images and tensor images, although some transformations are :ref:`PIL-only
+<transforms_pil_only>` and some are :ref:`tensor-only
+<transforms_tensor_only>`. The :ref:`conversion_transforms` may be used to
+convert to and from PIL images.
+
+The transformations that accept tensor images also accept batches of tensor
+images. A Tensor Image is a tensor with ``(C, H, W)`` shape, where ``C`` is the
+number of channels, ``H`` and ``W`` are image height and width. A batch of
+Tensor Images is a tensor of ``(B, C, H, W)`` shape, where ``B`` is the number
+of images in the batch.
+
+The expected range of the values of a tensor image is implicitly defined by
+the tensor dtype. Tensor images with a float dtype are expected to have
+values in ``[0, 1)``. Tensor images with an integer dtype are expected to
+have values in ``[0, MAX_DTYPE]`` where ``MAX_DTYPE`` is the largest value
+that can be represented in that dtype.
+
+Randomized transformations will apply the same transformation to all the
+images of a given batch, but they will produce different transformations
+across calls. For reproducible transformations across calls, you may use
+:ref:`functional transforms <functional_transforms>`.

 .. warning::
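A minimal sketch of the conventions added above, assuming the current ``torchvision.transforms`` API (the shapes, values, and pipeline here are illustrative, not part of the patch)::

    import torch
    from torchvision import transforms

    # A tensor image is (C, H, W); a batch of tensor images is (B, C, H, W).
    img = torch.rand(3, 64, 64)  # float dtype: values expected in [0, 1)
    batch = torch.randint(0, 256, (4, 3, 64, 64), dtype=torch.uint8)  # uint8: [0, 255]

    # Transforms can be chained with Compose; this pipeline accepts tensor images.
    pipeline = transforms.Compose([
        transforms.RandomHorizontalFlip(p=0.5),
        transforms.RandomCrop(32),
    ])

    out = pipeline(img)          # (3, 32, 32)
    out_batch = pipeline(batch)  # (4, 3, 32, 32); within one call, the same random
                                 # flip/crop is applied to every image in the batch

    # ConvertImageDtype rescales values when changing dtype, e.g. uint8 -> float in [0, 1].
    to_float = transforms.ConvertImageDtype(torch.float32)
    assert to_float(batch).max() <= 1.0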
@@ -117,13 +136,16 @@ Transforms on PIL Image and torch.\*Tensor
 .. autoclass:: GaussianBlur
    :members:

+.. _transforms_pil_only:
+
 Transforms on PIL Image only
 ----------------------------

 .. autoclass:: RandomChoice

 .. autoclass:: RandomOrder

+.. _transforms_tensor_only:
+
 Transforms on torch.\*Tensor only
 ---------------------------------
@@ -139,6 +161,7 @@ Transforms on torch.\*Tensor only

 .. autoclass:: ConvertImageDtype

+.. _conversion_transforms:

 Conversion Transforms
 ---------------------
@@ -173,13 +196,16 @@ The new transform can be used standalone or mixed-and-matched with existing tran
    :members:


+.. _functional_transforms:
+
 Functional Transforms
 ---------------------

 Functional transforms give you fine-grained control of the transformation pipeline.
 As opposed to the transformations above, functional transforms don't contain a random number
 generator for their parameters.
-That means you have to specify/generate all parameters, but you can reuse the functional transform.
+That means you have to specify/generate all parameters, but the functional transform will give you
+reproducible results across calls.

 Example:
 you can apply a functional transform with the same parameters to multiple images like this:
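The example body itself is collapsed in this view. A sketch of the kind of pipeline the text describes, assuming a segmentation-style use case; the helper name and angle range are illustrative::

    import random
    import torchvision.transforms.functional as TF

    def my_segmentation_transforms(image, segmentation):
        # Draw the random parameters once, then apply the same functional
        # transform to both the image and its segmentation mask.
        if random.random() > 0.5:
            angle = random.randint(-30, 30)
            image = TF.rotate(image, angle)
            segmentation = TF.rotate(segmentation, angle)
        return image, segmentation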
15 changes: 8 additions & 7 deletions torchvision/transforms/functional.py
@@ -671,7 +671,7 @@ def five_crop(img: Tensor, size: List[int]) -> Tuple[Tensor, Tensor, Tensor, Ten

     Returns:
        tuple: tuple (tl, tr, bl, br, center)
-                Corresponding top left, top right, bottom left, bottom right and center crop.
+        Corresponding top left, top right, bottom left, bottom right and center crop.
     """
     if isinstance(size, numbers.Number):
         size = (int(size), int(size))
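A small usage sketch of ``five_crop`` as documented here (input shape and crop size are illustrative)::

    import torch
    import torchvision.transforms.functional as F

    img = torch.rand(3, 128, 128)
    tl, tr, bl, br, center = F.five_crop(img, size=[64, 64])
    print(center.shape)  # torch.Size([3, 64, 64])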
@@ -717,8 +717,8 @@ def ten_crop(img: Tensor, size: List[int], vertical_flip: bool = False) -> List[

     Returns:
        tuple: tuple (tl, tr, bl, br, center, tl_flip, tr_flip, bl_flip, br_flip, center_flip)
-                Corresponding top left, top right, bottom left, bottom right and
-                center crop and same for the flipped image.
+        Corresponding top left, top right, bottom left, bottom right and
+        center crop and same for the flipped image.
     """
     if isinstance(size, numbers.Number):
         size = (int(size), int(size))
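And for ``ten_crop``, whose ten crops can be stacked into a batch, e.g. for test-time augmentation (again, shapes are illustrative)::

    import torch
    import torchvision.transforms.functional as F

    img = torch.rand(3, 128, 128)
    crops = F.ten_crop(img, size=[64, 64])  # horizontal flips by default
    batch = torch.stack(crops)              # (10, 3, 64, 64)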
@@ -1103,9 +1103,9 @@ def to_grayscale(img, num_output_channels=1):

     Returns:
         PIL Image: Grayscale version of the image.
-        if num_output_channels = 1 : returned image is single channel

-        if num_output_channels = 3 : returned image is 3 channel with r = g = b
+        - if num_output_channels = 1 : returned image is single channel
+        - if num_output_channels = 3 : returned image is 3 channel with r = g = b
     """
     if isinstance(img, Image.Image):
         return F_pil.to_grayscale(img, num_output_channels)
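A quick sketch of the two ``num_output_channels`` modes on a PIL image (the solid-color input is illustrative)::

    from PIL import Image
    from torchvision.transforms.functional import to_grayscale

    img = Image.new("RGB", (32, 32), color=(255, 0, 0))
    print(to_grayscale(img, num_output_channels=1).mode)  # "L": single channel
    print(to_grayscale(img, num_output_channels=3).mode)  # "RGB", with r = g = b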
@@ -1128,9 +1128,9 @@ def rgb_to_grayscale(img: Tensor, num_output_channels: int = 1) -> Tensor:

     Returns:
         PIL Image or Tensor: Grayscale version of the image.
-        if num_output_channels = 1 : returned image is single channel

-        if num_output_channels = 3 : returned image is 3 channel with r = g = b
+        - if num_output_channels = 1 : returned image is single channel
+        - if num_output_channels = 3 : returned image is 3 channel with r = g = b
     """
     if not isinstance(img, torch.Tensor):
         return F_pil.to_grayscale(img, num_output_channels)
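The tensor path behaves the same way; with ``num_output_channels=3`` the returned channels are identical (a sketch, shapes illustrative)::

    import torch
    from torchvision.transforms.functional import rgb_to_grayscale

    img = torch.rand(3, 32, 32)
    gray1 = rgb_to_grayscale(img)                         # (1, 32, 32)
    gray3 = rgb_to_grayscale(img, num_output_channels=3)  # (3, 32, 32)
    assert torch.equal(gray3[0], gray3[1])  # r = g = b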
@@ -1330,6 +1330,7 @@ def equalize(img: Tensor) -> Tensor:
        img (PIL Image or Tensor): Image on which equalize is applied.
            If img is torch Tensor, it is expected to be in [..., 1 or 3, H, W] format,
            where ... means it can have an arbitrary number of leading dimensions.
+            The tensor dtype must be ``torch.uint8`` and values are expected to be in ``[0, 255]``.
            If img is PIL Image, it is expected to be in mode "P", "L" or "RGB".

     Returns:
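Given this dtype requirement, a float image needs converting before ``equalize``; a sketch using ``convert_image_dtype`` (input values illustrative)::

    import torch
    from torchvision.transforms.functional import convert_image_dtype, equalize

    img = torch.rand(3, 32, 32)                     # float image, values in [0, 1)
    img_u8 = convert_image_dtype(img, torch.uint8)  # rescaled to [0, 255]
    out = equalize(img_u8)
    print(out.dtype)  # torch.uint8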
7 changes: 4 additions & 3 deletions torchvision/transforms/transforms.py
@@ -841,7 +841,7 @@ def get_params(

         Returns:
             tuple: params (i, j, h, w) to be passed to ``crop`` for a random
-                sized crop.
+            sized crop.
         """
         width, height = F._get_image_size(img)
         area = height * width
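Because ``get_params`` returns the sampled ``(i, j, h, w)`` explicitly, the same random crop can be applied to paired inputs; a sketch assuming this is ``RandomResizedCrop.get_params`` (shapes, ``scale``, and ``ratio`` values are illustrative)::

    import torch
    import torchvision.transforms as T
    import torchvision.transforms.functional as F

    image = torch.rand(3, 128, 128)
    mask = torch.rand(1, 128, 128)

    # Sample the crop parameters once, then apply them to both tensors.
    i, j, h, w = T.RandomResizedCrop.get_params(image, scale=[0.5, 1.0], ratio=[0.75, 1.33])
    image = F.resized_crop(image, i, j, h, w, size=[64, 64])
    mask = F.resized_crop(mask, i, j, h, w, size=[64, 64])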
@@ -1464,8 +1464,9 @@ class Grayscale(torch.nn.Module):

     Returns:
         PIL Image: Grayscale version of the input.
-        - If ``num_output_channels == 1`` : returned image is single channel
-        - If ``num_output_channels == 3`` : returned image is 3 channel with r == g == b
+
+        - If ``num_output_channels == 1`` : returned image is single channel
+        - If ``num_output_channels == 3`` : returned image is 3 channel with r == g == b

     """
