[Docs] Translate data transform tutorial and migration docs.

open-mmlab · Dec 7, 2022 · 219b3b4
1 parent bced7d6
commit 219b3b4
Showing 6 changed files with 339 additions and 29 deletions.
diff --git a/docs/en/advanced_tutorials/data_transform.md b/docs/en/advanced_tutorials/data_transform.md
@@ -1,3 +1,156 @@
 # Data transform
 
-Coming soon. Please refer to [chinese documentation](https://mmengine.readthedocs.io/zh_CN/latest/tutorials/data_transform.html).
+In the OpenMMLab repositories, dataset construction and data preparation are decoupled from each other.
+Usually, the dataset construction only parses the dataset and records the basic information of each sample,
+while the data preparation is performed by a series of data transforms, such as data loading, preprocessing,
+and formatting based on the basic information of the samples.
+
+## To use Data Transforms
+
+In MMEngine, we use callable data transform classes to manipulate data. These classes accept
+configuration parameters at instantiation and then process the input data dictionary when called.
+Every data transform takes a dictionary as input and returns the processed data as a dictionary.
+A simple example is as follows:
+
+```{note}
+MMEngine does not implement data transforms itself. You can find the base data transform class
+and many other data transforms in MMCV, so you need to install MMCV before following this tutorial; see the
+{external+mmcv:doc}`MMCV installation guide <get_started/installation>`.
+```
+
+```python
+>>> import numpy as np
+>>> from mmcv.transforms import Resize
+>>>
+>>> transform = Resize(scale=(224, 224))
+>>> data_dict = {'img': np.random.rand(256, 256, 3)}
+>>> data_dict = transform(data_dict)
+>>> print(data_dict['img'].shape)
+(224, 224, 3)
+```
+
+## To use in Config Files
+
+In config files, we can compose multiple data transforms into a list, called a data pipeline. The data
+pipeline is passed to the dataset as an argument.
+
+Usually, a data pipeline consists of the following parts:
+
+1. Data loading: use [`LoadImageFromFile`](mmcv.transforms.LoadImageFromFile) to load image files.
+2. Label loading: use [`LoadAnnotations`](mmcv.transforms.LoadAnnotations) to load the bboxes, semantic segmentation and keypoint annotations.
+3. Data processing and augmentation, such as [`RandomResize`](mmcv.transforms.RandomResize).
+4. Data formatting: different tasks use different data transforms, and the transform for a specific
+   task is implemented in the corresponding repository. For example, the data formatting transform for the
+   image classification task is `PackClsInputs`, which is implemented in MMClassification.
+
+Taking the classification task as an example, the figure below shows a typical data pipeline. For
+each sample, the basic information stored in the dataset is a dictionary, shown on the far left of the
+figure. Every blue block after it represents a data transform, and each data transform adds new fields (marked in green) or updates existing fields (marked in orange) in the data dictionary.
+
+<div align=center>
+<img src="https://user-images.githubusercontent.com/26739999/187157681-ac4dcac8-3543-4bfe-ab30-9aa9e56d4900.jpg" width="90%"/>
+</div>
+
+To use the above data pipeline in a config file, use the settings below:
+
+```python
+test_dataloader = dict(
+    batch_size=32,
+    dataset=dict(
+        type='ImageNet',
+        data_root='data/imagenet',
+        pipeline=[
+            dict(type='LoadImageFromFile'),
+            dict(type='Resize', scale=256, keep_ratio=True),
+            dict(type='CenterCrop', crop_size=224),
+            dict(type='PackClsInputs'),
+        ]
+    )
+)
+```
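The way a pipeline threads one data dictionary through its transforms can be sketched without any dependencies. This is only an illustration of the idea: `LoadFakeImage`, `CropTopLeft` and `run_pipeline` are hypothetical stand-ins written for this example, not MMCV classes, and in practice the dataset builds and runs the pipeline from the config.

```python
class LoadFakeImage:
    """Hypothetical loader: adds new 'img' and 'img_shape' fields."""

    def __init__(self, height: int, width: int):
        self.height = height
        self.width = width

    def __call__(self, results: dict) -> dict:
        # A nested list stands in for real pixel data.
        results['img'] = [[0] * self.width for _ in range(self.height)]
        results['img_shape'] = (self.height, self.width)
        return results


class CropTopLeft:
    """Hypothetical crop: updates the existing 'img' and 'img_shape' fields."""

    def __init__(self, size: int):
        self.size = size

    def __call__(self, results: dict) -> dict:
        results['img'] = [row[:self.size] for row in results['img'][:self.size]]
        results['img_shape'] = (len(results['img']), len(results['img'][0]))
        return results


def run_pipeline(transforms, results: dict) -> dict:
    """Feed the output dictionary of each transform into the next one."""
    for transform in transforms:
        results = transform(results)
    return results


# The dataset only records basic information such as the file path.
sample = {'img_path': 'demo.jpg'}
sample = run_pipeline([LoadFakeImage(256, 256), CropTopLeft(size=224)], sample)
print(sample['img_shape'])
```

Each transform only reads and writes keys of the shared dictionary, which is why transforms can be reordered or swapped in the config without changing the dataset class.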
+
+## Common Data Transforms
+
+According to the functionality, the data transform classes can be divided into data loading, data
+pre-processing & augmentation and data formatting.
+
+### Data Loading
+
+To support large-scale datasets, we usually don't load all the dense data during dataset construction, but
+only record the file paths of the data. Therefore, we need to load the data in the data pipeline.
+
+|                     Data Transforms                      |                                     Functionality                                     |
+| :------------------------------------------------------: | :-----------------------------------------------------------------------------------: |
+| [`LoadImageFromFile`](mmcv.transforms.LoadImageFromFile) |                          Load images according to the path.                           |
+|   [`LoadAnnotations`](mmcv.transforms.LoadAnnotations)   | Load and format annotation information, including bbox, segmentation map and others.  |
+
+### Data Pre-processing & Augmentation
+
+Data transforms for pre-processing and augmentation usually manipulate the image and annotation data, like
+cropping, padding, resizing and others.
+
+|                      Data Transforms                       |                         Functionality                          |
+| :--------------------------------------------------------: | :------------------------------------------------------------: |
+|                [`Pad`](mmcv.transforms.Pad)                |                   Pad the margin of images.                    |
+|         [`CenterCrop`](mmcv.transforms.CenterCrop)         |            Crop the image and keep the center part.            |
+|          [`Normalize`](mmcv.transforms.Normalize)          |                  Normalize the image pixels.                   |
+|             [`Resize`](mmcv.transforms.Resize)             |         Resize images to the specified scale or ratio.         |
+|       [`RandomResize`](mmcv.transforms.RandomResize)       |    Resize images to a random scale in the specified range.     |
+| [`RandomChoiceResize`](mmcv.transforms.RandomChoiceResize) | Resize images to a random scale from several specified scales. |
+|    [`RandomGrayscale`](mmcv.transforms.RandomGrayscale)    |                   Randomly grayscale images.                   |
+|         [`RandomFlip`](mmcv.transforms.RandomFlip)         |                     Randomly flip images.                      |
+
+### Data Formatting
+
+Data formatting transforms convert the data to a specified type.
+
+|                 Data Transforms                  |                     Functionality                     |
+| :----------------------------------------------: | :---------------------------------------------------: |
+|      [`ToTensor`](mmcv.transforms.ToTensor)      | Convert the data of specified field to `torch.Tensor` |
+| [`ImageToTensor`](mmcv.transforms.ImageToTensor) |  Convert images to `torch.Tensor` in PyTorch format.  |
+
+## Custom Data Transform Classes
+
+To implement a new data transform class, the class needs to inherit `BaseTransform` and implement the `transform`
+method. Here, we use a simple flip transform (`MyFlip`) as an example:
+
+```python
+import mmcv
+from mmcv.transforms import BaseTransform, TRANSFORMS
+
+@TRANSFORMS.register_module()
+class MyFlip(BaseTransform):
+    def __init__(self, direction: str):
+        super().__init__()
+        self.direction = direction
+
+    def transform(self, results: dict) -> dict:
+        img = results['img']
+        results['img'] = mmcv.imflip(img, direction=self.direction)
+        return results
+```
+
+Then, we can instantiate a `MyFlip` object and use it to process our data dictionary.
+
+```python
+import numpy as np
+
+transform = MyFlip(direction='horizontal')
+data_dict = {'img': np.random.rand(224, 224, 3)}
+data_dict = transform(data_dict)
+processed_img = data_dict['img']
+```
+
+Or, use it in the data pipeline by modifying our config file:
+
+```python
+pipeline = [
+    ...
+    dict(type='MyFlip', direction='horizontal'),
+    ...
+]
+```
+
+Please note that to use the class in a config file, we need to make sure the module that defines `MyFlip` is
+imported at runtime, so that the class is registered before the pipeline is built.
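One way to get the class imported is the `custom_imports` field of the config file, which asks MMEngine to import the listed modules when the config is loaded. The fragment below is a sketch under an assumption: `my_project.transforms` is a hypothetical module path standing in for wherever `MyFlip` is actually defined, and it must be importable from the environment that runs the config.

```python
# Config fragment. `my_project.transforms` is a hypothetical module path;
# replace it with the module that defines and registers `MyFlip`.
custom_imports = dict(
    imports=['my_project.transforms'],
    allow_failed_imports=False,
)

pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='MyFlip', direction='horizontal'),
]
```

Setting `allow_failed_imports=False` makes a wrong module path fail at config loading time instead of surfacing later as an unregistered `MyFlip` type.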
diff --git a/docs/en/conf.py b/docs/en/conf.py
@@ -45,7 +45,6 @@
     'sphinx.ext.napoleon',
     'sphinx.ext.viewcode',
     'sphinx.ext.autosectionlabel',
-    'sphinx_markdown_tables',
     'myst_parser',
     'sphinx_copybutton',
     'sphinx.ext.autodoc.typehints',
@@ -58,7 +57,7 @@
     'python': ('https://docs.python.org/3', None),
     'numpy': ('https://numpy.org/doc/stable', None),
     'torch': ('https://pytorch.org/docs/stable/', None),
-    'mmcv': ('https://mmcv.readthedocs.io/en/dev-2.x/', None),
+    'mmcv': ('https://mmcv.readthedocs.io/en/2.x/', None),
 }
 
 # Add any paths that contain templates here, relative to this directory.

diff --git a/docs/en/migration/transform.md b/docs/en/migration/transform.md
@@ -1,3 +1,162 @@
 # Migrate Data Transform to OpenMMLab 2.0
 
-Coming soon. Please refer to [chinese documentation](https://mmengine.readthedocs.io/zh_CN/latest/migration/transform.html).
+## Introduction
+
+According to the data transform interface convention of TorchVision, all data transform classes need to
+implement the `__call__` method. In the convention of OpenMMLab 1.0, we additionally require that both the
+input and the output of the `__call__` method be a dictionary.
+
+In OpenMMLab 2.0, to make the data transform classes more extensible, we use the `transform` method instead of
+the `__call__` method to implement data transformation, and all data transform classes should inherit the
+[`mmcv.transforms.BaseTransform`](mmcv.transforms.BaseTransform) class. You can still use these data
+transform classes by calling them.
+
+A tutorial on implementing a data transform class can be found in [Data Transform](../advanced_tutorials/data_transform.md).
+
+In addition, we have moved some common data transform classes from the individual repositories to MMCV. In this document,
+we compare the functionality, usage and implementation of the original data transform classes (in [MMClassification v0.23.2](https://github.com/open-mmlab/mmclassification/tree/v0.23.2), [MMDetection v2.25.1](https://github.com/open-mmlab/mmdetection/tree/v2.25.1)) with those of the new data transform classes (in [MMCV v2.0.0rc1](https://github.com/open-mmlab/mmcv/tree/dev-2.x)).
+
+## Functionality Differences
+
+<table class="colwidths-auto docutils align-default">
+<thead>
+  <tr>
+    <th></th>
+    <th>MMClassification (original)</th>
+    <th>MMDetection (original)</th>
+    <th>MMCV (new)</th>
+  </tr>
+</thead>
+<tbody>
+  <tr>
+    <td><code>LoadImageFromFile</code></td>
+    <td>Join the 'img_prefix' and 'img_info.filename' fields to get the image path, then load the image.</td>
+    <td>Join the 'img_prefix' and 'img_info.filename' fields to get the image path, then load the image. Support
+    specifying the order of channels.</td>
+    <td>Load images from 'img_path'. Support ignoring failed loading and specifying the decode backend.</td>
+  </tr>
+  <tr>
+    <td><code>LoadAnnotations</code></td>
+    <td>Not available.</td>
+    <td>Load bbox, label, mask (including polygon masks), semantic segmentation. Support converting bbox coordinate system.</td>
+    <td>Load bbox, label, mask (not including polygon masks), semantic segmentation.</td>
+  </tr>
+  <tr>
+    <td><code>Pad</code></td>
+    <td>Pad all images in the "img_fields" field.</td>
+    <td>Pad all images in the "img_fields" field. Support padding to integer multiple size.</td>
+    <td>Pad the image in the "img" field. Support padding to integer multiple size.</td>
+  </tr>
+  <tr>
+    <td><code>CenterCrop</code></td>
+    <td>Crop all images in the "img_fields" field. Support EfficientNet-style cropping.</td>
+    <td>Not available.</td>
+    <td>Crop the image in the "img" field, the bbox in the "gt_bboxes" field, the semantic segmentation in the "gt_seg_map" field, the keypoints in the "gt_keypoints" field. Support padding the margin of the cropped image.</td>
+  </tr>
+  <tr>
+    <td><code>Normalize</code></td>
+    <td>Normalize the image.</td>
+    <td>No differences.</td>
+    <td>No differences, but we recommend using the <a href="../tutorials/model.html#datapreprocessor">data preprocessor</a> to normalize the image.</td>
+  </tr>
+  <tr>
+    <td><code>Resize</code></td>
+    <td>Resize all images in the "img_fields" field. Support resizing proportionally according to the specified edge.</td>
+    <td>Use <code>Resize</code> with <code>ratio_range=None</code>, a single scale in <code>img_scale</code>, and <code>multiscale_mode="value"</code>.</td>
+    <td>Resize the image in the "img" field, the bbox in the "gt_bboxes" field, the semantic segmentation in the "gt_seg_map" field, the keypoints in the "gt_keypoints" field. Support specifying the ratio of new scale to original scale and support resizing proportionally.</td>
+  </tr>
+  <tr>
+    <td><code>RandomResize</code></td>
+    <td>Not available</td>
+    <td>Use <code>Resize</code> with <code>ratio_range=None</code>, two scales in <code>img_scale</code> and <code>multiscale_mode="range"</code>, or with a <code>ratio_range</code> that is not None.
+    <pre>Resize(
+    img_scale=[(640, 480), (960, 720)],
+    multiscale_mode="range",
+)</pre>
+    </td>
+    <td>Has the same resizing functionality as <code>Resize</code>. Support sampling the scale from a scale range or scale ratio range.
+    <pre>RandomResize(scale=[(640, 480), (960, 720)])</pre>
+    </td>
+  </tr>
+  <tr>
+    <td><code>RandomChoiceResize</code></td>
+    <td>Not available</td>
+    <td>Use <code>Resize</code> with <code>ratio_range=None</code>, multiple scales in <code>img_scale</code>, and <code>multiscale_mode="value"</code>.
+    <pre>Resize(
+    img_scale=[(640, 480), (960, 720)],
+    multiscale_mode="value",
+)</pre>
+    </td>
+    <td>Has the same resizing functionality as <code>Resize</code>. Support randomly choosing the scale from multiple scales or multiple scale ratios.
+    <pre>RandomChoiceResize(scales=[(640, 480), (960, 720)])</pre>
+    </td>
+  </tr>
+  <tr>
+    <td><code>RandomGrayscale</code></td>
+    <td>Randomly grayscale all images in the "img_fields" field. Support keeping channels after grayscale.</td>
+    <td>Not available</td>
+    <td>Randomly grayscale the image in the "img" field. Support specifying the weight of each channel, and support keeping channels after grayscale.</td>
+  </tr>
+  <tr>
+    <td><code>RandomFlip</code></td>
+    <td>Randomly flip all images in the "img_fields" field. Support flipping horizontally and vertically.</td>
+    <td>Randomly flip all values in the "img_fields", "bbox_fields", "mask_fields" and "seg_fields". Support flipping horizontally, vertically and diagonally, and support specifying the probability of every kind of flipping.</td>
+    <td>Randomly flip the values in the "img", "gt_bboxes", "gt_seg_map" and "gt_keypoints" fields. Support flipping horizontally, vertically and diagonally, and support specifying the probability of every kind of flipping.</td>
+  </tr>
+  <tr>
+    <td><code>MultiScaleFlipAug</code></td>
+    <td>Not available</td>
+    <td>Used for test-time-augmentation.</td>
+    <td>Use <code><a href="https://mmcv.readthedocs.io/en/2.x/api/generated/mmcv.transforms.TestTimeAug.html">TestTimeAug</a></code></td>
+  </tr>
+  <tr>
+    <td><code>ToTensor</code></td>
+    <td>Convert the values in the specified fields to <code>torch.Tensor</code>.</td>
+    <td>No differences</td>
+    <td>No differences</td>
+  </tr>
+  <tr>
+    <td><code>ImageToTensor</code></td>
+    <td>Convert the values in the specified fields to <code>torch.Tensor</code> and transpose the channels to CHW.</td>
+    <td>No differences.</td>
+    <td>No differences.</td>
+  </tr>
+</tbody>
+</table>
+
+## Implementation Differences
+
+Taking `RandomFlip` as an example, the new version of [RandomFlip](<>) in MMCV inherits `BaseTransform` and moves the
+functionality implementation from `__call__` to the `transform` method. In addition, the randomness-related code
+is placed in separate methods, and these methods need to be wrapped by the `cache_randomness` decorator.
+
+- MMDetection (original version)
+
+```python
+class RandomFlip:
+    def __call__(self, results):
+        """Randomly flip images."""
+        ...
+        # Randomly choose the flip direction
+        cur_dir = np.random.choice(direction_list, p=flip_ratio_list)
+        ...
+        return results
+```
+
+- MMCV (new version)
+
+```python
+class RandomFlip(BaseTransform):
+    def transform(self, results):
+        """Randomly flip images"""
+        ...
+        cur_dir = self._random_direction()
+        ...
+        return results
+
+    @cache_randomness
+    def _random_direction(self):
+        """Randomly choose the flip direction"""
+        ...
+        return np.random.choice(direction_list, p=flip_ratio_list)
+```
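To illustrate why a transform migrated to the `transform` method can still be used by calling it, here is a minimal sketch that runs without MMCV. `SketchBaseTransform` is a simplified stand-in written for this example (an assumption about the dispatch behaviour, not the real `BaseTransform` source), and the two flip classes are hypothetical; a nested list stands in for image data.

```python
from abc import ABC, abstractmethod


class SketchBaseTransform(ABC):
    """Simplified stand-in for the base class: calling dispatches to `transform`."""

    def __call__(self, results: dict) -> dict:
        return self.transform(results)

    @abstractmethod
    def transform(self, results: dict) -> dict:
        ...


class OldStyleFlip:
    """OpenMMLab 1.0 style: the logic lives in `__call__`."""

    def __call__(self, results: dict) -> dict:
        # Reverse every row to flip horizontally.
        results['img'] = [row[::-1] for row in results['img']]
        return results


class NewStyleFlip(SketchBaseTransform):
    """OpenMMLab 2.0 style: the same logic moves into `transform`."""

    def transform(self, results: dict) -> dict:
        results['img'] = [row[::-1] for row in results['img']]
        return results


old_out = OldStyleFlip()({'img': [[1, 2, 3], [4, 5, 6]]})
new_out = NewStyleFlip()({'img': [[1, 2, 3], [4, 5, 6]]})
print(old_out == new_out)
```

Because callers keep invoking the object itself, existing pipelines do not need to change when a transform is migrated; only the method that holds the logic is renamed.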