[Docs] Translate data transform tutorial and migration docs.
Showing 6 changed files with 339 additions and 29 deletions.
# Data transform

In the OpenMMLab repositories, dataset construction and data preparation are decoupled from each other. Usually, the dataset construction only parses the dataset and records the basic information of each sample, while the data preparation is performed by a series of data transforms, such as data loading, preprocessing and formatting, based on the basic information of the samples.

## Usage of Data Transforms

In MMEngine, we use various callable data transform classes to perform data manipulation. These data transform classes can take several configuration parameters for instantiation and then process the input data dictionary when called. All data transforms accept a dictionary as input and output the processed data as a dictionary. A simple example is shown below:

```{note}
MMEngine does not provide implementations of data transforms. You can find the base data transform class
and many other data transforms in MMCV, so you need to install MMCV before going through this tutorial; see the
{external+mmcv:doc}`MMCV installation guide <get_started/installation>`.
```

```python
>>> import numpy as np
>>> from mmcv.transforms import Resize
>>>
>>> transform = Resize(scale=(224, 224))
>>> data_dict = {'img': np.random.rand(256, 256, 3)}
>>> data_dict = transform(data_dict)
>>> print(data_dict['img'].shape)
(224, 224, 3)
```

## Usage in Config Files

In config files, we can compose multiple data transforms into a list, called a data pipeline, which is an argument of the dataset.

Usually, a data pipeline consists of the following parts:

1. Data loading: use [`LoadImageFromFile`](mmcv.transforms.LoadImageFromFile) to load image files.
2. Label loading: use [`LoadAnnotations`](mmcv.transforms.LoadAnnotations) to load bboxes, semantic segmentation maps and keypoint annotations.
3. Data processing and augmentation: for example, [`RandomResize`](mmcv.transforms.RandomResize).
4. Data formatting: we use different data transforms for different tasks, and the data formatting transform for a specific task is implemented in the corresponding repository. For example, the data formatting transform for the image classification task is `PackClsInputs`, which is implemented in MMClassification.

Here, taking the classification task as an example, we show a typical data pipeline in the figure below. For each sample, the basic information stored in the dataset is a dictionary, as shown on the far left of the figure. Each blue block represents a data transform, and every data transform adds new fields (marked in green) to the data dictionary or updates existing fields (marked in orange).

<div align=center>
  <img src="https://user-images.githubusercontent.com/26739999/187157681-ac4dcac8-3543-4bfe-ab30-9aa9e56d4900.jpg" width="90%"/>
</div>

To use the above data pipeline in our config file, use the settings below:

```python
test_dataloader = dict(
    batch_size=32,
    dataset=dict(
        type='ImageNet',
        data_root='data/imagenet',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(type='Resize', size=256, keep_ratio=True),
            dict(type='CenterCrop', crop_size=224),
            dict(type='PackClsInputs'),
        ]
    )
)
```
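
To get a feel for what such a pipeline does outside of a runner, we can chain the transforms by hand with `Compose`. The snippet below is only a rough sketch of the config above: `LoadImageFromFile` is skipped because we feed an in-memory image, and `PackClsInputs` is omitted because it lives in MMClassification rather than MMCV.

```python
import numpy as np
from mmcv.transforms import CenterCrop, Resize
from mmengine.dataset import Compose

# Hand-built pipeline roughly mirroring the config above.
pipeline = Compose([
    Resize(scale=256, keep_ratio=True),
    CenterCrop(crop_size=224),
])

# Feed an in-memory image instead of loading one from disk.
results = pipeline({'img': np.random.rand(512, 512, 3)})
print(results['img'].shape)  # (224, 224, 3)
```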

## Common Data Transforms

According to their functionality, data transform classes can be divided into data loading, data pre-processing & augmentation, and data formatting.

### Data Loading

To support loading large-scale datasets, we usually do not load the dense data during dataset construction, but only record the file paths of these data. Therefore, we need to load them in the data pipeline.

| Data Transforms | Functionality |
| :---: | :---: |
| [`LoadImageFromFile`](mmcv.transforms.LoadImageFromFile) | Load images according to the path. |
| [`LoadAnnotations`](mmcv.transforms.LoadAnnotations) | Load and format annotation information, including bboxes, segmentation maps and others. |

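
As a quick check of what loading produces, here is a minimal sketch (the image path is a placeholder; point it at a real file):

```python
from mmcv.transforms import LoadImageFromFile

transform = LoadImageFromFile()
# 'img_path' is the required input field; 'demo.jpg' is a placeholder path.
results = transform({'img_path': 'demo.jpg'})
# The transform adds the decoded image and its shape information,
# e.g. the 'img', 'img_shape' and 'ori_shape' fields.
print(results['img'].shape)
```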

### Data Pre-processing & Augmentation

Data transforms for pre-processing and augmentation usually manipulate the image and annotation data, such as cropping, padding and resizing.

| Data Transforms | Functionality |
| :---: | :---: |
| [`Pad`](mmcv.transforms.Pad) | Pad the margin of images. |
| [`CenterCrop`](mmcv.transforms.CenterCrop) | Crop the image and keep the center part. |
| [`Normalize`](mmcv.transforms.Normalize) | Normalize the image pixels. |
| [`Resize`](mmcv.transforms.Resize) | Resize images to the specified scale or ratio. |
| [`RandomResize`](mmcv.transforms.RandomResize) | Resize images to a random scale in the specified range. |
| [`RandomChoiceResize`](mmcv.transforms.RandomChoiceResize) | Resize images to a random scale chosen from several specified scales. |
| [`RandomGrayscale`](mmcv.transforms.RandomGrayscale) | Randomly grayscale images. |
| [`RandomFlip`](mmcv.transforms.RandomFlip) | Randomly flip images. |

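
As an illustration, here is a minimal sketch of applying one of these transforms to a bare image dictionary; the flip is forced with `prob=1.0` so the example is deterministic, and the output fields mentioned in the comments are based on MMCV's `RandomFlip`:

```python
import numpy as np
from mmcv.transforms import RandomFlip

img = np.random.rand(256, 256, 3)
# Force a horizontal flip so the example is deterministic.
transform = RandomFlip(prob=1.0, direction='horizontal')
results = transform({'img': img, 'img_shape': img.shape[:2]})

# Besides flipping 'img' (and annotations such as bboxes, if present),
# the transform records its decision, e.g. in 'flip' and 'flip_direction'.
print(results['flip'], results['flip_direction'])
```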

### Data Formatting

Data formatting transforms convert the data to the specified type.

| Data Transforms | Functionality |
| :---: | :---: |
| [`ToTensor`](mmcv.transforms.ToTensor) | Convert the data in the specified fields to `torch.Tensor`. |
| [`ImageToTensor`](mmcv.transforms.ImageToTensor) | Convert images to `torch.Tensor` in the PyTorch channel-first format. |

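
For instance, formatting a classification-style sample could look like the following minimal sketch (the field names are illustrative):

```python
import numpy as np
from mmcv.transforms import ImageToTensor, ToTensor

results = {'img': np.random.rand(224, 224, 3), 'gt_label': np.array([3])}
results = ToTensor(keys=['gt_label'])(results)    # ndarray -> torch.Tensor
results = ImageToTensor(keys=['img'])(results)    # HWC ndarray -> CHW torch.Tensor
print(results['img'].shape)  # torch.Size([3, 224, 224])
```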

## Custom Data Transform Classes

To implement a new data transform class, the class needs to inherit `BaseTransform` and implement the `transform` method. Here, we use a simple flip transform (`MyFlip`) as an example:

```python
import mmcv
from mmcv.transforms import BaseTransform, TRANSFORMS


@TRANSFORMS.register_module()
class MyFlip(BaseTransform):
    def __init__(self, direction: str):
        super().__init__()
        self.direction = direction

    def transform(self, results: dict) -> dict:
        # Flip the image along the configured direction and update the results dict.
        img = results['img']
        results['img'] = mmcv.imflip(img, direction=self.direction)
        return results
```

Then, we can instantiate a `MyFlip` object and use it to process our data dictionary.

```python
import numpy as np

transform = MyFlip(direction='horizontal')
data_dict = {'img': np.random.rand(224, 224, 3)}
data_dict = transform(data_dict)
processed_img = data_dict['img']
```

Or, use it in the data pipeline by modifying our config file:

```python
pipeline = [
    ...
    dict(type='MyFlip', direction='horizontal'),
    ...
]
```

Please note that to use the class in our config file, we need to make sure the `MyFlip` class is imported at runtime; one way to do this is shown below.
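
A common way to guarantee the import is the `custom_imports` field of a config file. The module path below is hypothetical; replace it with the module that actually defines `MyFlip`.

```python
custom_imports = dict(
    imports=['my_project.my_flip'],  # hypothetical module containing MyFlip
    allow_failed_imports=False)
```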
# Migrate Data Transform to OpenMMLab 2.0

## Introduction

Following the data transform interface convention of TorchVision, all data transform classes need to implement the `__call__` method. In the OpenMMLab 1.0 convention, we additionally require that both the input and the output of the `__call__` method be a dictionary.

In OpenMMLab 2.0, to make the data transform classes more extensible, we use the `transform` method instead of the `__call__` method to implement data transformation, and all data transform classes should inherit the [`mmcv.transforms.BaseTransform`](mmcv.transforms.BaseTransform) class. You can still invoke these data transform classes by calling them, since `BaseTransform.__call__` delegates to `transform`, as the sketch below illustrates.
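
As a minimal migration sketch (the `MyCrop` classes and their fields are hypothetical and only serve as an illustration), an OpenMMLab 1.0 style transform and its 2.0 counterpart might look like this:

```python
from mmcv.transforms import BaseTransform


# OpenMMLab 1.0 style: a plain class implementing __call__.
class MyCropV1:
    def __init__(self, size: int):
        self.size = size

    def __call__(self, results: dict) -> dict:
        results['img'] = results['img'][:self.size, :self.size]
        return results


# OpenMMLab 2.0 style: inherit BaseTransform and implement transform.
# Instances remain callable because BaseTransform.__call__ forwards to transform.
class MyCropV2(BaseTransform):
    def __init__(self, size: int):
        super().__init__()
        self.size = size

    def transform(self, results: dict) -> dict:
        results['img'] = results['img'][:self.size, :self.size]
        return results
```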

A tutorial on how to implement a data transform class can be found in the [Data Transform](../advanced_tutorials/data_element.md) tutorial.

In addition, we have moved some common data transform classes from the individual repositories to MMCV. In this document, we compare the functionality, usage and implementation of the original data transform classes (in [MMClassification v0.23.2](https://github.com/open-mmlab/mmclassification/tree/v0.23.2) and [MMDetection v2.25.1](https://github.com/open-mmlab/mmdetection/tree/v2.25.1)) with the new data transform classes (in [MMCV v2.0.0rc1](https://github.com/open-mmlab/mmcv/tree/dev-2.x)).

## Functionality Differences

<table class="colwidths-auto docutils align-default">
<thead>
  <tr>
    <th></th>
    <th>MMClassification (original)</th>
    <th>MMDetection (original)</th>
    <th>MMCV (new)</th>
  </tr>
</thead>
<tbody>
  <tr>
    <td><code>LoadImageFromFile</code></td>
    <td>Join the 'img_prefix' and 'img_info.filename' fields to find the image path and load the image.</td>
    <td>Join the 'img_prefix' and 'img_info.filename' fields to find the image path and load the image. Support specifying the order of channels.</td>
    <td>Load images from 'img_path'. Support ignoring failed loading and specifying the decode backend.</td>
  </tr>
  <tr>
    <td><code>LoadAnnotations</code></td>
    <td>Not available.</td>
    <td>Load bbox, label, mask (including polygon masks) and semantic segmentation. Support converting the bbox coordinate system.</td>
    <td>Load bbox, label, mask (not including polygon masks) and semantic segmentation.</td>
  </tr>
  <tr>
    <td><code>Pad</code></td>
    <td>Pad all images in the "img_fields" field.</td>
    <td>Pad all images in the "img_fields" field. Support padding to a size that is an integer multiple of a given divisor.</td>
    <td>Pad the image in the "img" field. Support padding to a size that is an integer multiple of a given divisor.</td>
  </tr>
  <tr>
    <td><code>CenterCrop</code></td>
    <td>Crop all images in the "img_fields" field. Support cropping in EfficientNet style.</td>
    <td>Not available.</td>
    <td>Crop the image in the "img" field, the bboxes in the "gt_bboxes" field, the semantic segmentation in the "gt_seg_map" field and the keypoints in the "gt_keypoints" field. Support padding the margin of the cropped image.</td>
  </tr>
  <tr>
    <td><code>Normalize</code></td>
    <td>Normalize the image.</td>
    <td>No differences.</td>
    <td>No differences, but we recommend using the <a href="../tutorials/model.html#datapreprocessor">data preprocessor</a> to normalize the image.</td>
  </tr>
  <tr>
    <td><code>Resize</code></td>
    <td>Resize all images in the "img_fields" field. Support resizing proportionally according to the specified edge.</td>
    <td>Use <code>Resize</code> with <code>ratio_range=None</code>, a single scale in <code>img_scale</code> and <code>multiscale_mode="value"</code>.</td>
    <td>Resize the image in the "img" field, the bboxes in the "gt_bboxes" field, the semantic segmentation in the "gt_seg_map" field and the keypoints in the "gt_keypoints" field. Support specifying the ratio of the new scale to the original scale and support resizing proportionally.</td>
  </tr>
  <tr>
    <td><code>RandomResize</code></td>
    <td>Not available.</td>
    <td>Use <code>Resize</code> with <code>ratio_range=None</code>, two scales in <code>img_scale</code> and <code>multiscale_mode="range"</code>, or with <code>ratio_range</code> not None.
    <pre>Resize(
    img_scale=[(640, 480), (960, 720)],
    multiscale_mode="range",
)</pre>
    </td>
    <td>Have the same resize function as <code>Resize</code>. Support sampling the scale from a scale range or a scale ratio range.
    <pre>RandomResize(scale=[(640, 480), (960, 720)])</pre>
    </td>
  </tr>
  <tr>
    <td><code>RandomChoiceResize</code></td>
    <td>Not available.</td>
    <td>Use <code>Resize</code> with <code>ratio_range=None</code>, multiple scales in <code>img_scale</code> and <code>multiscale_mode="value"</code>.
    <pre>Resize(
    img_scale=[(640, 480), (960, 720)],
    multiscale_mode="value",
)</pre>
    </td>
    <td>Have the same resize function as <code>Resize</code>. Support randomly choosing the scale from multiple scales or multiple scale ratios.
    <pre>RandomChoiceResize(scales=[(640, 480), (960, 720)])</pre>
    </td>
  </tr>
  <tr>
    <td><code>RandomGrayscale</code></td>
    <td>Randomly grayscale all images in the "img_fields" field. Support keeping channels after grayscale.</td>
    <td>Not available.</td>
    <td>Randomly grayscale the image in the "img" field. Support specifying the weight of each channel, and support keeping channels after grayscale.</td>
  </tr>
  <tr>
    <td><code>RandomFlip</code></td>
    <td>Randomly flip all images in the "img_fields" field. Support flipping horizontally and vertically.</td>
    <td>Randomly flip all values in the "img_fields", "bbox_fields", "mask_fields" and "seg_fields". Support flipping horizontally, vertically and diagonally, and support specifying the probability of each kind of flipping.</td>
    <td>Randomly flip the values in the "img", "gt_bboxes", "gt_seg_map" and "gt_keypoints" fields. Support flipping horizontally, vertically and diagonally, and support specifying the probability of each kind of flipping.</td>
  </tr>
  <tr>
    <td><code>MultiScaleFlipAug</code></td>
    <td>Not available.</td>
    <td>Used for test-time augmentation.</td>
    <td>Use <code><a href="https://mmcv.readthedocs.io/en/2.x/api/generated/mmcv.transforms.TestTimeAug.html">TestTimeAug</a></code> instead.</td>
  </tr>
  <tr>
    <td><code>ToTensor</code></td>
    <td>Convert the values in the specified fields to <code>torch.Tensor</code>.</td>
    <td>No differences.</td>
    <td>No differences.</td>
  </tr>
  <tr>
    <td><code>ImageToTensor</code></td>
    <td>Convert the values in the specified fields to <code>torch.Tensor</code> and transpose the channels to CHW.</td>
    <td>No differences.</td>
    <td>No differences.</td>
  </tr>
</tbody>
</table>

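
For instance, based on the table above, a multi-scale resize entry in an old MMDetection pipeline migrates roughly as follows. This is only a sketch: the scales are taken from the table for illustration, and the exact keyword names should be checked against the versions you use.

```python
# Old-style pipeline entry (MMDetection 2.x):
old_resize = dict(
    type='Resize',
    img_scale=[(640, 480), (960, 720)],
    multiscale_mode='range',
    keep_ratio=True)

# New-style equivalent built on the MMCV 2.x transform:
new_resize = dict(
    type='RandomResize',
    scale=[(640, 480), (960, 720)],
    keep_ratio=True)
```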

## Implementation Differences

Taking `RandomFlip` as an example, the new [`RandomFlip`](mmcv.transforms.RandomFlip) in MMCV inherits `BaseTransform` and moves the functionality implementation from the `__call__` method to the `transform` method. In addition, the randomness-related code is placed in separate methods, and these methods need to be wrapped with the `cache_randomness` decorator. A complete, runnable sketch of this pattern is given at the end of this section.

- MMDetection (original version)

  ```python
  class RandomFlip:
      def __call__(self, results):
          """Randomly flip images."""
          ...
          # Randomly choose the flip direction
          cur_dir = np.random.choice(direction_list, p=flip_ratio_list)
          ...
          return results
  ```

- MMCV (new version)

  ```python
  class RandomFlip(BaseTransform):
      def transform(self, results):
          """Randomly flip images."""
          ...
          cur_dir = self._random_direction()
          ...
          return results

      @cache_randomness
      def _random_direction(self):
          """Randomly choose the flip direction."""
          ...
          return np.random.choice(direction_list, p=flip_ratio_list)
  ```
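
Putting the pieces together, a complete new-style random transform might look like the sketch below. `MyRandomFlip` and its parameters are hypothetical and only illustrate the structure; they are not part of MMCV.

```python
import mmcv
import numpy as np
from mmcv.transforms import BaseTransform
from mmcv.transforms.utils import cache_randomness


class MyRandomFlip(BaseTransform):
    def __init__(self, prob: float = 0.5, direction: str = 'horizontal'):
        super().__init__()
        self.prob = prob
        self.direction = direction

    @cache_randomness
    def _do_flip(self) -> bool:
        # The random decision lives in its own method wrapped by cache_randomness,
        # so wrapper transforms can cache and reuse the random parameters.
        return float(np.random.rand()) < self.prob

    def transform(self, results: dict) -> dict:
        if self._do_flip():
            results['img'] = mmcv.imflip(results['img'], direction=self.direction)
        return results
```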