[Feature] Support batch augmentation through BatchAugSampler #1757
Motivation
Batch augmentation is a trick that improves the model's generalization ability. It works by repeating the same data instance several times and letting each copy go through a parallel data augmentation pipeline within a single batch.
In MMOCR, the same effect can be achieved with `BatchAugSampler`, which generates the same index `num_repeats` times consecutively. The final "dataset length" therefore becomes `num_repeats` times the original dataset length. That is, a dataset comprising `[0, 1, 2, 3]` may be randomly shuffled and sampled by `BatchAugSampler`, and what the dataloader finally gets could be `[2, 2, 1, 1, 0, 0, 3, 3]` when `num_repeats=2`. It's also important to make the batch size divisible by `num_repeats`; otherwise, the repeated elements are not necessarily placed in the same batch.
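Below is a minimal, self-contained sketch of the sampling behavior described above. It is an illustration only, not MMOCR's actual `BatchAugSampler` implementation, and the `repeated_indices` helper is hypothetical:

```python
import random

def repeated_indices(dataset_len, num_repeats, seed=0):
    """Hypothetical helper mimicking the index stream the sampler yields."""
    rng = random.Random(seed)
    order = list(range(dataset_len))
    rng.shuffle(order)  # shuffle the unique indices once per epoch
    # Emit each shuffled index num_repeats times in a row, so copies of the
    # same sample land in the same batch when batch_size % num_repeats == 0.
    return [idx for idx in order for _ in range(num_repeats)]

print(repeated_indices(dataset_len=4, num_repeats=2))
# e.g. [2, 2, 1, 1, 0, 0, 3, 3]
```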
Usage

1. Replace the sampler in `train_dataloader` with `BatchAugSampler`, and specify `num_repeats`.
2. Set the batch size in `train_dataloader`, which should be divisible by `num_repeats`, e.g. as in the sketch below.
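The following is a hypothetical config snippet: the `dataset` entry and `num_workers` value are placeholders, and passing `shuffle=True` to the sampler is an assumption based on the usual MMEngine-style sampler config; only `type='BatchAugSampler'` and `num_repeats` are confirmed by this PR.

```python
train_dataloader = dict(
    # 16 is divisible by num_repeats=4, so each batch holds 4 unique
    # samples, each repeated 4 times with independent augmentations.
    batch_size=16,
    num_workers=8,  # placeholder; tune for your machine
    sampler=dict(type='BatchAugSampler', shuffle=True, num_repeats=4),
    dataset=train_dataset,  # placeholder for your training dataset config
)
```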