[Feature] Support batch augmentation through BatchAugSampler #1757

Merged (2 commits) on Mar 7, 2023

gaotongxiao (Collaborator)

Motivation

Batch augmentation is a trick that can improve a model's generalization ability. It repeats the same data instance several times within a single batch and lets each copy go through the data augmentations independently, so the copies end up augmented differently.

In MMOCR, the same effect can be achieved with BatchAugSampler, which yields each index num_repeats times consecutively. The resulting "dataset length" is therefore num_repeats times the original dataset length.

That is, a dataset comprising [0, 1, 2, 3] may be randomly shuffled and sampled by BatchAugSampler, and with num_repeats=2 the dataloader could finally receive [2, 2, 1, 1, 0, 0, 3, 3].
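
The index order described above can be sketched in a few lines of plain Python. This is only an illustration of the behaviour, not MMOCR's implementation: the function name `repeated_indices` and its `seed` parameter are hypothetical, and distributed sharding is ignored.

```python
import random


def repeated_indices(dataset_len, num_repeats, shuffle=True, seed=0):
    """Return dataset indices with each index repeated consecutively.

    Hypothetical helper for illustration only; it is not part of MMOCR.
    """
    indices = list(range(dataset_len))
    if shuffle:
        # Shuffle the original indices first, then repeat each of them.
        random.Random(seed).shuffle(indices)
    return [idx for idx in indices for _ in range(num_repeats)]


order = repeated_indices(4, num_repeats=2)
print(len(order))  # 8, i.e. num_repeats times the original length
print(order)       # every index appears twice in a row
```

Shuffling before repeating keeps the copies of one sample adjacent, which is what allows them to share a batch.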

It's also important to make the batch size divisible by num_repeats; otherwise the repeated copies of a sample are not guaranteed to land in the same batch.
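
To see why divisibility matters, here is a small sketch that cuts the example order from this description into batches and checks whether every sample's copies stay together. The helper names `split_batches` and `copies_stay_together` are hypothetical and exist only for this illustration.

```python
def split_batches(indices, batch_size):
    """Cut a flat index list into consecutive batches, as a dataloader would."""
    return [indices[i:i + batch_size]
            for i in range(0, len(indices), batch_size)]


def copies_stay_together(indices, batch_size, num_repeats):
    """Check that each batch holds all num_repeats copies of its samples."""
    for batch in split_batches(indices, batch_size):
        if any(batch.count(idx) != num_repeats for idx in set(batch)):
            return False
    return True


# Sampler output from the example above (num_repeats=2).
order = [2, 2, 1, 1, 0, 0, 3, 3]

print(copies_stay_together(order, batch_size=4, num_repeats=2))  # True
print(copies_stay_together(order, batch_size=3, num_repeats=2))  # False
```

With batch_size=3 the first batch is [2, 2, 1], so the two copies of sample 1 are split across batches.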

Usage

  1. Replace the sampler in train_dataloader with BatchAugSampler, and specify num_repeats.
  2. Update batch_size in train_dataloader so that it is divisible by num_repeats, e.g.
train_dataloader = dict(
    batch_size=4,
    sampler=dict(type='BatchAugSampler', shuffle=True, num_repeats=2),
)

codecov bot commented Mar 2, 2023

Codecov Report

Patch coverage: 75.60% and project coverage change: +2.04 🎉

Comparison is base (62d440f) 88.08% compared to head (7af3618) 90.12%.

❗ Current head 7af3618 differs from pull request most recent head c934e8f. Consider uploading reports for the commit c934e8f to get more accurate results

Additional details and impacted files
@@             Coverage Diff             @@
##           dev-1.x    #1757      +/-   ##
===========================================
+ Coverage    88.08%   90.12%   +2.04%     
===========================================
  Files          176      190      +14     
  Lines        11022    11124     +102     
  Branches      1558     1566       +8     
===========================================
+ Hits          9709    10026     +317     
+ Misses        1022      790     -232     
- Partials       291      308      +17     
Flag Coverage Δ
unittests 90.12% <75.60%> (+2.04%) ⬆️

Impacted Files Coverage Δ
...atasets/preparers/obtainers/naive_data_obtainer.py 16.82% <5.88%> (ø)
...r/datasets/preparers/packers/wildreceipt_packer.py 16.36% <16.36%> (ø)
mmocr/datasets/preparers/gatherers/naf_gatherer.py 31.42% <31.42%> (ø)
mmocr/datasets/preparers/parsers/coco_parser.py 18.00% <33.33%> (+0.69%) ⬆️
mmocr/datasets/preparers/parsers/base.py 80.00% <77.77%> (+2.22%) ⬆️
mmocr/utils/processing.py 79.31% <79.31%> (ø)
mmocr/datasets/preparers/dumpers/base.py 80.00% <80.00%> (ø)
mmocr/datasets/preparers/packers/base.py 81.25% <81.25%> (ø)
...ocr/datasets/preparers/packers/textrecog_packer.py 81.25% <81.25%> (ø)
mmocr/datasets/preparers/data_preparer.py 83.67% <81.81%> (+15.49%) ⬆️
... and 26 more

gaotongxiao merged commit 47f7fc0 into open-mmlab:dev-1.x on Mar 7, 2023
gaotongxiao deleted the repeataug branch on March 7, 2023 03:29