
Improve log, save origin yaml, and fix adan #272


Merged: 1 commit merged on May 10, 2023

Conversation


@SamitHuang SamitHuang commented May 9, 2023

Thank you for your contribution to the MindOCR repo.

Motivation

Improve log info.
Archive original yaml config used in training to better ensure reproduction.
Fix Adan optimizer (sync mindcv)
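The config-archiving step could look roughly like this (a minimal sketch with hypothetical paths and a hypothetical function name, not the PR's actual code): copy the original YAML into the training output directory so the exact settings of a run can be reproduced later.

```python
import shutil
from pathlib import Path

def archive_config(config_path: str, ckpt_save_dir: str) -> str:
    """Copy the original YAML config into the training output directory.

    Names and paths here are illustrative assumptions, not MindOCR's API.
    """
    out_dir = Path(ckpt_save_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    dst = out_dir / Path(config_path).name
    # shutil is part of the Python standard library; no pip install needed
    shutil.copyfile(config_path, dst)
    return str(dst)
```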



@@ -181,7 +186,6 @@ def _check_batch_size(num_samples, ori_batch_size=32, refine=True):
for bs in range(ori_batch_size - 1, 0, -1):
Collaborator

If the number of samples in the evaluation set is a prime number, the batch size will be set to 1, which can significantly increase evaluation time. Can we just set drop_remainder to False for the evaluation set and leave the batch size as it is?
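To illustrate the concern: a downward divisor search of the kind suggested by the diff context (`for bs in range(ori_batch_size - 1, 0, -1)`) degrades to batch size 1 whenever the sample count is prime. The function below is a sketch reconstructed from that loop, not the exact `_check_batch_size` implementation.

```python
def refine_batch_size(num_samples: int, ori_batch_size: int = 32) -> int:
    """Search downward for a batch size that evenly divides num_samples.

    Illustrative sketch of the divisor-search behaviour under discussion.
    """
    if num_samples % ori_batch_size == 0:
        return ori_batch_size
    # Mirror of the loop shown in the diff hunk above
    for bs in range(ori_batch_size - 1, 0, -1):
        if num_samples % bs == 0:
            return bs
    return 1

print(refine_batch_size(500))  # 25 evenly divides 500
print(refine_batch_size(499))  # 499 is prime -> batch size collapses to 1
```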

Collaborator


Agreed. Evaluating with batch size = 1 usually takes longer than running with two different batch sizes, and the extra model-compilation time caused by the second batch size is negligible.

Collaborator Author


Even when we set drop_remainder to False, the last batch will be padded to batch_size, leading to an inaccurate evaluation result.

Collaborator


As I recall, it is not padded; the remainder is output as a batch of a different size:

from mindspore.dataset import GeneratorDataset

dataset = GeneratorDataset(range(10), 'data').batch(4)
for x in dataset.create_tuple_iterator(num_epochs=1):
    print(x[0].shape)

>>>(4,)
>>>(4,)
>>>(2,)

@@ -5,6 +5,7 @@ addict
matplotlib
addict
numpy
shutils
Collaborator


This is not shutil, and shutil is a built-in Python library in any case, so there is no need to install it with pip.
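For reference, shutil ships with CPython and needs no entry in requirements; a quick sanity check:

```python
import os
import shutil  # standard library module; importable with no pip install
import tempfile

# Create a throwaway file and copy it with shutil
src = tempfile.NamedTemporaryFile(delete=False, suffix=".yaml")
src.write(b"key: value\n")
src.close()
dst = src.name + ".bak"
shutil.copyfile(src.name, dst)
print(os.path.exists(dst))  # True
```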

Collaborator

@zhtmike zhtmike left a comment


The commented-out print function can be removed for better readability.

@SamitHuang SamitHuang force-pushed the impr_log branch 2 times, most recently from f27e4be to 3d06a99, on May 10, 2023 03:49

zhtmike commented May 10, 2023

By the way, the model.eval API can indeed show undefined behaviour, such as padding at the end of the iteration. It should be used with care:

import mindspore as ms
import mindspore.nn as nn
from mindspore.dataset import GeneratorDataset

class Iter:
    def __len__(self):
        return 10
    
    def __getitem__(self, index):
        return index, index 

dataset = GeneratorDataset(Iter(), ['data', 'label'], shuffle=False).batch(4)

class PrintNet(nn.Cell):
    def construct(self, x):
        print(x)
        return x

loss = nn.MSELoss()
net = PrintNet()
model = ms.Model(net, loss, metrics={'mse'})
model.eval(valid_dataset=dataset)
>>>Tensor(shape=[4], dtype=Int64, value=[0 1 2 3])
>>>Tensor(shape=[4], dtype=Int64, value=[4 5 6 7])
>>>Tensor(shape=[4], dtype=Int64, value=[8 9 6 7])

assert 0.0 < beta2 < 1.0, f"For '{prim_name}', the range of 'beta2' must be (0.0, 1.0), but got {beta2}."
assert eps > 0, f"For '{prim_name}', the 'eps' must be positive, but got {eps}."
assert isinstance(use_locking, bool), f"For '{prim_name}', the type of 'use_locking' must be 'bool', but got type '{type(use_locking).__name__}'."
assert isinstance(beta1, float) and 0 <= beta1 <= 1.0, f"For {prim_name}, beta1 should between 0 and 1"
Collaborator


Please update the docstring of the function _update_run_op() at lines 44 and 45: (0.0, 1.0) -> [0.0, 1.0].
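The distinction behind this docstring fix can be sketched as follows (illustrative names, not the exact Adan implementation): the assertions above validate beta1 on the closed interval [0.0, 1.0], while beta2 must lie strictly inside (0.0, 1.0).

```python
def check_betas(beta1: float, beta2: float, prim_name: str = "Adan") -> None:
    """Validate beta1 on [0.0, 1.0] (closed) and beta2 on (0.0, 1.0) (open).

    Sketch of the range checks in the diff above; not the exact Adan code.
    """
    assert isinstance(beta1, float) and 0.0 <= beta1 <= 1.0, \
        f"For '{prim_name}', 'beta1' must be in [0.0, 1.0], but got {beta1}"
    assert 0.0 < beta2 < 1.0, \
        f"For '{prim_name}', the range of 'beta2' must be (0.0, 1.0), but got {beta2}"

check_betas(0.98, 0.92)  # typical values: valid
check_betas(1.0, 0.92)   # beta1 = 1.0 is allowed (closed interval)
```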

@SamitHuang SamitHuang merged commit d216392 into mindspore-lab:main May 10, 2023
@zhtmike zhtmike mentioned this pull request May 11, 2023
@SamitHuang SamitHuang deleted the impr_log branch May 24, 2023 15:25