
Improve log, save origin yaml, and fix adan #272


Merged: 1 commit merged on May 10, 2023

Conversation


@SamitHuang SamitHuang commented May 9, 2023

Thank you for your contribution to the MindOCR repo.

Motivation

Improve log info.
Archive original yaml config used in training to better ensure reproduction.
Fix Adan optimizer (sync mindcv)
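The config-archiving step could look roughly like this (a minimal sketch with hypothetical paths and a hypothetical function name, not the PR's actual code): copy the original YAML into the training output directory so the exact settings of a run can be reproduced later.

```python
import shutil
from pathlib import Path

def archive_config(config_path: str, ckpt_save_dir: str) -> str:
    """Copy the original YAML config into the training output directory.

    Names and paths here are illustrative assumptions, not MindOCR's API.
    """
    out_dir = Path(ckpt_save_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    dst = out_dir / Path(config_path).name
    # shutil is part of the Python standard library; no pip install needed
    shutil.copyfile(config_path, dst)
    return str(dst)
```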



@@ -181,7 +186,6 @@ def _check_batch_size(num_samples, ori_batch_size=32, refine=True):
for bs in range(ori_batch_size - 1, 0, -1):
Collaborator

If the number of samples in the evaluation set is a prime number, the batch size will be set to 1, which can significantly increase evaluation time. Can we just set drop_remainder to False for the evaluation set and leave the batch size as it is?
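To illustrate the concern: a downward divisor search of the kind suggested by the diff context (`for bs in range(ori_batch_size - 1, 0, -1)`) degrades to batch size 1 whenever the sample count is prime. The function below is a sketch reconstructed from that loop, not the exact `_check_batch_size` implementation.

```python
def refine_batch_size(num_samples: int, ori_batch_size: int = 32) -> int:
    """Search downward for a batch size that evenly divides num_samples.

    Illustrative sketch of the divisor-search behaviour under discussion.
    """
    if num_samples % ori_batch_size == 0:
        return ori_batch_size
    # Mirror of the loop shown in the diff hunk above
    for bs in range(ori_batch_size - 1, 0, -1):
        if num_samples % bs == 0:
            return bs
    return 1

print(refine_batch_size(500))  # 25 evenly divides 500
print(refine_batch_size(499))  # 499 is prime -> batch size collapses to 1
```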

Collaborator


Agreed. Evaluating with batch size = 1 usually takes longer than running with two different batch sizes, and the extra model-compilation time caused by the second batch size is negligible.

Collaborator Author


Even when we set drop_remainder to False, the last batch will be padded to batch_size, leading to an inaccurate evaluation result.

Collaborator


As I recall, it is not padded; the remainder is output as a batch of a different size:

from mindspore.dataset import GeneratorDataset

dataset = GeneratorDataset(range(10), 'data').batch(4)
for x in dataset.create_tuple_iterator(num_epochs=1):
    print(x[0].shape)

>>>(4,)
>>>(4,)
>>>(2,)

@@ -5,6 +5,7 @@ addict
matplotlib
addict
numpy
shutils
Collaborator


This is not shutil, and shutil is a built-in Python library in any case, so there is no need to install it with pip.
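For reference, shutil ships with CPython and needs no entry in requirements; a quick sanity check:

```python
import os
import shutil  # standard library module; importable with no pip install
import tempfile

# Create a throwaway file and copy it with shutil
src = tempfile.NamedTemporaryFile(delete=False, suffix=".yaml")
src.write(b"key: value\n")
src.close()
dst = src.name + ".bak"
shutil.copyfile(src.name, dst)
print(os.path.exists(dst))  # True
```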

Collaborator

@zhtmike zhtmike left a comment


The commented-out print function can be removed for better readability.

@SamitHuang SamitHuang force-pushed the impr_log branch 2 times, most recently from f27e4be to 3d06a99, on May 10, 2023 03:49

zhtmike commented May 10, 2023

By the way, the model.eval API can indeed show undefined behaviour, such as padding at the end of the iteration. It should be used with care:

import mindspore as ms
import mindspore.nn as nn
from mindspore.dataset import GeneratorDataset

class Iter:
    def __len__(self):
        return 10
    
    def __getitem__(self, index):
        return index, index 

dataset = GeneratorDataset(Iter(), ['data', 'label'], shuffle=False).batch(4)

class PrintNet(nn.Cell):
    def construct(self, x):
        print(x)
        return x

loss = nn.MSELoss()
net = PrintNet()
model = ms.Model(net, loss, metrics={'mse'})
model.eval(valid_dataset=dataset)
>>>Tensor(shape=[4], dtype=Int64, value=[0 1 2 3])
>>>Tensor(shape=[4], dtype=Int64, value=[4 5 6 7])
>>>Tensor(shape=[4], dtype=Int64, value=[8 9 6 7])

assert 0.0 < beta2 < 1.0, f"For '{prim_name}', the range of 'beta2' must be (0.0, 1.0), but got {beta2}."
assert eps > 0, f"For '{prim_name}', the 'eps' must be positive, but got {eps}."
assert isinstance(use_locking, bool), f"For '{prim_name}', the type of 'use_locking' must be 'bool', but got type '{type(use_locking).__name__}'."
assert isinstance(beta1, float) and 0 <= beta1 <= 1.0, f"For {prim_name}, beta1 should between 0 and 1"
Collaborator


Please update the docstring of the function _update_run_op() at lines 44 and 45: (0.0, 1.0) -> [0.0, 1.0].
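The distinction behind this docstring fix can be sketched as follows (illustrative names, not the exact Adan implementation): the assertions above validate beta1 on the closed interval [0.0, 1.0], while beta2 must lie strictly inside (0.0, 1.0).

```python
def check_betas(beta1: float, beta2: float, prim_name: str = "Adan") -> None:
    """Validate beta1 on [0.0, 1.0] (closed) and beta2 on (0.0, 1.0) (open).

    Sketch of the range checks in the diff above; not the exact Adan code.
    """
    assert isinstance(beta1, float) and 0.0 <= beta1 <= 1.0, \
        f"For '{prim_name}', 'beta1' must be in [0.0, 1.0], but got {beta1}"
    assert 0.0 < beta2 < 1.0, \
        f"For '{prim_name}', the range of 'beta2' must be (0.0, 1.0), but got {beta2}"

check_betas(0.98, 0.92)  # typical values: valid
check_betas(1.0, 0.92)   # beta1 = 1.0 is allowed (closed interval)
```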

@SamitHuang SamitHuang merged commit d216392 into mindspore-lab:main May 10, 2023
@zhtmike zhtmike mentioned this pull request May 11, 2023
@SamitHuang SamitHuang deleted the impr_log branch May 24, 2023 15:25