
Modify how device_id is set to avoid ambiguity between the environment variable and the yaml config #425


Merged: 1 commit merged on Jun 19, 2023


HaoyangLee (Collaborator) commented Jun 19, 2023


Motivation

There are two ways to specify the device ID for standalone training:

  • (1) export DEVICE_ID=7
  • (2) set 'device_id=7' in the system section of the yaml config file

Option (1) takes priority: the yaml setting (2) is only used when distribute=False (standalone training) and the environment variable 'DEVICE_ID' is NOT set.
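For illustration, here is a minimal Python sketch of the priority rule above. The function `resolve_device_id` and its parameters are hypothetical and are not MindOCR's actual implementation. The sketch only mirrors the described behavior: DEVICE_ID wins, the yaml value applies only to standalone training, and otherwise no ID is chosen, so MindSpore falls back to its default device (0, as noted later in this thread).

```python
import os
from typing import Mapping, Optional


def resolve_device_id(
    distribute: bool,
    yaml_device_id: Optional[int] = None,
    env: Optional[Mapping[str, str]] = None,
) -> Optional[int]:
    """Hypothetical helper: pick the device ID following the PR's priority rule."""
    if env is None:
        env = os.environ
    # (1) The DEVICE_ID environment variable always takes priority.
    env_value = env.get("DEVICE_ID")
    if env_value is not None:
        return int(env_value)
    # (2) The yaml 'device_id' (system section) is used only for standalone training.
    if not distribute and yaml_device_id is not None:
        return yaml_device_id
    # Neither applies: return None and let MindSpore use its default device (0).
    return None


# Usage with an explicit env mapping instead of os.environ:
print(resolve_device_id(distribute=False, yaml_device_id=7, env={}))                  # 7
print(resolve_device_id(distribute=False, yaml_device_id=7, env={"DEVICE_ID": "3"}))  # 3
print(resolve_device_id(distribute=True, yaml_device_id=7, env={}))                   # None
```

Passing `env` explicitly keeps the sketch easy to test. In a real training script, it would simply read `os.environ`.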

Collaborator:

Is the default value 0?

Collaborator:

It is better to keep it 7 to leave card 0 for distributed training.

HaoyangLee (Collaborator, PR author):

The suggested default value of device_id in the yaml is 7. If device_id is not specified in the yaml, MindSpore uses device 0 by default.

Collaborator:

I see, but in the table, the column represents the default value.

Collaborator:

A default device_id of 7 in the yaml is a bit strange. What if the user does not have 8 devices?

HaoyangLee (Collaborator, PR author):

> I see, but in the table, the column represents the default number.

If 'device_id' is present in the yaml, its default value is 7; see configs/cls/mobilenetv3/cls_mv3.yaml.
I have added more details to yaml_configuration.md to make this clearer.

HaoyangLee (Collaborator, PR author):

> The default value of device_id in yaml is 7 is bit strange. what if the user does not have 8 devices?

As Rustam said, because assigning different cards for distributed training on Ascend is inconvenient, the default device_id (for standalone training) in the yaml should not be 0. At the same time, we don't know how many devices users have, so the compromise is to specify 'device_id=7' in the yaml. If the user does not have 8 devices, an error will be raised.

Copy link
Collaborator

@zhtmike zhtmike Jun 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. But I still think this setting (defaulting to 7) is convenient only for us and a bit strange for users.

HaoyangLee merged commit a23dd36 into mindspore-lab:main on Jun 19, 2023.
Labels: None yet
Projects: None yet

4 participants