CodeCamp #147 [Doc] Add Chinese version of train & test tutorial #2355
We recommend using English or English & Chinese for pull requests so that we can have a broader discussion.
@@ -0,0 +1,223 @@
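# Tutorial 4: Train and test with existing models

MMSegmentation supports training and testing models on a variety of devices. As described below, the supported modes are single-GPU, distributed, and "group-style" (族群式) training and testing. In this tutorial you will learn how to train and test with the scripts MMSegmentation provides.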
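MMSegmentation supports training and testing models on a variety of devices. As described below, the supported modes are single-GPU, distributed, and "group-style" (族群式) training and testing. In this tutorial you will learn how to train and test with the scripts MMSegmentation provides.
MMSegmentation supports training and testing models on a variety of devices. As described below, the supported modes are single-GPU, distributed, and computing-cluster (计算集群) training and testing. In this tutorial you will learn how to train and test with the scripts MMSegmentation provides.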

- `--work-dir ${WORK_DIR}`: Override the working directory.
- `--amp`: Use automatic mixed precision.
- `--resume`: Load the latest saved model weights file (checkpoint) from the working directory.
- `--resume`: Load the latest saved model weights file (checkpoint) from the working directory.
- `--resume`: Resume training from the latest checkpoint file saved in the working directory.
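- `--work-dir ${WORK_DIR}`: Override the working directory.
- `--amp`: Use automatic mixed precision.
- `--resume`: Load the latest saved model weights file (checkpoint) from the working directory.
- `--cfg-options ${CONFIGS_TO_UPDATE}`: Override some settings in the loaded config; key-value pairs in `xxx=yyy` format are merged into the config file.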
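- `--cfg-options ${CONFIGS_TO_UPDATE}`: Override some settings in the loaded config; key-value pairs in `xxx=yyy` format are merged into the config file.
- `--cfg-options ${OVERRIDE_CONFIGS}`: Override some settings in the loaded config; key-value pairs in `xxx=yyy` format are merged into the config file.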
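
To make the `xxx=yyy` merging concrete, here is a minimal plain-Python sketch of how dotted key-value overrides could be folded into a nested config. It is an illustration only, not MMEngine's actual implementation; `parse_value` and `apply_overrides` are hypothetical helper names, and the starting port 29500 is an assumed value.

```python
import ast


def parse_value(text):
    """Interpret a string as a Python literal when possible, else keep it as a string."""
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError):
        return text


def apply_overrides(cfg, overrides):
    """Merge dotted `key=value` pairs into a nested dict, in place (hypothetical helper)."""
    for item in overrides:
        key, _, raw = item.partition('=')
        *parents, leaf = key.split('.')
        node = cfg
        for name in parents:
            # Create intermediate dicts for keys that do not exist yet.
            node = node.setdefault(name, {})
        node[leaf] = parse_value(raw)
    return cfg


# Assumed starting config; only the overridden key changes.
cfg = {'env_cfg': {'dist_cfg': {'backend': 'nccl', 'port': 29500}}}
apply_overrides(cfg, ['env_cfg.dist_cfg.port=29501'])
print(cfg['env_cfg']['dist_cfg'])  # prints {'backend': 'nccl', 'port': 29501}
```

The override string here is the same one that would follow `--cfg-options` on the command line, for example `--cfg-options env_cfg.dist_cfg.port=29501`.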
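
The following optional arguments apply to multi-GPU testing:

- `--launcher`: The launcher used to initialize distributed jobs. Allowed values are `none`, `pytorch`, `slurm`, and `mpi`. In particular, if set to `none`, testing runs in non-distributed mode.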
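- `--launcher`: The launcher used to initialize distributed jobs. Allowed values are `none`, `pytorch`, `slurm`, and `mpi`. In particular, if set to `none`, testing runs in non-distributed mode.
- `--launcher`: How the runner is launched. Allowed values are `none`, `pytorch`, `slurm`, and `mpi`. In particular, if set to `none`, testing runs in non-distributed mode.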
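- `--launcher`: The launcher used to initialize distributed jobs. Allowed values are `none`, `pytorch`, `slurm`, and `mpi`. In particular, if set to `none`, testing runs in non-distributed mode.
- `--local_rank`: The local rank ID of the process in distributed mode. If not specified, it defaults to 0.

**Note:** In the config file, the difference between `--resume` and the field `load_from`: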
**Note:** In the config file, the difference between `--resume` and the field `load_from`:
**Note:** The difference between the command-line argument `--resume` and the `load_from` parameter in the config file:
Basic usage is as follows:

```shell
[GPUS=${GPUS}] sh tools/slurm_test.sh ${DIVISION} ${JOB_NAME} ${CONFIG_FILE} ${CHECKPOINT_FILE} [optional arguments]
[GPUS=${GPUS}] sh tools/slurm_test.sh ${DIVISION} ${JOB_NAME} ${CONFIG_FILE} ${CHECKPOINT_FILE} [optional arguments]
[GPUS=${GPUS}] sh tools/slurm_test.sh ${PARTITION} ${JOB_NAME} ${CONFIG_FILE} ${CHECKPOINT_FILE} [optional arguments]
[GPUS=${GPUS}] sh tools/slurm_test.sh ${DIVISION} ${JOB_NAME} ${CONFIG_FILE} ${CHECKPOINT_FILE} [optional arguments]
```

You can check [the source code](../../../tools/slurm_test.sh) to see all the arguments and environment variables.
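You can check [the source code](../../../tools/slurm_test.sh) to see all the arguments and environment variables.
You can refer to the [source code (源码)](../../../tools/slurm_test.sh) for the full list of arguments and environment variables.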
```shell
GPUS=4 GPUS_PER_NODE=4 sh tools/slurm_train.sh ${PARTITION} ${JOB_NAME} config2.py ${WORK_DIR} --cfg-options env_cfg.dist_cfg.port=29501
```

2. Modify the config configuration files (config配置文件) to set different communication ports:
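2. Modify the config configuration files (config配置文件) to set different communication ports:
2. Set different communication ports by modifying the config files: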
```python
env_cfg = dict(dist_cfg=dict(backend='nccl', port=29501))
```
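
As a rough sketch of how the two config files might differ, the fragments below give each job its own port. `config1.py` and `config2.py` are the file names used in this tutorial and 29501 is the port from the snippet above; port 29500 for the first job and the helper `ports_are_distinct` are assumptions made for illustration.

```python
# Fragment that would live in config1.py (port 29500 is an assumed value).
config1_env_cfg = dict(dist_cfg=dict(backend='nccl', port=29500))

# Fragment that would live in config2.py (port 29501, as in the tutorial snippet).
config2_env_cfg = dict(dist_cfg=dict(backend='nccl', port=29501))


def ports_are_distinct(*env_cfgs):
    """Return True if every given env_cfg uses a different communication port."""
    ports = [cfg['dist_cfg']['port'] for cfg in env_cfgs]
    return len(ports) == len(set(ports))


# Two jobs sharing one machine need different ports to avoid a conflict.
assert ports_are_distinct(config1_env_cfg, config2_env_cfg)
```

In real config files each fragment would simply be named `env_cfg`; the two variable names here only exist so both can sit in one runnable snippet.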

Then you can run two jobs at the same time with config1.py and config2.py:
Then you can run two jobs at the same time with config1.py and config2.py:
Then you can launch two jobs at the same time with config1.py and config2.py:
```shell
CUDA_VISIBLE_DEVICES=4,5,6,7 GPUS=4 sh tools/slurm_train.sh ${PARTITION} ${JOB_NAME} config2.py ${WORK_DIR}
```

3. Use an environment variable to set the port 'MASTER_PORT' in the command:
3. Use an environment variable to set the port 'MASTER_PORT' in the command:
3. Set the port on the command line through the environment variable `MASTER_PORT`:
Co-authored-by: 谢昕辰 <xiexinch@outlook.com>
…orial open-mmlab#2355
* doc
* modify part of content
* changed parts of content
* modified
* Update docs/zh_cn/user_guides/4_train_test.md

Co-authored-by: 谢昕辰 <xiexinch@outlook.com>
…-mmlab#2355) correctly locate 3rd file; also correct misleading docs
Thanks for your contribution and we appreciate it a lot. The following instructions would make your pull request more healthy and more easily get feedback. If you do not understand some items, don't worry, just make the pull request and seek help from maintainers.
Motivation
Please describe the motivation of this PR and the goal you want to achieve through this PR.
Modification
Please briefly describe what modification is made in this PR.
BC-breaking (Optional)
Does the modification introduce changes that break the backward-compatibility of the downstream repos?
If so, please describe how it breaks the compatibility and how the downstream projects should modify their code to keep compatibility with this PR.
Use cases (Optional)
If this PR introduces a new feature, it is better to list some use cases here, and update the documentation.
Checklist