I use SyncBN to train Mask R-CNN but get an error when testing the model. Can you give me some suggestions on using SyncBN? #847

@ztyxd

I use SyncBN to train Mask R-CNN. I followed the GN configs and only changed
norm_cfg = dict(type='GN', requires_grad=True) to norm_cfg = dict(type='SyncBN', requires_grad=True). This works well in the training phase.

But when I try to test the model, I get the following error:
      File "/home/xaserver/anaconda3/envs/rssrai_mmdetection/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
        result = self.forward(*input, **kwargs)
      File "/home/xaserver/anaconda3/envs/rssrai_mmdetection/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 150, in forward
        return self.module(*inputs[0], **kwargs[0])
      File "/home/xaserver/anaconda3/envs/rssrai_mmdetection/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
        result = self.forward(*input, **kwargs)
      File "/media/xaserver/DATA/swl/Projects/RSSRAI2019/mmdetection/mmdet/models/detectors/base.py", line 87, in forward
        return self.forward_test(img, img_meta, **kwargs)
      File "/media/xaserver/DATA/swl/Projects/RSSRAI2019/mmdetection/mmdet/models/detectors/base.py", line 79, in forward_test
        return self.simple_test(imgs[0], img_metas[0], **kwargs)
      File "/media/xaserver/DATA/swl/Projects/RSSRAI2019/mmdetection/mmdet/models/detectors/cascade_rcnn.py", line 241, in simple_test
        x = self.extract_feat(img)
      File "/media/xaserver/DATA/swl/Projects/RSSRAI2019/mmdetection/mmdet/models/detectors/cascade_rcnn.py", line 115, in extract_feat
        x = self.backbone(img)
      File "/home/xaserver/anaconda3/envs/rssrai_mmdetection/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
        result = self.forward(*input, **kwargs)
      File "/media/xaserver/DATA/swl/Projects/RSSRAI2019/mmdetection/mmdet/models/backbones/resnet.py", line 509, in forward
        x = self.norm1(x)
      File "/home/xaserver/anaconda3/envs/rssrai_mmdetection/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
        result = self.forward(*input, **kwargs)
      File "/home/xaserver/anaconda3/envs/rssrai_mmdetection/lib/python3.6/site-packages/torch/nn/modules/batchnorm.py", line 455, in forward
        world_size = torch.distributed.get_world_size(process_group)
      File "/home/xaserver/anaconda3/envs/rssrai_mmdetection/lib/python3.6/site-packages/torch/distributed/distributed_c10d.py", line 584, in get_world_size
        return _get_group_size(group)
      File "/home/xaserver/anaconda3/envs/rssrai_mmdetection/lib/python3.6/site-packages/torch/distributed/distributed_c10d.py", line 200, in _get_group_size
        _check_default_pg()
      File "/home/xaserver/anaconda3/envs/rssrai_mmdetection/lib/python3.6/site-packages/torch/distributed/distributed_c10d.py", line 191, in _check_default_pg
        "Default process group is not initialized"
    AssertionError: Default process group is not initialized

The test code works well when I do not use SyncBN. Can you give me some suggestions about this problem?
Thanks a lot.
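
The traceback shows the SyncBN forward pass asking torch.distributed for the world size while the model runs under DataParallel, where no default process group has been initialized. One possible workaround, sketched below as an untested idea, is to build the test-time model from a config in which SyncBN is swapped for plain BN. This assumes the two layer types use the same parameter names, so that the trained checkpoint still loads; that should be checked against the installed mmdetection version. The helper name replace_sync_bn and the example model config are hypothetical and not part of mmdetection.

```python
import copy


def replace_sync_bn(cfg):
    """Return a deep copy of cfg with every dict of type 'SyncBN' set to 'BN'.

    Hypothetical helper for illustration only; not an mmdetection API.
    """
    cfg = copy.deepcopy(cfg)

    def _walk(node):
        if isinstance(node, dict):
            if node.get('type') == 'SyncBN':
                node['type'] = 'BN'
            for value in node.values():
                _walk(value)
        elif isinstance(node, (list, tuple)):
            for value in node:
                _walk(value)

    _walk(cfg)
    return cfg


# Illustrative training config (assumed layout, not copied from the issue).
train_model_cfg = dict(
    backbone=dict(type='ResNet',
                  norm_cfg=dict(type='SyncBN', requires_grad=True)),
    neck=dict(type='FPN',
              norm_cfg=dict(type='SyncBN', requires_grad=True)),
)

# Config to use when building the model for non-distributed testing.
test_model_cfg = replace_sync_bn(train_model_cfg)
print(test_model_cfg['backbone']['norm_cfg'])
```

The training config is left untouched because the helper works on a deep copy. Running the test through a distributed launcher, so that a process group exists, would be another option to try.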
