Description
I am training Mask R-CNN with SyncBN. I copied the GN configs and only changed
norm_cfg = dict(type='GN', requires_grad=True) to norm_cfg = dict(type='SyncBN', requires_grad=True). Training works well.
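For reference, the change amounts to roughly the sketch below. The helper `with_norm_type` and the heavily trimmed model dict are only illustrative assumptions, not part of mmdetection and not my full config; in practice I edited the `norm_cfg` line in the config file by hand.

```python
import copy


def with_norm_type(cfg, norm_type):
    """Return a deep copy of cfg in which every 'norm_cfg' dict uses norm_type.

    Hypothetical helper for illustration only; it is not an mmdetection API.
    """
    cfg = copy.deepcopy(cfg)

    def _walk(node):
        if isinstance(node, dict):
            for key, value in node.items():
                if key == 'norm_cfg' and isinstance(value, dict):
                    value['type'] = norm_type
                else:
                    _walk(value)
        elif isinstance(node, (list, tuple)):
            for item in node:
                _walk(item)

    _walk(cfg)
    return cfg


# Hypothetical, trimmed-down model config in the style of the GN configs.
gn_model = dict(
    backbone=dict(type='ResNet', depth=50,
                  norm_cfg=dict(type='GN', requires_grad=True)),
    neck=dict(type='FPN',
              norm_cfg=dict(type='GN', requires_grad=True)),
)

syncbn_model = with_norm_type(gn_model, 'SyncBN')
print(syncbn_model['backbone']['norm_cfg'])
# {'type': 'SyncBN', 'requires_grad': True}
```

The original dict is left untouched, so the same helper could also produce a variant with a different norm type for comparison.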
But when I try to test the model, I get the following error:
  File "/home/xaserver/anaconda3/envs/rssrai_mmdetection/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/xaserver/anaconda3/envs/rssrai_mmdetection/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 150, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/home/xaserver/anaconda3/envs/rssrai_mmdetection/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/media/xaserver/DATA/swl/Projects/RSSRAI2019/mmdetection/mmdet/models/detectors/base.py", line 87, in forward
    return self.forward_test(img, img_meta, **kwargs)
  File "/media/xaserver/DATA/swl/Projects/RSSRAI2019/mmdetection/mmdet/models/detectors/base.py", line 79, in forward_test
    return self.simple_test(imgs[0], img_metas[0], **kwargs)
  File "/media/xaserver/DATA/swl/Projects/RSSRAI2019/mmdetection/mmdet/models/detectors/cascade_rcnn.py", line 241, in simple_test
    x = self.extract_feat(img)
  File "/media/xaserver/DATA/swl/Projects/RSSRAI2019/mmdetection/mmdet/models/detectors/cascade_rcnn.py", line 115, in extract_feat
    x = self.backbone(img)
  File "/home/xaserver/anaconda3/envs/rssrai_mmdetection/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/media/xaserver/DATA/swl/Projects/RSSRAI2019/mmdetection/mmdet/models/backbones/resnet.py", line 509, in forward
    x = self.norm1(x)
  File "/home/xaserver/anaconda3/envs/rssrai_mmdetection/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/xaserver/anaconda3/envs/rssrai_mmdetection/lib/python3.6/site-packages/torch/nn/modules/batchnorm.py", line 455, in forward
    world_size = torch.distributed.get_world_size(process_group)
  File "/home/xaserver/anaconda3/envs/rssrai_mmdetection/lib/python3.6/site-packages/torch/distributed/distributed_c10d.py", line 584, in get_world_size
    return _get_group_size(group)
  File "/home/xaserver/anaconda3/envs/rssrai_mmdetection/lib/python3.6/site-packages/torch/distributed/distributed_c10d.py", line 200, in _get_group_size
    _check_default_pg()
  File "/home/xaserver/anaconda3/envs/rssrai_mmdetection/lib/python3.6/site-packages/torch/distributed/distributed_c10d.py", line 191, in _check_default_pg
    "Default process group is not initialized"
AssertionError: Default process group is not initialized
The test code works well when I do not use SyncBN. Can you give me some suggestions about this problem?
Thanks a lot