
Commit 9963f5b

Update 2019-07-23-mapillary-research.md
1 parent c72451d commit 9963f5b

1 file changed: +4 -1 lines changed


_posts/2019-07-23-mapillary-research.md

Lines changed: 4 additions & 1 deletion
@@ -36,7 +36,7 @@ Our solution to these issues is to wrap each batch of variable-sized tensors in

`PackedSequence`s also help us deal with the second problem highlighted above. We slightly modify `DistributedDataParallel` to recognize `PackedSequence` inputs, splitting them into equally sized chunks and distributing their contents across the GPUs.
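
For illustration, a minimal sketch of the splitting idea (the function name and shapes below are hypothetical, and this is not the actual `DistributedDataParallel` modification): a batch kept as a list of variable-sized tensors is cut into equally sized chunks, and each chunk is moved to its target device.

```python
# Minimal sketch: scatter a batch of variable-sized tensors across devices.
import torch

def scatter_variable_sized(tensors, devices):
    """Split a list of variable-sized tensors into len(devices) chunks."""
    chunk_size = (len(tensors) + len(devices) - 1) // len(devices)  # ceil division
    chunks = []
    for i, device in enumerate(devices):
        chunk = tensors[i * chunk_size:(i + 1) * chunk_size]
        chunks.append([t.to(device) for t in chunk])
    return chunks

# Example: four images of different sizes distributed over two devices.
batch = [torch.randn(3, h, w) for h, w in [(480, 640), (512, 512), (360, 480), (600, 800)]]
devices = ["cpu", "cpu"]  # would be ["cuda:0", "cuda:1"] on a multi-GPU machine
per_gpu = scatter_variable_sized(batch, devices)
print([len(c) for c in per_gpu])  # -> [2, 2]
```
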
- ## Asymmetric computational graphs with DistributedDataParallel
+ ## Asymmetric computational graphs with Distributed Data Parallel

Another, perhaps more subtle, peculiarity of our network is that it can generate asymmetric computational graphs across GPUs. In fact, some of the modules that compose the network are “optional”, in the sense that they are not always computed for all images. As an example, when the Proposal head doesn’t output any proposal, the Mask head is not traversed at all. If we are training on multiple GPUs with `DistributedDataParallel`, this results in one of the replicas not computing gradients for the Mask head parameters.

@@ -55,6 +55,9 @@ Here, we generate a batch of bogus data, pass it through the Mask head, and retu
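
A hedged sketch of that "bogus data" workaround (all module and tensor names here are hypothetical stand-ins, not the code referenced by the commit): when a replica receives no proposals, a dummy batch is still pushed through the Mask head and its output is folded into the loss with a zero weight, so every replica produces gradients for the head's parameters.

```python
import torch
import torch.nn as nn

class ToyMaskHead(nn.Module):
    """Stand-in for a Mask head: a single 1x1 convolution."""
    def __init__(self, in_channels=256):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, 1, kernel_size=1)

    def forward(self, x):
        return self.conv(x)

def mask_loss(mask_head, roi_features):
    if roi_features is not None:            # normal path: real proposals exist
        return mask_head(roi_features).mean()
    # No proposals on this replica: bogus batch, zero-weighted contribution,
    # so autograd still visits (and zeroes out) the Mask head parameters.
    dummy = torch.zeros(1, 256, 14, 14)
    return mask_head(dummy).sum() * 0.0

head = ToyMaskHead()
loss = mask_loss(head, None)              # replica without proposals
loss.backward()                           # gradients exist (and are zero)
print(head.conv.weight.grad is not None)  # -> True
```
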
Starting from PyTorch 1.1, this workaround is no longer required: by setting `find_unused_parameters=True` in the constructor, `DistributedDataParallel` is told to identify parameters whose gradients have not been computed by all replicas and to handle them correctly. This leads to some substantial simplifications in our code base!
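
For reference, a minimal sketch of how that constructor flag is passed (the model and process-group setup are placeholders):

```python
from torch.nn.parallel import DistributedDataParallel

def wrap_model(model, rank):
    # Assumes torch.distributed.init_process_group(...) has already been
    # called and that `rank` indexes the GPU this process owns.
    model = model.to(rank)
    return DistributedDataParallel(
        model,
        device_ids=[rank],
        find_unused_parameters=True,  # let DDP handle parameters some replicas skip
    )
```
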

## In-place Activated BatchNorm
+
+ _Github project page: [https://github.com/mapillary/inplace_abn/](https://github.com/mapillary/inplace_abn/)_
+
Most researchers would probably agree that there are always constraints in terms of available GPU resources, regardless of whether their research lab has access to only a few or to many thousands of GPUs. At a time when we at Mapillary were still working with rather few, mostly 12GB Titan X-style prosumer GPUs, we were searching for a solution that virtually increases the memory usable during training, so that we would be able to obtain and push state-of-the-art results on dense labeling tasks like semantic segmentation. In-place Activated BatchNorm enables us to use up to 50% more memory (at little computational overhead) and is therefore deeply integrated into all our current projects (including Seamless Scene Segmentation described above).
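
As a usage illustration only (assuming the `InPlaceABN` module exported by the linked repository; the exact import path and defaults should be checked against the project page), the pattern is to replace a BatchNorm + activation pair with a single fused, in-place layer:

```python
import torch.nn as nn
from inplace_abn import InPlaceABN  # assumed import path; see the project page

# Conventional block: BatchNorm followed by an activation, two saved buffers.
conventional = nn.Sequential(
    nn.Conv2d(256, 256, kernel_size=3, padding=1, bias=False),
    nn.BatchNorm2d(256),
    nn.LeakyReLU(0.01, inplace=True),
)

# Memory-friendlier block: the BN + activation pair replaced by InPlaceABN,
# which fuses normalization and activation into one in-place operation.
inplace_block = nn.Sequential(
    nn.Conv2d(256, 256, kernel_size=3, padding=1, bias=False),
    InPlaceABN(256),  # assumption: defaults to a leaky ReLU activation
)
```
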

<div class="text-center">
