more tuning on bisenetv2
CoinCheung committed Jul 10, 2020
1 parent 81a8885 commit a8ddd22
Showing 4 changed files with 23 additions and 5 deletions.
20 changes: 18 additions & 2 deletions README.md
@@ -5,9 +5,25 @@ BiSeNetV2 is faster and requires less memory; you can try BiSeNetV2 on the cityscapes dataset:
```
$ export CUDA_VISIBLE_DEVICES=0,1
$ python -m torch.distributed.launch --nproc_per_node=2 bisenetv2/train.py --fp16
```
This would train the model and then compute the mIOU on the eval set.

~~I barely achieve an mIOU of around 71. Though I could boost the performance by adding more regularization and pretraining, that would be beyond the scope of the paper, so let's wait for the official implementation and see how they achieved their mIOU of 73.~~

Here are the tips for how I achieved 74.39 mIOU:
1. larger training scale range: in the paper, the images are first resized to the range (0.75, 2), then 1024x2048 patches are cropped and resized to 512x1024, which is equivalent to first resizing to (0.375, 1) and then cropping 512x1024 patches. In my implementation, I first rescale the image by a range of (0.25, 2) and then directly crop 512x1024 patches for training (see the sketch after this list).

2. original inference scale: in the paper, the image is first rescaled to 512x1024 for inference, and the prediction is then rescaled back to the original size of 1024x2048. In my implementation, I run inference directly at the original size of 1024x2048.

3. color jitter as an additional augmentation.
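The `bisenetv2/cityscapes_cv2.py` hunk below shows the actual transform change. As a standalone illustration of points 1 and 3, here is a minimal sketch of the rescale-then-crop and color-jitter idea, assuming plain OpenCV/NumPy image and label arrays; the helper names are mine, not the repository's API:

```python
import random
import cv2
import numpy as np

def random_scale_crop(im, lb, scales=(0.25, 2.0), cropsize=(512, 1024)):
    """Rescale image/label by a random factor in `scales`, then crop a
    cropsize[0] x cropsize[1] patch (padding first if the rescaled image is too small)."""
    scale = random.uniform(*scales)
    h, w = im.shape[:2]
    new_h, new_w = int(h * scale), int(w * scale)
    im = cv2.resize(im, (new_w, new_h), interpolation=cv2.INTER_LINEAR)
    lb = cv2.resize(lb, (new_w, new_h), interpolation=cv2.INTER_NEAREST)
    ch, cw = cropsize
    pad_h, pad_w = max(ch - new_h, 0), max(cw - new_w, 0)
    if pad_h > 0 or pad_w > 0:
        im = cv2.copyMakeBorder(im, 0, pad_h, 0, pad_w, cv2.BORDER_CONSTANT, value=0)
        lb = cv2.copyMakeBorder(lb, 0, pad_h, 0, pad_w, cv2.BORDER_CONSTANT, value=255)
    h, w = im.shape[:2]
    y, x = random.randint(0, h - ch), random.randint(0, w - cw)
    return im[y:y + ch, x:x + cw], lb[y:y + ch, x:x + cw]

def color_jitter(im, brightness=0.4, contrast=0.4, saturation=0.4):
    """Very rough color jitter: random brightness, contrast and saturation shifts."""
    im = im.astype(np.float32)
    im *= random.uniform(1 - brightness, 1 + brightness)                      # brightness
    mean = im.mean()
    im = (im - mean) * random.uniform(1 - contrast, 1 + contrast) + mean      # contrast
    gray = im.mean(axis=2, keepdims=True)
    im = (im - gray) * random.uniform(1 - saturation, 1 + saturation) + gray  # saturation
    return np.clip(im, 0, 255).astype(np.uint8)
```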

Note that, like bisenetv1, bisenetv2 also has a relatively large variance between runs. Here are the mIOU results after training 5 times on my platform (a quick mean/stdev summary follows the table):

| #No. | 1 | 2 | 3 | 4 | 5 |
|:---|:---|:---|:---|:---|:---|
| mIOU | 74.28 | 72.96 | 73.73 | 74.39 | 73.77 |
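As a quick summary of that spread, feeding the five table values into Python's standard library gives roughly 73.83 ± 0.57 mIOU:

```python
import statistics

runs = [74.28, 72.96, 73.73, 74.39, 73.77]
print(f"mean  = {statistics.mean(runs):.2f}")   # ~73.83
print(f"stdev = {statistics.stdev(runs):.2f}")  # sample std dev, ~0.57
```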

You can download the pretrained model with mIOU of 74.39 following this [link](https://drive.google.com/file/d/1r_F-KZg-3s2pPcHRIuHZhZ0DQ0wocudk/view?usp=sharing).
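Below is a rough sketch of running such a checkpoint at the original 1024x2048 resolution (point 2 above). The `BiSeNetV2` import path, constructor arguments, checkpoint filename, input image name, and ImageNet normalization constants are assumptions for illustration and may differ from this repository:

```python
import cv2
import numpy as np
import torch

# assumed import path and constructor; adjust to the actual model definition in this repo
from bisenetv2.bisenetv2 import BiSeNetV2

net = BiSeNetV2(n_classes=19)
net.load_state_dict(torch.load('model_final.pth', map_location='cpu'))
net.cuda().eval()

# read a full-resolution cityscapes frame (1024x2048), BGR -> RGB, normalize
im = cv2.imread('frankfurt_000000_000294_leftImg8bit.png')[:, :, ::-1].astype(np.float32) / 255.
mean = np.array([0.485, 0.456, 0.406], dtype=np.float32)  # assumed ImageNet stats
std = np.array([0.229, 0.224, 0.225], dtype=np.float32)
im = (im - mean) / std
inp = torch.from_numpy(np.ascontiguousarray(im.transpose(2, 0, 1)))[None].cuda()

with torch.no_grad():
    out = net(inp)
    logits = out[0] if isinstance(out, (tuple, list)) else out  # some variants also return aux heads
    pred = logits.argmax(dim=1).squeeze(0).cpu().numpy()        # 1024x2048 class-id map
```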



# BiSeNet
5 changes: 3 additions & 2 deletions bisenetv2/cityscapes_cv2.py
@@ -127,7 +127,8 @@ class TransformationTrain(object):

def __init__(self):
self.trans_func = T.Compose([
T.RandomResizedCrop([0.375, 1.], [512, 1024]),
# T.RandomResizedCrop([0.375, 1.], [512, 1024]),
T.RandomResizedCrop([0.25, 2], [512, 1024]),
T.RandomHorizontalFlip(),
T.ColorJitter(
brightness=0.4,
@@ -145,7 +146,7 @@ class TransformationVal(object):

def __call__(self, im_lb):
im, lb = im_lb['im'], im_lb['lb']
im = cv2.resize(im, (1024, 512))
# im = cv2.resize(im, (1024, 512))
return dict(im=im, lb=lb)


2 changes: 1 addition & 1 deletion bisenetv2/evaluatev2.py
@@ -91,7 +91,7 @@ def evaluate(weight_pth):
)

## evaluator
eval_model(net, 4)
eval_model(net, 2)


def parse_args():
1 change: 1 addition & 0 deletions train.py
@@ -61,6 +61,7 @@ def train():
n_img_per_gpu = 8
n_workers = 4
cropsize = [1024, 1024]
# cropsize = [1024, 512]
ds = CityScapes('./data', cropsize=cropsize, mode='train')
sampler = torch.utils.data.distributed.DistributedSampler(ds)
dl = DataLoader(ds,
