Fine-tuning on chairsSDHom: EPE doesn't go down.

The issue is that during training the validation loss rises very quickly. See the logs below.

I added a data loading script for chairsSDHom, with the following changes:
1) Data is loaded in iterate_data instead of reading all images into a list in main.py.
2) Added chairsSDHom.py and chairsSDHom.yaml.
All the code I updated is attached below.

1. main.py
```
...
...
elif dataset_cfg.dataset.value == "chairsSDHom":
        batch_size=3
        orig_shape= [384,512]
        # training
        chairsSDHom_dataset = chairsSDHom.list_data()
        print(chairsSDHom_dataset['flow'][0])
        from pympler.asizeof import asizeof
        trainImg1 = [file for file in chairsSDHom_dataset['image_0']]
        trainImg2 = [file for file in chairsSDHom_dataset['image_1']]
        trainFlow = [file for file in chairsSDHom_dataset['flow']]
        trainMask = [file for file in chairsSDHom_dataset['mask']]
        trainSize = len(trainFlow)
        training_datasets = [(trainImg1, trainImg2, trainFlow,trainMask)] * batch_size

        # validation - sintel
        sintel_dataset = sintel.list_data()
        divs = ('training',) if not getattr(config.network, 'class').get() == 'MaskFlownet' else ('training2',)
        for div in divs:
                for k, dataset in sintel_dataset[div].items():
                        dataset = dataset[:samples]
                        img1, img2, flow, mask = [[sintel.load(p) for p in data] for data in zip(*dataset)]
                        validationSize = len(flow)
                        validation_datasets['sintel.' + k] = (img1, img2, flow, mask)
...
...
def iterate_data(iq, dataset):
    if dataset_cfg.dataset.value == 'chairsSDHom' or dataset_cfg.dataset.value == "things3d":
        gen = index_generator(len(dataset[0]))
        while True:
            i = next(gen)
            data = [item[i] for item in dataset]
            if dataset_cfg.dataset.value == "chairsSDHom":
                data = [skimage.io.imread(data[0]),skimage.io.imread(data[1]),chairsSDHom.load(data[2]),skimage.io.imread(data[3])]
            elif dataset_cfg.dataset.value == "things3d":
                data = [cv2.imread(data[0]).astype('uint8'),skimage.io.imread(data[1]).astype('uint8'),things3d.load(data[2]).astype('float16')]
            space_x, space_y = data[0].shape[0] - orig_shape[0], data[0].shape[1] - orig_shape[1]
            crop_x, crop_y = space_x and np.random.randint(space_x), space_y and np.random.randint(space_y)
            data = [np.transpose(arr[crop_x: crop_x + orig_shape[0], crop_y: crop_y + orig_shape[1]], (2, 0, 1)) for arr in data]
            # horizontal flip: reverse the width axis and negate the horizontal flow component
            if np.random.randint(2):
                data = [arr[:, :, ::-1] for arr in data]
                data[2] = np.stack([-data[2][0, :, :], data[2][1, :, :]], axis = 0)
            iq.put(data)
    else:
        gen = index_generator(len(dataset[0]))
        while True:
            i = next(gen)
            data = [item[i] for item in dataset]
            space_x, space_y = data[0].shape[0] - orig_shape[0], data[0].shape[1] - orig_shape[1]
            crop_x, crop_y = space_x and np.random.randint(space_x), space_y and np.random.randint(space_y)
            data = [np.transpose(arr[crop_x: crop_x + orig_shape[0], crop_y: crop_y + orig_shape[1]], (2, 0, 1)) for arr in data]
            # horizontal flip: reverse the width axis and negate the horizontal flow component
            if np.random.randint(2):
                data = [arr[:, :, ::-1] for arr in data]
                data[2] = np.stack([-data[2][0, :, :], data[2][1, :, :]], axis = 0)
            iq.put(data)
...
```
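One thing that could be checked in the chairsSDHom branch above: `np.transpose(arr, (2, 0, 1))` only works on 3-D (H, W, C) arrays, and an image reader may return a 2-D (H, W) array for a single-channel file. Below is a minimal sketch of a shape check, assuming the mask might load as 2-D (not verified for this dataset). `ensure_hwc` and `check_sample` are hypothetical helpers, not part of MaskFlownet.

```python
import numpy as np

def ensure_hwc(arr):
    """Return arr as (H, W, C); a 2-D (H, W) array gets a trailing channel axis."""
    arr = np.asarray(arr)
    if arr.ndim == 2:
        return arr[:, :, np.newaxis]
    if arr.ndim == 3:
        return arr
    raise ValueError(f"expected a 2-D or 3-D array, got shape {arr.shape}")

def check_sample(img1, img2, flow, mask):
    """Hypothetical sanity check: same spatial size everywhere, 2-channel flow."""
    img1, img2, flow, mask = (ensure_hwc(a) for a in (img1, img2, flow, mask))
    h, w = img1.shape[:2]
    for name, arr in (("img2", img2), ("flow", flow), ("mask", mask)):
        if arr.shape[:2] != (h, w):
            raise ValueError(f"{name} has size {arr.shape[:2]}, expected {(h, w)}")
    if flow.shape[2] != 2:
        raise ValueError(f"flow has {flow.shape[2]} channels, expected 2")
    return img1, img2, flow, mask

# Synthetic stand-ins for one loaded sample (assumed shapes, not real data).
img = np.zeros((384, 512, 3), dtype=np.uint8)
flow = np.zeros((384, 512, 2), dtype=np.float32)
mask = np.zeros((384, 512), dtype=np.uint8)  # 2-D, as a grayscale file might load

sample = check_sample(img, img, flow, mask)
chw = [np.transpose(a, (2, 0, 1)) for a in sample]  # same transpose as iterate_data
print([a.shape for a in chw])
```

Running a check like this on the first few loaded samples, before the crop and transpose, would show whether every array has the layout that `iterate_data` expects.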

The rest of the code is unchanged

yet training 

[updated code.zip](https://github.com/microsoft/MaskFlownet/files/5730221/updated.code.zip)

Logs:

```

[2020/12/22 21:36:48] start=0, train=21670, val=224, host=ludwig, batch=3
[2020/12/22 21:36:48] batch=8, config='MaskFlownet_ft.yaml', dataset_cfg='chairsSDHom.yaml', shard=1, gpu_device='1', checkpoint='5adNov03', clear_steps=True, network='MaskFlownet', debug=False, valid=False, predict=False, resize=''
[2020/12/22 21:36:54] steps=1, epe=81.23613661839343, total_time=0.00
[2020/12/22 21:37:20] steps=1, sintel.clean=1.4036083221435547, sintel.final=**1.7385120391845703**
[2020/12/22 21:37:20] steps=2, epe=82.52426050579368, total_time=31.65
[2020/12/22 21:37:21] steps=3, epe=70.33922181313649, total_time=15.62
[2020/12/22 21:37:21] steps=4, epe=64.53729546698513, total_time=10.30
[2020/12/22 21:37:21] steps=5, epe=73.13790790314701, total_time=7.64
[2020/12/22 21:37:22] steps=6, epe=69.97008332644914, total_time=6.04
[2020/12/22 21:37:22] steps=7, epe=63.190831684866595, total_time=4.98
[2020/12/22 21:37:23] steps=8, epe=69.54386270096657, total_time=4.23
[2020/12/22 21:37:23] steps=9, epe=71.65906570549198, total_time=3.66
[2020/12/22 21:37:24] steps=10, epe=70.68287622669239, total_time=3.22
[2020/12/22 21:37:24] steps=11, epe=68.10887379487774, total_time=2.88
[2020/12/22 21:37:24] steps=12, epe=65.31357897717663, total_time=2.59
[2020/12/22 21:37:25] steps=13, epe=67.39865911195284, total_time=2.36
[2020/12/22 21:37:25] steps=14, epe=66.05316386284305, total_time=2.16
[2020/12/22 21:37:26] steps=15, epe=62.74090359794587, total_time=1.99
[2020/12/22 21:37:26] steps=16, epe=65.24516708995266, total_time=1.85
[2020/12/22 21:37:27] steps=17, epe=61.783343363284466, total_time=1.72
[2020/12/22 21:37:27] steps=18, epe=66.12157773880946, total_time=1.61
[2020/12/22 21:37:27] steps=19, epe=65.41601491031372, total_time=1.51
[2020/12/22 21:37:28] steps=20, epe=67.27401184191667, total_time=1.42
[2020/12/22 21:37:41] steps=50, epe=64.05605013410363, total_time=0.57
[2020/12/22 21:38:03] steps=100, epe=60.72789733634401, total_time=0.45
[2020/12/22 21:38:30] steps=100, sintel.clean=3.107024669647217, sintel.final=**3.6572041511535645**
[2020/12/22 21:38:51] steps=150, epe=58.168171286698964, total_time=0.55
[2020/12/22 21:39:14] steps=200, epe=55.366796654848244, total_time=0.45
[2020/12/22 21:39:41] steps=200, sintel.clean=4.636238098144531, sintel.final=**5.08129358291626**
[2020/12/22 21:40:03] steps=250, epe=52.92103477169547, total_time=0.56
[2020/12/22 21:40:25] steps=300, epe=50.651504112365515, total_time=0.45
[2020/12/22 21:40:52] steps=300, sintel.clean=5.46751070022583, sintel.final=**5.855245113372803**
[2020/12/22 21:41:13] steps=350, epe=48.90560261388807, total_time=0.55
[2020/12/22 21:41:36] steps=400, epe=47.090479957163055, total_time=0.45
[2020/12/22 21:42:02] steps=400, sintel.clean=6.850785255432129, sintel.final=**7.147568702697754**
[2020/12/22 21:42:24] steps=450, epe=45.47630244939083, total_time=0.55
[2020/12/22 21:42:47] steps=500, epe=43.721847967473224, total_time=0.45
[2020/12/22 21:43:14] steps=500, sintel.clean=7.392406940460205, sintel.final=**7.563663005828857**
[2020/12/22 21:43:36] steps=550, epe=41.861068025751216, total_time=0.56
[2020/12/22 21:43:59] steps=600, epe=40.728338542736246, total_time=0.45
[2020/12/22 21:44:25] steps=600, sintel.clean=8.37342643737793, sintel.final=**8.398472785949707**
[2020/12/22 21:44:47] steps=650, epe=39.22414651439415, total_time=0.55
[2020/12/22 21:45:09] steps=700, epe=38.01273616706755, total_time=0.45
[2020/12/22 21:45:36] steps=700, sintel.clean=8.904271125793457, sintel.final=**8.86906623840332**
[2020/12/22 21:45:57] steps=750, epe=36.68394209224638, total_time=0.55
[2020/12/22 21:46:20] steps=800, epe=35.51223404091925, total_time=0.45
[2020/12/22 21:46:46] steps=800, sintel.clean=9.723841667175293, sintel.final=**9.715934753417969**
[2020/12/22 21:47:08] steps=850, epe=34.441762749200876, total_time=0.55
[2020/12/22 21:47:30] steps=900, epe=33.21928807435762, total_time=0.45
[2020/12/22 21:47:56] steps=900, sintel.clean=10.129880905151367, sintel.final=**10.09166431427002**

```
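The trend in the logs above (training EPE falling from about 81 to about 33 over 900 steps, while Sintel validation EPE rises from about 1.4/1.7 to about 10.1) can be tabulated with a small parser. This is a minimal sketch that assumes the log line format shown above. `parse_log` is a hypothetical helper, not part of the MaskFlownet code.

```python
import re

# Training lines look like: "steps=900, epe=33.2..., total_time=0.45"
TRAIN_RE = re.compile(r"steps=(\d+), epe=([\d.]+)")
# Validation lines look like: "steps=900, sintel.clean=10.1..., sintel.final=**10.0...**"
# The optional asterisks cover the emphasis markers present in the pasted logs.
VAL_RE = re.compile(r"steps=(\d+), sintel\.clean=([\d.]+), sintel\.final=\**([\d.]+)\**")

def parse_log(lines):
    """Return ({step: train_epe}, {step: (sintel_clean, sintel_final)})."""
    train, val = {}, {}
    for line in lines:
        m = VAL_RE.search(line)
        if m:
            val[int(m.group(1))] = (float(m.group(2)), float(m.group(3)))
            continue
        m = TRAIN_RE.search(line)
        if m:
            train[int(m.group(1))] = float(m.group(2))
    return train, val

# Four lines copied from the logs above.
sample = [
    "[2020/12/22 21:36:54] steps=1, epe=81.23613661839343, total_time=0.00",
    "[2020/12/22 21:37:20] steps=1, sintel.clean=1.4036083221435547, sintel.final=**1.7385120391845703**",
    "[2020/12/22 21:47:30] steps=900, epe=33.21928807435762, total_time=0.45",
    "[2020/12/22 21:47:56] steps=900, sintel.clean=10.129880905151367, sintel.final=**10.09166431427002**",
]

train, val = parse_log(sample)
for step in sorted(val):
    print(step, train.get(step), val[step])
```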

Question 1) Any idea why the network output behaves like this, and how I might fix it?
Question 2) Is there anything you think is clearly wrong in the edits I have made?

Thank you so much. I highly appreciate your work. <3 :D
