This repository was archived by the owner on Nov 16, 2023. It is now read-only.

Finetuning on chairsSDHom epe doesn't go down. #31

@vivasvan1

The issue is that during training the validation error goes up very quickly. See the logs below.

I have added a chairsSDHom data loading script. Changes:

  1. Data is now loaded in iterate_data instead of reading all images into a list in main.py.
  2. Added chairsSDHom.py and chairsSDHom.yaml.

All the code I have updated is attached below.

1. main.py

...
...
elif dataset_cfg.dataset.value == "chairsSDHom":
        batch_size=3
        orig_shape= [384,512]
        # training
        chairsSDHom_dataset = chairsSDHom.list_data()
        print(chairsSDHom_dataset['flow'][0])
        from pympler.asizeof import asizeof
        trainImg1 = [file for file in chairsSDHom_dataset['image_0']]
        trainImg2 = [file for file in chairsSDHom_dataset['image_1']]
        trainFlow = [file for file in chairsSDHom_dataset['flow']]
        trainMask = [file for file in chairsSDHom_dataset['mask']]
        trainSize = len(trainFlow)
        training_datasets = [(trainImg1, trainImg2, trainFlow,trainMask)] * batch_size

        # validation - sintel
        sintel_dataset = sintel.list_data()
        divs = ('training',) if not getattr(config.network, 'class').get() == 'MaskFlownet' else ('training2',)
        for div in divs:
                for k, dataset in sintel_dataset[div].items():
                        dataset = dataset[:samples]
                        img1, img2, flow, mask = [[sintel.load(p) for p in data] for data in zip(*dataset)]
                        validationSize = len(flow)
                        validation_datasets['sintel.' + k] = (img1, img2, flow, mask)
...
...
def iterate_data(iq, dataset):
    if dataset_cfg.dataset.value == 'chairsSDHom' or dataset_cfg.dataset.value == "things3d":
        gen = index_generator(len(dataset[0]))
        while True:
            i = next(gen)
            data = [item[i] for item in dataset]
            if dataset_cfg.dataset.value == "chairsSDHom":
                data = [skimage.io.imread(data[0]),skimage.io.imread(data[1]),chairsSDHom.load(data[2]),skimage.io.imread(data[3])]
            elif dataset_cfg.dataset.value == "things3d":
                # read both frames with the same reader so their channel order (RGB) matches
                data = [skimage.io.imread(data[0]).astype('uint8'),skimage.io.imread(data[1]).astype('uint8'),things3d.load(data[2]).astype('float16')]
            space_x, space_y = data[0].shape[0] - orig_shape[0], data[0].shape[1] - orig_shape[1]
            crop_x, crop_y = space_x and np.random.randint(space_x), space_y and np.random.randint(space_y)
            data = [np.transpose(arr[crop_x: crop_x + orig_shape[0], crop_y: crop_y + orig_shape[1]], (2, 0, 1)) for arr in data]
            # horizontal flip: reverse the width axis and negate flow channel 0
            if np.random.randint(2):
                data = [arr[:, :, ::-1] for arr in data]
                data[2] = np.stack([-data[2][0, :, :], data[2][1, :, :]], axis = 0)
            iq.put(data)
    else:
        gen = index_generator(len(dataset[0]))
        while True:
            i = next(gen)
            data = [item[i] for item in dataset]
            space_x, space_y = data[0].shape[0] - orig_shape[0], data[0].shape[1] - orig_shape[1]
            crop_x, crop_y = space_x and np.random.randint(space_x), space_y and np.random.randint(space_y)
            data = [np.transpose(arr[crop_x: crop_x + orig_shape[0], crop_y: crop_y + orig_shape[1]], (2, 0, 1)) for arr in data]
            # horizontal flip: reverse the width axis and negate flow channel 0
            if np.random.randint(2):
                data = [arr[:, :, ::-1] for arr in data]
                data[2] = np.stack([-data[2][0, :, :], data[2][1, :, :]], axis = 0)
            iq.put(data)
...

Everything else is the same, yet training still shows the problem.

updated code.zip
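
The crop-and-flip step in iterate_data can be isolated into a small function, which makes it easier to test on synthetic arrays. The following is a minimal sketch, not the repository's code: `crop_and_flip` and its `rng` argument are hypothetical names, and the handling of 2-D arrays is an assumption, since the issue does not say whether the mask files decode to `(H, W)` or `(H, W, 1)`.

```python
import numpy as np


def crop_and_flip(arrays, crop_shape, rng):
    """Randomly crop HWC arrays, convert them to CHW, and randomly mirror them.

    Assumption: arrays[2] is the flow, and its channel 0 is the component
    along the width axis (the one negated in the original snippet).
    """
    height, width = arrays[0].shape[:2]
    space_rows = height - crop_shape[0]
    space_cols = width - crop_shape[1]
    top = int(rng.integers(space_rows)) if space_rows > 0 else 0
    left = int(rng.integers(space_cols)) if space_cols > 0 else 0

    out = []
    for arr in arrays:
        if arr.ndim == 2:
            # Assumption: a single-channel mask may be read as (H, W);
            # give it a channel axis so the transpose below is valid.
            arr = arr[:, :, np.newaxis]
        patch = arr[top:top + crop_shape[0], left:left + crop_shape[1]]
        out.append(np.transpose(patch, (2, 0, 1)))

    if rng.integers(2):
        # Mirror along the width axis, then negate flow channel 0.
        out = [arr[:, :, ::-1] for arr in out]
        out[2] = np.stack([-out[2][0], out[2][1]], axis=0)
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = np.zeros((400, 520, 3), dtype=np.uint8)
    flow = np.ones((400, 520, 2), dtype=np.float32)
    mask = np.ones((400, 520), dtype=np.uint8)
    for arr in crop_and_flip([img, img, flow, mask], (384, 512), rng):
        print(arr.shape, arr.dtype)
```

Passing the random generator in explicitly keeps the function deterministic under a fixed seed, which helps when comparing two data pipelines.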


Logs:

[2020/12/22 21:36:48] start=0, train=21670, val=224, host=ludwig, batch=3
[2020/12/22 21:36:48] batch=8, config='MaskFlownet_ft.yaml', dataset_cfg='chairsSDHom.yaml', shard=1, gpu_device='1', checkpoint='5adNov03', clear_steps=True, network='MaskFlownet', debug=False, valid=False, predict=False, resize=''
[2020/12/22 21:36:54] steps=1, epe=81.23613661839343, total_time=0.00
[2020/12/22 21:37:20] steps=1, sintel.clean=1.4036083221435547, sintel.final=**1.7385120391845703**
[2020/12/22 21:37:20] steps=2, epe=82.52426050579368, total_time=31.65
[2020/12/22 21:37:21] steps=3, epe=70.33922181313649, total_time=15.62
[2020/12/22 21:37:21] steps=4, epe=64.53729546698513, total_time=10.30
[2020/12/22 21:37:21] steps=5, epe=73.13790790314701, total_time=7.64
[2020/12/22 21:37:22] steps=6, epe=69.97008332644914, total_time=6.04
[2020/12/22 21:37:22] steps=7, epe=63.190831684866595, total_time=4.98
[2020/12/22 21:37:23] steps=8, epe=69.54386270096657, total_time=4.23
[2020/12/22 21:37:23] steps=9, epe=71.65906570549198, total_time=3.66
[2020/12/22 21:37:24] steps=10, epe=70.68287622669239, total_time=3.22
[2020/12/22 21:37:24] steps=11, epe=68.10887379487774, total_time=2.88
[2020/12/22 21:37:24] steps=12, epe=65.31357897717663, total_time=2.59
[2020/12/22 21:37:25] steps=13, epe=67.39865911195284, total_time=2.36
[2020/12/22 21:37:25] steps=14, epe=66.05316386284305, total_time=2.16
[2020/12/22 21:37:26] steps=15, epe=62.74090359794587, total_time=1.99
[2020/12/22 21:37:26] steps=16, epe=65.24516708995266, total_time=1.85
[2020/12/22 21:37:27] steps=17, epe=61.783343363284466, total_time=1.72
[2020/12/22 21:37:27] steps=18, epe=66.12157773880946, total_time=1.61
[2020/12/22 21:37:27] steps=19, epe=65.41601491031372, total_time=1.51
[2020/12/22 21:37:28] steps=20, epe=67.27401184191667, total_time=1.42
[2020/12/22 21:37:41] steps=50, epe=64.05605013410363, total_time=0.57
[2020/12/22 21:38:03] steps=100, epe=60.72789733634401, total_time=0.45
[2020/12/22 21:38:30] steps=100, sintel.clean=3.107024669647217, sintel.final=**3.6572041511535645**
[2020/12/22 21:38:51] steps=150, epe=58.168171286698964, total_time=0.55
[2020/12/22 21:39:14] steps=200, epe=55.366796654848244, total_time=0.45
[2020/12/22 21:39:41] steps=200, sintel.clean=4.636238098144531, sintel.final=**5.08129358291626**
[2020/12/22 21:40:03] steps=250, epe=52.92103477169547, total_time=0.56
[2020/12/22 21:40:25] steps=300, epe=50.651504112365515, total_time=0.45
[2020/12/22 21:40:52] steps=300, sintel.clean=5.46751070022583, sintel.final=**5.855245113372803**
[2020/12/22 21:41:13] steps=350, epe=48.90560261388807, total_time=0.55
[2020/12/22 21:41:36] steps=400, epe=47.090479957163055, total_time=0.45
[2020/12/22 21:42:02] steps=400, sintel.clean=6.850785255432129, sintel.final=**7.147568702697754**
[2020/12/22 21:42:24] steps=450, epe=45.47630244939083, total_time=0.55
[2020/12/22 21:42:47] steps=500, epe=43.721847967473224, total_time=0.45
[2020/12/22 21:43:14] steps=500, sintel.clean=7.392406940460205, sintel.final=**7.563663005828857**
[2020/12/22 21:43:36] steps=550, epe=41.861068025751216, total_time=0.56
[2020/12/22 21:43:59] steps=600, epe=40.728338542736246, total_time=0.45
[2020/12/22 21:44:25] steps=600, sintel.clean=8.37342643737793, sintel.final=**8.398472785949707**
[2020/12/22 21:44:47] steps=650, epe=39.22414651439415, total_time=0.55
[2020/12/22 21:45:09] steps=700, epe=38.01273616706755, total_time=0.45
[2020/12/22 21:45:36] steps=700, sintel.clean=8.904271125793457, sintel.final=**8.86906623840332**
[2020/12/22 21:45:57] steps=750, epe=36.68394209224638, total_time=0.55
[2020/12/22 21:46:20] steps=800, epe=35.51223404091925, total_time=0.45
[2020/12/22 21:46:46] steps=800, sintel.clean=9.723841667175293, sintel.final=**9.715934753417969**
[2020/12/22 21:47:08] steps=850, epe=34.441762749200876, total_time=0.55
[2020/12/22 21:47:30] steps=900, epe=33.21928807435762, total_time=0.45
[2020/12/22 21:47:56] steps=900, sintel.clean=10.129880905151367, sintel.final=**10.09166431427002**
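
In these logs the training epe falls from about 81 to about 33 over 900 steps, while the Sintel validation error rises from 1.40 to 10.13 on the clean pass. To compare the two series side by side, the log lines can be parsed with a few lines of Python. This is a minimal sketch that assumes the line layout shown above; `parse_log` and `SAMPLE` are hypothetical names, and the sample lines are copied from the log.

```python
import re

# A few lines copied verbatim from the log above.
SAMPLE = [
    "[2020/12/22 21:36:54] steps=1, epe=81.23613661839343, total_time=0.00",
    "[2020/12/22 21:37:20] steps=1, sintel.clean=1.4036083221435547, sintel.final=**1.7385120391845703**",
    "[2020/12/22 21:47:30] steps=900, epe=33.21928807435762, total_time=0.45",
    "[2020/12/22 21:47:56] steps=900, sintel.clean=10.129880905151367, sintel.final=**10.09166431427002**",
]

# Matches "] steps=<int>, <key=value, ...>"; the config line is skipped
# because it contains "clear_steps=True" rather than "] steps=<int>".
STEP_LINE = re.compile(r"\] steps=(\d+), (.+)$")


def parse_log(lines):
    """Return (train, val): train maps step -> epe, val maps step -> (clean, final)."""
    train, val = {}, {}
    for line in lines:
        match = STEP_LINE.search(line.strip())
        if match is None:
            continue
        step = int(match.group(1))
        fields = dict(item.split("=", 1) for item in match.group(2).split(", "))
        if "epe" in fields:
            train[step] = float(fields["epe"])
        elif "sintel.clean" in fields:
            # Strip the "**" emphasis markers around the sintel.final value.
            val[step] = (
                float(fields["sintel.clean"]),
                float(fields["sintel.final"].strip("*")),
            )
    return train, val


if __name__ == "__main__":
    train, val = parse_log(SAMPLE)
    for step in sorted(val):
        print(step, train.get(step), val[step])
```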

Question 1) Any idea why the network output behaves like this, and how I might fix it?
Question 2) Is there anything you think is very wrong in the edits I have made?
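
One way to narrow this down might be to inspect a single training sample right after it is loaded, before cropping. The following is a minimal sketch under stated assumptions: `describe_sample` is a hypothetical helper, the arrays are assumed to be in height, width, channel layout with the flow's last axis of size 2, and the synthetic inputs in the usage part stand in for real chairsSDHom files. It only reports shapes and value ranges and does not diagnose the cause.

```python
import numpy as np


def describe_sample(img1, img2, flow, mask=None):
    """Summarize one (img1, img2, flow, mask) sample for a quick sanity check."""
    # Per-pixel flow magnitude, computed in float64 to avoid float16 overflow.
    magnitude = np.linalg.norm(flow.astype(np.float64), axis=-1)
    report = {
        "image_shapes_match": img1.shape == img2.shape,
        "flow_matches_image": flow.shape[:2] == img1.shape[:2] and flow.shape[-1] == 2,
        "flow_all_finite": bool(np.isfinite(flow).all()),
        "flow_mean_magnitude": float(magnitude.mean()),
        "flow_max_magnitude": float(magnitude.max()),
    }
    if mask is not None:
        # The distinct mask values show whether the mask is 0/1 or 0/255.
        report["mask_values"] = np.unique(mask).tolist()
    return report


if __name__ == "__main__":
    # Synthetic stand-ins; replace with arrays loaded from the dataset.
    img = np.zeros((4, 6, 3), dtype=np.uint8)
    flow = np.zeros((4, 6, 2), dtype=np.float32)
    flow[..., 0] = 3.0
    flow[..., 1] = 4.0
    mask = np.array([[0, 255]], dtype=np.uint8)
    for key, value in describe_sample(img, img, flow, mask).items():
        print(key, value)
```

Comparing the reported flow magnitudes and mask values against a sample from a dataset that already trains correctly would show whether the new loader produces data in the same range.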

Thank you so much. I highly appreciate your work. <3 :D
