How to set the config for the FreiHAND dataset #41

Open
hxwork opened this issue Mar 24, 2022 · 12 comments

hxwork commented Mar 24, 2022

Hi,

Thanks for making this awesome project open source. When I try to train the RootNet on the FreiHAND dataset, I fail to find the config for this dataset, such as how to set the bbox_real, pixel_mean, and pixel_std. If you can provide the config of the FreiHAND dataset, I will be very appreciative.

mks0601 (Owner) commented Mar 24, 2022

bbox_real means the real-world size of the target objects. For example, I assume a human occupies about 2000 millimeters x 2000 millimeters. If you want to train RootNet on FreiHAND, you might want to set bbox_real to (300, 300), as a hand occupies about 300 millimeters x 300 millimeters. Please be careful about the unit: the unit of bbox_real should be the same as that of the GT root depth.
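
For reference, a minimal sketch of the relevant main/config.py entries for FreiHAND, assuming the attribute names follow the released human-dataset config and that the FreiHAND GT root depth is in millimeters (both are assumptions; verify against your copy of the repo and data):

# main/config.py (sketch, not the released FreiHAND config)
bbox_real = (300, 300)   # assumed hand size ~300 mm x 300 mm; use (0.3, 0.3) if GT root depth is in meters
pixel_mean = (0.485, 0.456, 0.406)   # standard ImageNet normalization, same as the human-dataset config
pixel_std = (0.229, 0.224, 0.225)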

hxwork (Author) commented Mar 24, 2022

Thanks for your quick reply. I would like to use your pre-trained model on the FreiHAND dataset, but I don't know whether I should set bbox_real=0.3 or bbox_real=300.

mks0601 (Owner) commented Mar 24, 2022

Is the model you want to use pre-trained on human datasets? If so, you can't use it for hands; you should train it again for the hand. Please look at the FreiHAND dataset and decide whether you should set 0.3 or 300, based on the unit of the GT root depth in FreiHAND.

hxwork (Author) commented Mar 24, 2022

I want to use the model downloaded from here, and I am not sure whether this one is pre-trained on the FreiHAND dataset.

mks0601 (Owner) commented Mar 24, 2022

I see. You can use that one, as it is pre-trained on FreiHAND.

hxwork (Author) commented Mar 24, 2022

OK, got it. Thanks for your patient reply again.

mks0601 (Owner) commented Mar 24, 2022

If you set bbox_real to 0.3, the output root depth is in meters. If you set it to 300, the output root depth is in millimeters.
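
For context, the unit dependence comes from how RootNet forms its depth: the network predicts a correction factor gamma and multiplies it by a distance term k computed from the focal lengths, the real-world box area given by bbox_real, and the pixel area of the bounding box, so the unit of bbox_real carries through to the output depth. A minimal sketch of that k computation, with hypothetical argument names:

import math

def compute_k_value(fx, fy, bbox_real, bbox_img):
    # fx, fy: focal lengths in pixels; bbox_real: (w, h) of the target in real-world units;
    # bbox_img: (w, h) of the detected bounding box in pixels (names are hypothetical).
    area_real = bbox_real[0] * bbox_real[1]
    area_img = bbox_img[0] * bbox_img[1]
    return math.sqrt(fx * fy * area_real / area_img)

# RootNet predicts depth = gamma * k, so the predicted root depth inherits the unit of bbox_real:
# bbox_real = (0.3, 0.3) -> depth in meters; bbox_real = (300, 300) -> depth in millimeters.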

hxwork closed this as completed Mar 24, 2022
hxwork reopened this Mar 24, 2022
hxwork (Author) commented Mar 24, 2022

OK, got it. Thanks.

hxwork (Author) commented Mar 24, 2022

Hi,

When I try to load the above-mentioned pre-trained model weights, it seems that the released checkpoint is inconsistent with the code in this repo: some keys and weights are missing, and other entries are stored in the model dict instead. I have tried to modify the RootNet code so that the weights load normally, but I got other errors. Could you please provide the code corresponding to the pre-trained model?

mks0601 (Owner) commented Mar 24, 2022

Sorry, I don't have the code now :( Why don't you just use the predicted outputs of RootNet on FreiHAND? I made them publicly available: https://drive.google.com/file/d/1l1imjCHugUOoTHdL7so9ySXyNw26a0AK/view?usp=sharing

hxwork (Author) commented Mar 25, 2022

OK. The reason is that I want to evaluate the pre-trained model on images captured in the wild. Anyway, I will try to handle this problem. Thanks again for your patient replies.

hxwork (Author) commented Mar 27, 2022

I changed the code of the RootNet to the following:

import torch
import torch.nn as nn
import torch.nn.functional as F

from config import cfg  # repo-style config import; adjust to your layout


class RootNet(nn.Module):

    def __init__(self):
        self.inplanes = 2048
        self.outplanes = 256

        super(RootNet, self).__init__()
        self.xy_deconv = self._make_deconv_layer(3)
        self.xy_conv = nn.Sequential(nn.Conv2d(in_channels=self.outplanes, out_channels=1, kernel_size=1, stride=1, padding=0))
        self.gamma_layer = nn.Sequential(nn.Linear(self.inplanes, 512), nn.ReLU(inplace=True), nn.Linear(512, 1))

    def _make_deconv_layer(self, num_layers):
        layers = []
        inplanes = self.inplanes
        outplanes = self.outplanes
        for i in range(num_layers):
            layers.append(
                nn.ConvTranspose2d(in_channels=inplanes,
                                   out_channels=outplanes,
                                   kernel_size=4,
                                   stride=2,
                                   padding=1,
                                   output_padding=0,
                                   bias=False))
            layers.append(nn.BatchNorm2d(outplanes))
            layers.append(nn.ReLU(inplace=True))
            inplanes = outplanes

        return nn.Sequential(*layers)

    def forward(self, x, k_value):
        # x,y
        xy = self.xy_deconv(x)
        xy = self.xy_conv(xy)
        xy = xy.view(-1, 1, cfg.output_shape[0] * cfg.output_shape[1])
        xy = F.softmax(xy, 2)
        xy = xy.view(-1, 1, cfg.output_shape[0], cfg.output_shape[1])

        hm_x = xy.sum(dim=(2))
        hm_y = xy.sum(dim=(3))

        coord_x = hm_x * torch.arange(cfg.output_shape[1]).float().cuda()
        coord_y = hm_y * torch.arange(cfg.output_shape[0]).float().cuda()

        coord_x = coord_x.sum(dim=2)
        coord_y = coord_y.sum(dim=2)

        # z
        img_feat = torch.mean(x.view(x.size(0), x.size(1), x.size(2) * x.size(3)), dim=2)  # global average pooling
        # img_feat = torch.unsqueeze(img_feat, 2)
        # img_feat = torch.unsqueeze(img_feat, 3)
        gamma = self.gamma_layer(img_feat)
        gamma = gamma.view(-1, 1)
        depth = gamma * k_value.view(-1, 1)

        coord = torch.cat((coord_x, coord_y, depth), dim=1)
        return coord

    def init_weights(self):
        # Submodule names must match those defined in __init__
        # (xy_deconv / xy_conv / gamma_layer); the old names no longer exist.
        for m in self.xy_deconv.modules():
            if isinstance(m, nn.ConvTranspose2d):
                nn.init.normal_(m.weight, std=0.001)
            elif isinstance(m, nn.BatchNorm2d):
                nn.init.constant_(m.weight, 1)
                nn.init.constant_(m.bias, 0)
        for m in self.xy_conv.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.normal_(m.weight, std=0.001)
                nn.init.constant_(m.bias, 0)
        for m in self.gamma_layer.modules():
            if isinstance(m, nn.Linear):
                nn.init.normal_(m.weight, std=0.01)
                nn.init.constant_(m.bias, 0)


class ResPoseNet(nn.Module):

    def __init__(self, backbone, root):
        super(ResPoseNet, self).__init__()
        self.backbone = backbone
        self.root_net = root

    def forward(self, input_img, k_value, target=None):
        _, fm = self.backbone(input_img)
        coord = self.root_net(fm, k_value)

        if target is None:
            return coord
        else:
            target_coord = target["coord"]
            target_vis = target["vis"]
            target_have_depth = target["have_depth"]

            ## coordinate loss
            loss_coord = torch.abs(coord - target_coord) * target_vis
            loss_coord = (loss_coord[:, 0] + loss_coord[:, 1] + loss_coord[:, 2] * target_have_depth.view(-1)) / 3.
            return loss_coord

Then, the pre-trained model weights for the FreiHAND dataset can be loaded successfully.
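
For anyone hitting the same mismatch, here is a minimal loading sketch. It assumes the backbone is built as in main/model.py (ResNetBackbone from nets.resnet), that the checkpoint stores its weights under a 'network' key as other snapshots in this repo do, and that the keys may carry a DataParallel 'module.' prefix; the filename is a placeholder. All of these are assumptions to verify against the released file:

import torch
from config import cfg
from nets.resnet import ResNetBackbone  # import paths follow the repo layout (assumption)

backbone = ResNetBackbone(cfg.resnet_type)
model = ResPoseNet(backbone, RootNet())

ckpt = torch.load('rootnet_freihand_snapshot.pth.tar', map_location='cpu')  # placeholder filename
state_dict = ckpt['network'] if isinstance(ckpt, dict) and 'network' in ckpt else ckpt
# strip a possible DataParallel 'module.' prefix so the keys match a plain nn.Module
state_dict = {k.replace('module.', '', 1): v for k, v in state_dict.items()}
model.load_state_dict(state_dict)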
