26 changes: 20 additions & 6 deletions README.md
@@ -1,13 +1,29 @@
# TransCG: A Large-Scale Real-World Dataset for Transparent Object Depth Completion and Grasping
# TransCG: A Large-Scale Real-World Dataset for Transparent Object Depth Completion and A Grasping Baseline

[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/transcg-a-large-scale-real-world-dataset-for/transparent-object-depth-estimation-on)](https://paperswithcode.com/sota/transparent-object-depth-estimation-on?p=transcg-a-large-scale-real-world-dataset-for) [![CC BY-NC-SA 4.0][cc-by-nc-sa-shield]][cc-by-nc-sa]
[![CC BY-NC-SA 4.0][cc-by-nc-sa-shield]][cc-by-nc-sa]

[[Paper]](https://arxiv.org/pdf/2202.08471) [[Project Page]](https://graspnet.net/transcg)
[[Paper (IEEE Xplore)]](https://ieeexplore.ieee.org/document/9796631) [[Paper (arXiv)]](https://arxiv.org/pdf/2202.08471) [[Project Page]](https://graspnet.net/transcg)

**Authors**: [Hongjie Fang](https://github.com/galaxies99/), [Hao-Shu Fang](https://github.com/fang-haoshu), [Sheng Xu](https://github.com/XS1020), [Cewu Lu](https://mvig.sjtu.edu.cn/).

Welcome to the official repository for the TransCG paper. This repository includes the dataset and the proposed Depth Filler Net (DFNet) models.

## News

2022-10-14: The corrected checkpoint has been uploaded. Check the [Quick Start](#quick-start) section for details. Many thanks to [@ZhiyangZhou24](https://github.com/ZhiyangZhou24) for reporting the issue.

2022-10-10: A new checkpoint and source code are released. Check the [Quick Start](#quick-start) section for details. The new checkpoint and source code fix the shifting problem to a large extent (only ~2 cm of shift remains, which can be reduced further with standard engineering methods) and use interpolation to fill the empty holes. Many thanks to [@haojieh](https://github.com/haojieh) and [@mtbui2010](https://github.com/mtbui2010) for pointing these problems out. The new checkpoint also improves several metrics; see details in [assets/docs/DFNet.md](assets/docs/DFNet.md).

2022-10-02: For the checkpoint and source code that correspond to the paper, please see [this version](https://github.com/Galaxies99/TransCG/tree/f80708ac4243e9f9d3f5a7b11afd863b21506f76) of our repository. The shifting problem in that version can be mitigated by computing the difference between the average depth before and after refinement, and then subtracting that difference from the refined depths.
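
For illustration, a minimal NumPy sketch of this workaround (the function name and the zero-means-invalid convention are assumptions for the example, not part of the repository):

```python
import numpy as np

def correct_mean_shift(depth_in, depth_refined):
    # Estimate the global shift introduced by refinement on pixels that
    # are valid (non-zero) in both maps, then remove it from the output.
    valid = (depth_in > 0) & (depth_refined > 0)
    shift = depth_refined[valid].mean() - depth_in[valid].mean()
    return np.where(depth_refined > 0, depth_refined - shift, depth_refined)
```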

2022-09-16: A new version of the DFNet code is released. Many thanks to [@cxt98](https://github.com/cxt98) for fixing the bugs and [@haberger](https://github.com/haberger) for reporting them.

2022-06-15: Our TransCG paper is published in IEEE Robotics and Automation Letters Vol. 7, No. 3, 2022, and is available at [IEEE Xplore](https://ieeexplore.ieee.org/document/9796631).

2022-06-01: Our TransCG paper is accepted by RA-L.

2022-02-17: Our paper is released on [arXiv](https://arxiv.org/pdf/2202.08471), and submitted to IEEE Robotics and Automation Letters (RA-L).

## TransCG Dataset

<img align="right" src="assets/imgs/TransCG.gif" width=240px> The TransCG dataset is now available on the [official page](https://graspnet.net/transcg). TransCG is the first large-scale real-world dataset for transparent object depth completion and grasping. In total, the dataset contains 57,715 RGB-D images of 51 transparent objects and many opaque objects, captured from different perspectives across 130 scenes under various real-world settings. 3D mesh models of the transparent objects are also provided.
@@ -41,9 +57,7 @@ pip install -r requirements.txt

### Quick Start

**NOTE.** The following checkpoint is compatible with [this version](https://github.com/Galaxies99/TransCG/tree/f80708ac4243e9f9d3f5a7b11afd863b21506f76). We will update the checkpoint of the latest version later.

Our pretrained checkpoint is available on [Google Drive](https://drive.google.com/file/d/1APIuzIQmFucDP4RcmiNV-NEsQKqN9J57/view?usp=sharing) or [Baidu Netdisk](https://pan.baidu.com/s/14khejj63OjOKsyzxnuYo5Q) (Code: c01g). The checkpoint is trained with the default configuration in the `configs` folder. You can use our released checkpoints for [inference](#inference) or [testing](#testing-optional). Refer to [assets/docs/DFNet.md](assets/docs/DFNet.md) for details about the depth completion network.
Our pretrained checkpoint is available on [Google Drive](https://drive.google.com/file/d/1oZi9zdOg0WYuTHM10xlyq5FRlfoKDKzU/view?usp=sharing) or [Baidu Netdisk](https://pan.baidu.com/s/1G9OaZ1Kk-KmHWOUHARsgNQ) (Code: bpes). The checkpoint is trained with the default configuration in the `configs` folder. You can use our released checkpoints for [inference](#inference) or [testing](#testing-optional). Refer to [assets/docs/DFNet.md](assets/docs/DFNet.md) for details about the depth completion network.
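
For a quick sanity check after downloading the checkpoint, the snippet below mirrors the updated `sample_inference.py` in this repository (the import path is assumed from the file layout, and the file paths assume the TransCG dataset has been extracted to `data/`):

```python
import numpy as np
from PIL import Image
from inference import Inferencer

inferencer = Inferencer()  # loads the checkpoint specified in configs/inference.yaml

rgb = np.array(Image.open('data/scene21/1/rgb1.png'), dtype=np.float32)
depth = np.array(Image.open('data/scene21/1/depth1.png'), dtype=np.float32) / 1000  # mm -> m

# Returns the refined depth map and the pre-processed original depth map.
refined, original = inferencer.inference(rgb, depth, depth_coefficient=3, inpainting=True)
```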

### Grasping Demo

16 changes: 12 additions & 4 deletions assets/docs/DFNet.md
@@ -10,15 +10,23 @@ The architecture of our proposed end-to-end depth completion network DFNet is sh

## Experiments

<img align="middle" src='../imgs/exper1.png' width=720px>
| Method | RMSE | REL | MAE | Delta 1.05 | Delta 1.10 | Delta 1.25 | GPU Mem. Occ. | Infer. Time | Model Size |
| ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- |
| ClearGrasp | 0.054 | 0.083 | 0.037 | 50.48 | 68.68 | 95.28 | 2.1 GB | 2.2813s | 934 MB |
| LIDF-Refine | 0.019 | 0.034 | 0.015 | 78.22 | 94.26 | 99.80 | 6.2 GB | 0.0182s | 251 MB |
| TranspareNet* | 0.026 | **0.023** | **0.013** | **88.45** | 96.25 | 99.42 | 1.9 GB | 0.0354s | 336 MB |
| DFNet** | **0.018** | 0.026 | **0.013** | 84.94 | **96.57** | **99.85** | **1.6 GB** | **0.0166s** | **5.2 MB** |

Experiments demonstrate the superior efficacy, efficiency, and robustness of our method over previous works; it is also able to process high-resolution images under limited hardware resources.
Here, ClearGrasp refers to [1], LIDF-Refine refers to [2], TranspareNet refers to [3], and DFNet refers to our proposed Depth Filler Net.

<img align="middle" src='../imgs/exper2.png' width=460px>
*: TranspareNet [3] is a concurrent work with our project.

**Note**. In experiment tables above, ClearGrasp (or [34]) refers to "ClearGrasp: 3D Shape Estimation of Transparent Objects for Manipulation" (ICRA 2020), and LIDF-Refine (or [41]) refers to "RGB-D Local Implicit Function for Depth Completion of Transparent Objects" (CVPR 2021).
**: Here we use the newly released checkpoint of DFNet, which differs slightly from the checkpoint used in the paper. The newly released checkpoint fixes the point-cloud shifting bug mentioned in [Issue #4](https://github.com/Galaxies99/TransCG/issues/4) and the black-hole problem mentioned in [Issue #7](https://github.com/Galaxies99/TransCG/issues/7).

For the original checkpoint used in the paper, please use [this version](https://github.com/Galaxies99/TransCG/tree/f80708ac4243e9f9d3f5a7b11afd863b21506f76) of the repository, and download the checkpoint from [Google Drive](https://drive.google.com/file/d/1APIuzIQmFucDP4RcmiNV-NEsQKqN9J57/view?usp=sharing) or [Baidu Netdisk](https://pan.baidu.com/s/14khejj63OjOKsyzxnuYo5Q) (Code: c01g). Many thanks to [@cxt98](https://github.com/cxt98) for fixing the bugs in [Issue #5](https://github.com/Galaxies99/TransCG/issues/5).
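
For reference, the error metrics in the table above can be computed as in the following NumPy sketch (a minimal illustration of the standard definitions; the exact masking and averaging conventions of the repository's evaluation code may differ):

```python
import numpy as np

def depth_metrics(pred, gt, mask):
    # Evaluate only on valid (masked) pixels; depths are in meters.
    p, g = pred[mask], gt[mask]
    rmse = np.sqrt(np.mean((p - g) ** 2))   # root mean squared error
    rel = np.mean(np.abs(p - g) / g)        # mean absolute relative error
    mae = np.mean(np.abs(p - g))            # mean absolute error
    # Delta t: percentage of pixels whose depth ratio is below threshold t.
    ratio = np.maximum(p / g, g / p)
    deltas = {t: 100.0 * np.mean(ratio < t) for t in (1.05, 1.10, 1.25)}
    return rmse, rel, mae, deltas
```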

## References

1. Sajjan, Shreeyak, et al. "ClearGrasp: 3D Shape Estimation of Transparent Objects for Manipulation." 2020 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2020.
2. Zhu, Luyang, et al. "RGB-D Local Implicit Function for Depth Completion of Transparent Objects." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2021.
3. Xu, Haoping, et al. "Seeing Glass: Joint Point-Cloud and Depth Completion for Transparent Objects." 5th Annual Conference on Robot Learning. 2021.
2 changes: 0 additions & 2 deletions assets/docs/grasping.md
@@ -6,8 +6,6 @@ Given an RGB image along with a depth image collected by an RGB-D camera, we fir

## Experiments

<img align="middle" src='../imgs/exper3.png' width=460px>

## Reference

1. Fang, Hao-Shu, et al. "GraspNet-1Billion: A Large-Scale Benchmark for General Object Grasping." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020.
Binary file removed assets/imgs/exper1.png
Binary file removed assets/imgs/exper2.png
Binary file removed assets/imgs/exper3.png
6 changes: 4 additions & 2 deletions configs/default.yaml
@@ -25,16 +25,18 @@
"use_augmentation": True
"rgb_augmentation_probability": 0.8
"depth_min": 0.3
"depth_max": 1.5
"depth_max": 1.0
"depth_norm": 1.0
"with_original": True
"test":
"type": "transcg"
"data_dir": "data"
"image_size": !!python/tuple [320, 240]
"use_augmentation": False
"depth_min": 0.3
"depth_max": 1.5
"depth_max": 1.0
"depth_norm": 1.0
"with_original": True

"dataloader":
"num_workers": 48
2 changes: 1 addition & 1 deletion configs/inference.yaml
@@ -11,5 +11,5 @@
"image_size": !!python/tuple [320, 240]
"cuda_id": 0
"depth_min": 0.3
"depth_max": 1.5
"depth_max": 1.0
"depth_norm": 1.0
14 changes: 10 additions & 4 deletions datasets/transcg.py
@@ -51,13 +51,15 @@ def __init__(self, data_dir, split = 'train', **kwargs):
self.sample_info.append([
os.path.join(self.data_dir, 'scene{}'.format(scene_id), '{}'.format(perspective_id)),
1, # (for D435)
scene_type
scene_type,
perspective_id
])
for perspective_id in self.scene_metadata[scene_id]['L515_valid_perspective_list']:
self.sample_info.append([
os.path.join(self.data_dir, 'scene{}'.format(scene_id), '{}'.format(perspective_id)),
2, # (for L515)
scene_type
scene_type,
perspective_id
])
# Integrity double-check
assert len(self.sample_info) == self.total_samples, "Error in total samples, expect {} samples, found {} samples.".format(self.total_samples, len(self.sample_info))
@@ -72,12 +74,16 @@ def __init__(self, data_dir, split = 'train', **kwargs):
self.with_original = kwargs.get('with_original', False)

def __getitem__(self, id):
img_path, camera_type, scene_type = self.sample_info[id]
img_path, camera_type, scene_type, perspective_id = self.sample_info[id]
rgb = np.array(Image.open(os.path.join(img_path, 'rgb{}.png'.format(camera_type))), dtype = np.float32)
depth = np.array(Image.open(os.path.join(img_path, 'depth{}.png'.format(camera_type))), dtype = np.float32)
depth_gt = np.array(Image.open(os.path.join(img_path, 'depth{}-gt.png'.format(camera_type))), dtype = np.float32)
depth_gt_mask = np.array(Image.open(os.path.join(img_path, 'depth{}-gt-mask.png'.format(camera_type))), dtype = np.uint8)
return process_data(rgb, depth, depth_gt, depth_gt_mask, self.cam_intrinsics[camera_type], scene_type = scene_type, camera_type = camera_type, split = self.split, image_size = self.image_size, depth_min = self.depth_min, depth_max = self.depth_max, depth_norm = self.depth_norm, use_aug = self.use_aug, rgb_aug_prob = self.rgb_aug_prob, with_original = self.with_original)
if camera_type == 1:
depth_coeff = (perspective_id // 20) / 12 + 1
else:
depth_coeff = 1
return process_data(rgb, depth, depth_gt, depth_gt_mask, self.cam_intrinsics[camera_type], scene_type = scene_type, camera_type = camera_type, split = self.split, image_size = self.image_size, depth_min = self.depth_min, depth_max = self.depth_max, depth_norm = self.depth_norm, use_aug = self.use_aug, rgb_aug_prob = self.rgb_aug_prob, depth_coeff = depth_coeff, inpainting = True, with_original = self.with_original)

def __len__(self):
return self.total_samples
29 changes: 26 additions & 3 deletions inference.py
@@ -15,6 +15,7 @@
from utils.logger import ColoredLogger
from utils.builder import ConfigBuilder
from time import perf_counter
from scipy.interpolate import NearestNDInterpolator


class Inferencer(object):
@@ -68,7 +69,7 @@ def __init__(self, cfg_path = os.path.join('configs', 'inference.yaml'), with_in
self.depth_min, self.depth_max = self.builder.get_inference_depth_min_max()
self.depth_norm = self.builder.get_inference_depth_norm()

def inference(self, rgb, depth, target_size = (1280, 720)):
def inference(self, rgb, depth, target_size = (1280, 720), depth_coefficient = 10.0, inpainting = True):
"""
Inference.

@@ -77,7 +78,11 @@ def inference(self, rgb, depth, target_size = (1280, 720)):

rgb, depth: the initial RGB-D image;

target_size: tuple of (int, int), optional, default: (1280, 720), the target depth image size.
target_size: tuple of (int, int), optional, default: (1280, 720), the target depth image size;

depth_coefficient: float, optional, default: 10.0, only pixels whose depths lie in [depth_mu - depth_coefficient * depth_std, depth_mu + depth_coefficient * depth_std] are regarded as valid;

inpainting: bool, default: True, whether to inpaint the invalid pixels.

Returns
-------
@@ -90,7 +95,20 @@
depth = np.where(depth < self.depth_min, 0, depth)
depth = np.where(depth > self.depth_max, 0, depth)
depth[np.isnan(depth)] = 0
depth_available = depth[depth > 0]
depth_mu = depth_available.mean() if depth_available.shape[0] != 0 else 0
depth_std = depth_available.std() if depth_available.shape[0] != 0 else 1
depth = np.where(depth < depth_mu - depth_coefficient * depth_std, 0, depth)
depth = np.where(depth > depth_mu + depth_coefficient * depth_std, 0, depth)
if inpainting:
mask = np.where(depth > 0)
if mask[0].shape[0] != 0:
interp = NearestNDInterpolator(np.transpose(mask), depth[mask])
depth = interp(*np.indices(depth.shape))
depth = depth / self.depth_norm
depth_min = depth.min() - 0.5 * depth.std() - 1e-6
depth_max = depth.max() + 0.5 * depth.std() + 1e-6
depth = (depth - depth_min) / (depth_max - depth_min)
rgb = (rgb / 255.0).transpose(2, 0, 1)
rgb = torch.FloatTensor(rgb).to(self.device).unsqueeze(0)
depth = torch.FloatTensor(depth).to(self.device).unsqueeze(0)
@@ -101,7 +119,12 @@
if self.with_info:
self.logger.info("Inference finished, time: {:.4f}s.".format(time_end - time_start))
depth_res = depth_res.squeeze(0).cpu().detach().numpy()
depth_ori = depth.squeeze(0).cpu().detach().numpy()
depth_res = depth_res * (depth_max - depth_min) + depth_min
depth_ori = depth_ori * (depth_max - depth_min) + depth_min
depth_res = depth_res * self.depth_norm
depth_ori = depth_ori * self.depth_norm
depth_res = cv2.resize(depth_res, target_size, interpolation = cv2.INTER_NEAREST)
return depth_res
depth_ori = cv2.resize(depth_ori, target_size, interpolation = cv2.INTER_NEAREST)
return depth_res, depth_ori
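
As a side note on the inpainting step added above: SciPy's nearest-neighbour interpolator fills each invalid pixel with the value of its nearest valid pixel, as in this standalone toy example:

```python
import numpy as np
from scipy.interpolate import NearestNDInterpolator

depth = np.array([[0.5, 0.0],
                  [0.0, 0.7]], dtype=np.float32)  # zeros mark invalid pixels
mask = np.where(depth > 0)
interp = NearestNDInterpolator(np.transpose(mask), depth[mask])
filled = interp(*np.indices(depth.shape))  # every pixel now holds a valid depth
```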

3 changes: 1 addition & 2 deletions models/DFNet.py
@@ -149,8 +149,7 @@ def __init__(self, in_channels = 4, hidden_channels = 64, L = 5, k = 12, use_DUC
nn.Conv2d(self.hidden_channels, self.hidden_channels, kernel_size = 3, stride = 1, padding = 1),
nn.BatchNorm2d(self.hidden_channels),
nn.ReLU(True),
nn.Conv2d(self.hidden_channels, 1, kernel_size = 3, stride = 1, padding = 1),
nn.ReLU(True)
nn.Conv2d(self.hidden_channels, 1, kernel_size = 1, stride = 1)
)

def _make_upconv(self, in_channels, out_channels, upscale_factor = 2):
Expand Down
12 changes: 6 additions & 6 deletions sample_inference.py
@@ -45,19 +45,19 @@ def draw_point_cloud(color, depth, camera_intrinsics, use_mask = False, use_inpa

inferencer = Inferencer()

rgb = np.array(Image.open('data/scene1/1/rgb1.png'), dtype = np.float32)
depth = np.array(Image.open('data/scene1/1/depth1.png'), dtype = np.float32)
depth_gt = np.array(Image.open('data/scene1/1/depth1-gt.png'), dtype = np.float32)
rgb = np.array(Image.open('data/scene21/1/rgb1.png'), dtype = np.float32)
depth = np.array(Image.open('data/scene21/1/depth1.png'), dtype = np.float32)
depth_gt = np.array(Image.open('data/scene21/1/depth1-gt.png'), dtype = np.float32)

depth = depth / 1000
depth_gt = depth_gt / 1000

res = inferencer.inference(rgb, depth)
res, depth = inferencer.inference(rgb, depth, depth_coefficient = 3, inpainting = True)

cam_intrinsics = np.load('data/camera_intrinsics/1-camIntrinsics-D435.npy')

res = np.clip(res, 0.1, 1.5)
depth = np.clip(depth, 0.1, 1.5)
res = np.clip(res, 0.3, 1.0)
depth = np.clip(depth, 0.3, 1.0)

cloud = draw_point_cloud(rgb, res, cam_intrinsics, scale = 1.0)
cloud_gt = draw_point_cloud(rgb, depth_gt, cam_intrinsics, scale = 1.0)
2 changes: 2 additions & 0 deletions test.py
@@ -73,6 +73,8 @@ def test():
time_start = perf_counter()
res = model(data_dict['rgb'], data_dict['depth'])
time_end = perf_counter()
depth_scale = data_dict['depth_max'] - data_dict['depth_min']
res = res * depth_scale.reshape(-1, 1, 1) + data_dict['depth_min'].reshape(-1, 1, 1)
data_dict['pred'] = res
_ = metrics.evaluate_batch(data_dict, record = True)
duration = time_end - time_start
6 changes: 5 additions & 1 deletion train.py
@@ -90,7 +90,9 @@ def train_one_epoch(epoch):
for data_dict in pbar:
optimizer.zero_grad()
data_dict = to_device(data_dict, device)
res = model(data_dict['rgb'], data_dict['depth'])
res = model(data_dict['rgb'], data_dict['depth'])
depth_scale = data_dict['depth_max'] - data_dict['depth_min']
res = res * depth_scale.reshape(-1, 1, 1) + data_dict['depth_min'].reshape(-1, 1, 1)
data_dict['pred'] = res
loss_dict = criterion(data_dict)
loss = loss_dict['loss']
@@ -118,6 +120,8 @@ def test_one_epoch(epoch):
time_start = perf_counter()
res = model(data_dict['rgb'], data_dict['depth'])
time_end = perf_counter()
depth_scale = data_dict['depth_max'] - data_dict['depth_min']
res = res * depth_scale.reshape(-1, 1, 1) + data_dict['depth_min'].reshape(-1, 1, 1)
data_dict['pred'] = res
loss_dict = criterion(data_dict)
loss = loss_dict['loss']
2 changes: 1 addition & 1 deletion utils/builder.py
@@ -479,7 +479,7 @@ def get_inference_depth_norm(self, inference_params = None):
Parameters
----------

inference_params: dict, optional, default: None. If inference_params is provided, then use the parameters specified in the inference_params to get the inference depth range. Otherwise, the inference parameters in the self.params will be used to get the inference depth range.
inference_params: dict, optional, default: None. If inference_params is provided, then use the parameters specified in the inference_params to get the inference depth normalization coefficient. Otherwise, the inference parameters in the self.params will be used to get the inference depth normalization coefficient.

Returns
-------