26 changes: 20 additions & 6 deletions README.md
@@ -1,13 +1,29 @@
# TransCG: A Large-Scale Real-World Dataset for Transparent Object Depth Completion and Grasping
# TransCG: A Large-Scale Real-World Dataset for Transparent Object Depth Completion and A Grasping Baseline

[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/transcg-a-large-scale-real-world-dataset-for/transparent-object-depth-estimation-on)](https://paperswithcode.com/sota/transparent-object-depth-estimation-on?p=transcg-a-large-scale-real-world-dataset-for) [![CC BY-NC-SA 4.0][cc-by-nc-sa-shield]][cc-by-nc-sa]
[![CC BY-NC-SA 4.0][cc-by-nc-sa-shield]][cc-by-nc-sa]

[[Paper]](https://arxiv.org/pdf/2202.08471) [[Project Page]](https://graspnet.net/transcg)
[[Paper (IEEE Xplore)]](https://ieeexplore.ieee.org/document/9796631) [[Paper (arXiv)]](https://arxiv.org/pdf/2202.08471) [[Project Page]](https://graspnet.net/transcg)

**Authors**: [Hongjie Fang](https://github.com/galaxies99/), [Hao-Shu Fang](https://github.com/fang-haoshu), [Sheng Xu](https://github.com/XS1020), [Cewu Lu](https://mvig.sjtu.edu.cn/).

Welcome to the official repository for the TransCG paper. This repository includes the dataset and the proposed Depth Filler Net (DFNet) models.

## News

2022-10-14: The corrected checkpoint has been uploaded. Check the [Quick Start](#quick-start) section for details. Many thanks to [@ZhiyangZhou24](https://github.com/ZhiyangZhou24) for reporting the issue.

2022-10-10: A new checkpoint and source code are released. Check the [Quick Start](#quick-start) section for details. The new checkpoint and source code fix the shifting problem to a large extent (only ~2 cm of shift remains, which can be reduced further with standard engineering methods) and use interpolation to fill the empty holes. Many thanks to [@haojieh](https://github.com/haojieh) and [@mtbui2010](https://github.com/mtbui2010) for pointing these problems out. The new checkpoint also improves several metrics; see details in [assets/docs/DFNet.md](assets/docs/DFNet.md).

2022-10-02: For the checkpoint and source code that correspond to the paper, please see [this version](https://github.com/Galaxies99/TransCG/tree/f80708ac4243e9f9d3f5a7b11afd863b21506f76) of our repository. The shifting problem in that version can be mitigated by computing the difference between the average depth before and after refinement, and then subtracting that difference from the refined depths.
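
For illustration, a minimal NumPy sketch of this workaround (the function name and the zero-means-invalid convention are assumptions for the example, not part of the repository):

```python
import numpy as np

def correct_mean_shift(depth_in, depth_refined):
    # Estimate the global shift introduced by refinement on pixels that
    # are valid (non-zero) in both maps, then remove it from the output.
    valid = (depth_in > 0) & (depth_refined > 0)
    shift = depth_refined[valid].mean() - depth_in[valid].mean()
    return np.where(depth_refined > 0, depth_refined - shift, depth_refined)
```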

2022-09-16: A new version of the DFNet code is released. Many thanks to [@cxt98](https://github.com/cxt98) for fixing the bugs and [@haberger](https://github.com/haberger) for reporting them.

2022-06-15: Our TransCG paper is published in IEEE Robotics and Automation Letters Vol. 7, No. 3, 2022, and is available at [IEEE Xplore](https://ieeexplore.ieee.org/document/9796631).

2022-06-01: Our TransCG paper is accepted by RA-L.

2022-02-17: Our paper is released on [arXiv](https://arxiv.org/pdf/2202.08471), and submitted to IEEE Robotics and Automation Letters (RA-L).

## TransCG Dataset

<img align="right" src="assets/imgs/TransCG.gif" width=240px> The TransCG dataset is now available on the [official page](https://graspnet.net/transcg). TransCG is the first large-scale real-world dataset for transparent object depth completion and grasping. In total, the dataset contains 57,715 RGB-D images of 51 transparent objects and many opaque objects, captured from different perspectives across 130 scenes under various real-world settings. 3D mesh models of the transparent objects are also provided.
@@ -41,9 +57,7 @@ pip install -r requirements.txt

### Quick Start

**NOTE.** The following checkpoint is compatible with [this version](https://github.com/Galaxies99/TransCG/tree/f80708ac4243e9f9d3f5a7b11afd863b21506f76). We will update the checkpoint of the latest version later.

Our pretrained checkpoint is available on [Google Drive](https://drive.google.com/file/d/1APIuzIQmFucDP4RcmiNV-NEsQKqN9J57/view?usp=sharing) or [Baidu Netdisk](https://pan.baidu.com/s/14khejj63OjOKsyzxnuYo5Q) (Code: c01g). The checkpoint is trained with the default configuration in the `configs` folder. You can use our released checkpoints for [inference](#inference) or [testing](#testing-optional). Refer to [assets/docs/DFNet.md](assets/docs/DFNet.md) for details about the depth completion network.
Our pretrained checkpoint is available on [Google Drive](https://drive.google.com/file/d/1oZi9zdOg0WYuTHM10xlyq5FRlfoKDKzU/view?usp=sharing) or [Baidu Netdisk](https://pan.baidu.com/s/1G9OaZ1Kk-KmHWOUHARsgNQ) (Code: bpes). The checkpoint is trained with the default configuration in the `configs` folder. You can use our released checkpoints for [inference](#inference) or [testing](#testing-optional). Refer to [assets/docs/DFNet.md](assets/docs/DFNet.md) for details about the depth completion network.
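
For a quick sanity check after downloading the checkpoint, the snippet below mirrors the updated `sample_inference.py` in this repository (the import path is assumed from the file layout, and the file paths assume the TransCG dataset has been extracted to `data/`):

```python
import numpy as np
from PIL import Image
from inference import Inferencer

inferencer = Inferencer()  # loads the checkpoint specified in configs/inference.yaml

rgb = np.array(Image.open('data/scene21/1/rgb1.png'), dtype=np.float32)
depth = np.array(Image.open('data/scene21/1/depth1.png'), dtype=np.float32) / 1000  # mm -> m

# Returns the refined depth map and the pre-processed original depth map.
refined, original = inferencer.inference(rgb, depth, depth_coefficient=3, inpainting=True)
```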

### Grasping Demo

16 changes: 12 additions & 4 deletions assets/docs/DFNet.md
@@ -10,15 +10,23 @@ The architecture of our proposed end-to-end depth completion network DFNet is sh

## Experiments

<img align="middle" src='../imgs/exper1.png' width=720px>
| Method | RMSE | REL | MAE | Delta 1.05 | Delta 1.10 | Delta 1.25 | GPU Mem. Occ. | Infer. Time | Model Size |
| ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- |
| ClearGrasp | 0.054 | 0.083 | 0.037 | 50.48 | 68.68 | 95.28 | 2.1 GB | 2.2813s | 934 MB |
| LIDF-Refine | 0.019 | 0.034 | 0.015 | 78.22 | 94.26 | 99.80 | 6.2 GB | 0.0182s | 251 MB |
| TranspareNet* | 0.026 | **0.023** | **0.013** | **88.45** | 96.25 | 99.42 | 1.9 GB | 0.0354s | 336 MB |
| DFNet** | **0.018** | 0.026 | **0.013** | 84.94 | **96.57** | **99.85** | **1.6 GB** | **0.0166s** | **5.2 MB** |

Experiments demonstrate the superior efficacy, efficiency, and robustness of our method over previous works; it is also able to process high-resolution images under limited hardware resources.
Here, ClearGrasp refers to [1], LIDF-Refine refers to [2], TranspareNet refers to [3], and DFNet refers to our proposed Depth Filler Net.

<img align="middle" src='../imgs/exper2.png' width=460px>
*: TranspareNet [3] is a concurrent work with our project.

**Note**. In experiment tables above, ClearGrasp (or [34]) refers to "ClearGrasp: 3D Shape Estimation of Transparent Objects for Manipulation" (ICRA 2020), and LIDF-Refine (or [41]) refers to "RGB-D Local Implicit Function for Depth Completion of Transparent Objects" (CVPR 2021).
**: Here we use the newly released checkpoint of DFNet, which differs slightly from the checkpoint used in the paper. The newly released checkpoint fixes the point-cloud shifting bug mentioned in [Issue #4](https://github.com/Galaxies99/TransCG/issues/4) and the black-hole problem mentioned in [Issue #7](https://github.com/Galaxies99/TransCG/issues/7).

For the original checkpoint used in the paper, please use [this version](https://github.com/Galaxies99/TransCG/tree/f80708ac4243e9f9d3f5a7b11afd863b21506f76) of the repository, and download the checkpoint from [Google Drive](https://drive.google.com/file/d/1APIuzIQmFucDP4RcmiNV-NEsQKqN9J57/view?usp=sharing) or [Baidu Netdisk](https://pan.baidu.com/s/14khejj63OjOKsyzxnuYo5Q) (Code: c01g). Many thanks to [@cxt98](https://github.com/cxt98) for fixing the bugs in [Issue #5](https://github.com/Galaxies99/TransCG/issues/5).
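
For reference, the error metrics in the table above can be computed as in the following NumPy sketch (a minimal illustration of the standard definitions; the exact masking and averaging conventions of the repository's evaluation code may differ):

```python
import numpy as np

def depth_metrics(pred, gt, mask):
    # Evaluate only on valid (masked) pixels; depths are in meters.
    p, g = pred[mask], gt[mask]
    rmse = np.sqrt(np.mean((p - g) ** 2))   # root mean squared error
    rel = np.mean(np.abs(p - g) / g)        # mean absolute relative error
    mae = np.mean(np.abs(p - g))            # mean absolute error
    # Delta t: percentage of pixels whose depth ratio is below threshold t.
    ratio = np.maximum(p / g, g / p)
    deltas = {t: 100.0 * np.mean(ratio < t) for t in (1.05, 1.10, 1.25)}
    return rmse, rel, mae, deltas
```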

## References

1. Sajjan, Shreeyak, et al. "ClearGrasp: 3D Shape Estimation of Transparent Objects for Manipulation." 2020 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2020.
2. Zhu, Luyang, et al. "RGB-D Local Implicit Function for Depth Completion of Transparent Objects." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2021.
3. Xu, Haoping, et al. "Seeing Glass: Joint Point-Cloud and Depth Completion for Transparent Objects." 5th Annual Conference on Robot Learning. 2021.
2 changes: 0 additions & 2 deletions assets/docs/grasping.md
@@ -6,8 +6,6 @@ Given an RGB image along with a depth image collected by an RGB-D camera, we fir

## Experiments

<img align="middle" src='../imgs/exper3.png' width=460px>

## Reference

1. Fang, Hao-Shu, et al. "GraspNet-1Billion: A Large-Scale Benchmark for General Object Grasping." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020.
Binary file removed assets/imgs/exper1.png
Binary file removed assets/imgs/exper2.png
Binary file removed assets/imgs/exper3.png
6 changes: 4 additions & 2 deletions configs/default.yaml
@@ -25,16 +25,18 @@
"use_augmentation": True
"rgb_augmentation_probability": 0.8
"depth_min": 0.3
"depth_max": 1.5
"depth_max": 1.0
"depth_norm": 1.0
"with_original": True
"test":
"type": "transcg"
"data_dir": "data"
"image_size": !!python/tuple [320, 240]
"use_augmentation": False
"depth_min": 0.3
"depth_max": 1.5
"depth_max": 1.0
"depth_norm": 1.0
"with_original": True

"dataloader":
"num_workers": 48
2 changes: 1 addition & 1 deletion configs/inference.yaml
@@ -11,5 +11,5 @@
"image_size": !!python/tuple [320, 240]
"cuda_id": 0
"depth_min": 0.3
"depth_max": 1.5
"depth_max": 1.0
"depth_norm": 1.0
14 changes: 10 additions & 4 deletions datasets/transcg.py
@@ -51,13 +51,15 @@ def __init__(self, data_dir, split = 'train', **kwargs):
self.sample_info.append([
os.path.join(self.data_dir, 'scene{}'.format(scene_id), '{}'.format(perspective_id)),
1, # (for D435)
scene_type
scene_type,
perspective_id
])
for perspective_id in self.scene_metadata[scene_id]['L515_valid_perspective_list']:
self.sample_info.append([
os.path.join(self.data_dir, 'scene{}'.format(scene_id), '{}'.format(perspective_id)),
2, # (for L515)
scene_type
scene_type,
perspective_id
])
# Integrity double-check
assert len(self.sample_info) == self.total_samples, "Error in total samples, expect {} samples, found {} samples.".format(self.total_samples, len(self.sample_info))
@@ -72,12 +74,16 @@ def __init__(self, data_dir, split = 'train', **kwargs):
self.with_original = kwargs.get('with_original', False)

def __getitem__(self, id):
img_path, camera_type, scene_type = self.sample_info[id]
img_path, camera_type, scene_type, perspective_id = self.sample_info[id]
rgb = np.array(Image.open(os.path.join(img_path, 'rgb{}.png'.format(camera_type))), dtype = np.float32)
depth = np.array(Image.open(os.path.join(img_path, 'depth{}.png'.format(camera_type))), dtype = np.float32)
depth_gt = np.array(Image.open(os.path.join(img_path, 'depth{}-gt.png'.format(camera_type))), dtype = np.float32)
depth_gt_mask = np.array(Image.open(os.path.join(img_path, 'depth{}-gt-mask.png'.format(camera_type))), dtype = np.uint8)
return process_data(rgb, depth, depth_gt, depth_gt_mask, self.cam_intrinsics[camera_type], scene_type = scene_type, camera_type = camera_type, split = self.split, image_size = self.image_size, depth_min = self.depth_min, depth_max = self.depth_max, depth_norm = self.depth_norm, use_aug = self.use_aug, rgb_aug_prob = self.rgb_aug_prob, with_original = self.with_original)
if camera_type == 1:
depth_coeff = (perspective_id // 20) / 12 + 1
else:
depth_coeff = 1
return process_data(rgb, depth, depth_gt, depth_gt_mask, self.cam_intrinsics[camera_type], scene_type = scene_type, camera_type = camera_type, split = self.split, image_size = self.image_size, depth_min = self.depth_min, depth_max = self.depth_max, depth_norm = self.depth_norm, use_aug = self.use_aug, rgb_aug_prob = self.rgb_aug_prob, depth_coeff = depth_coeff, inpainting = True, with_original = self.with_original)

def __len__(self):
return self.total_samples
29 changes: 26 additions & 3 deletions inference.py
@@ -15,6 +15,7 @@
from utils.logger import ColoredLogger
from utils.builder import ConfigBuilder
from time import perf_counter
from scipy.interpolate import NearestNDInterpolator


class Inferencer(object):
@@ -68,7 +69,7 @@ def __init__(self, cfg_path = os.path.join('configs', 'inference.yaml'), with_in
self.depth_min, self.depth_max = self.builder.get_inference_depth_min_max()
self.depth_norm = self.builder.get_inference_depth_norm()

def inference(self, rgb, depth, target_size = (1280, 720)):
def inference(self, rgb, depth, target_size = (1280, 720), depth_coefficient = 10.0, inpainting = True):
"""
Inference.

@@ -77,7 +78,11 @@ def inference(self, rgb, depth, target_size = (1280, 720)):

rgb, depth: the initial RGB-D image;

target_size: tuple of (int, int), optional, default: (1280, 720), the target depth image size.
target_size: tuple of (int, int), optional, default: (1280, 720), the target depth image size;

depth_coefficient: float, optional, default: 10.0, only pixels whose depths lie in [depth_mu - depth_coefficient * depth_std, depth_mu + depth_coefficient * depth_std] are regarded as valid;

inpainting: bool, default: True, whether to inpaint the invalid pixels.

Returns
-------
@@ -90,7 +95,20 @@
depth = np.where(depth < self.depth_min, 0, depth)
depth = np.where(depth > self.depth_max, 0, depth)
depth[np.isnan(depth)] = 0
depth_available = depth[depth > 0]
depth_mu = depth_available.mean() if depth_available.shape[0] != 0 else 0
depth_std = depth_available.std() if depth_available.shape[0] != 0 else 1
depth = np.where(depth < depth_mu - depth_coefficient * depth_std, 0, depth)
depth = np.where(depth > depth_mu + depth_coefficient * depth_std, 0, depth)
if inpainting:
mask = np.where(depth > 0)
if mask[0].shape[0] != 0:
interp = NearestNDInterpolator(np.transpose(mask), depth[mask])
depth = interp(*np.indices(depth.shape))
depth = depth / self.depth_norm
depth_min = depth.min() - 0.5 * depth.std() - 1e-6
depth_max = depth.max() + 0.5 * depth.std() + 1e-6
depth = (depth - depth_min) / (depth_max - depth_min)
rgb = (rgb / 255.0).transpose(2, 0, 1)
rgb = torch.FloatTensor(rgb).to(self.device).unsqueeze(0)
depth = torch.FloatTensor(depth).to(self.device).unsqueeze(0)
@@ -101,7 +119,12 @@
if self.with_info:
self.logger.info("Inference finished, time: {:.4f}s.".format(time_end - time_start))
depth_res = depth_res.squeeze(0).cpu().detach().numpy()
depth_ori = depth.squeeze(0).cpu().detach().numpy()
depth_res = depth_res * (depth_max - depth_min) + depth_min
depth_ori = depth_ori * (depth_max - depth_min) + depth_min
depth_res = depth_res * self.depth_norm
depth_ori = depth_ori * self.depth_norm
depth_res = cv2.resize(depth_res, target_size, interpolation = cv2.INTER_NEAREST)
return depth_res
depth_ori = cv2.resize(depth_ori, target_size, interpolation = cv2.INTER_NEAREST)
return depth_res, depth_ori
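
As a side note on the inpainting step added above: SciPy's nearest-neighbour interpolator fills each invalid pixel with the value of its nearest valid pixel, as in this standalone toy example:

```python
import numpy as np
from scipy.interpolate import NearestNDInterpolator

depth = np.array([[0.5, 0.0],
                  [0.0, 0.7]], dtype=np.float32)  # zeros mark invalid pixels
mask = np.where(depth > 0)
interp = NearestNDInterpolator(np.transpose(mask), depth[mask])
filled = interp(*np.indices(depth.shape))  # every pixel now holds a valid depth
```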

3 changes: 1 addition & 2 deletions models/DFNet.py
@@ -149,8 +149,7 @@ def __init__(self, in_channels = 4, hidden_channels = 64, L = 5, k = 12, use_DUC
nn.Conv2d(self.hidden_channels, self.hidden_channels, kernel_size = 3, stride = 1, padding = 1),
nn.BatchNorm2d(self.hidden_channels),
nn.ReLU(True),
nn.Conv2d(self.hidden_channels, 1, kernel_size = 3, stride = 1, padding = 1),
nn.ReLU(True)
nn.Conv2d(self.hidden_channels, 1, kernel_size = 1, stride = 1)
)

def _make_upconv(self, in_channels, out_channels, upscale_factor = 2):
Expand Down
12 changes: 6 additions & 6 deletions sample_inference.py
@@ -45,19 +45,19 @@ def draw_point_cloud(color, depth, camera_intrinsics, use_mask = False, use_inpa

inferencer = Inferencer()

rgb = np.array(Image.open('data/scene1/1/rgb1.png'), dtype = np.float32)
depth = np.array(Image.open('data/scene1/1/depth1.png'), dtype = np.float32)
depth_gt = np.array(Image.open('data/scene1/1/depth1-gt.png'), dtype = np.float32)
rgb = np.array(Image.open('data/scene21/1/rgb1.png'), dtype = np.float32)
depth = np.array(Image.open('data/scene21/1/depth1.png'), dtype = np.float32)
depth_gt = np.array(Image.open('data/scene21/1/depth1-gt.png'), dtype = np.float32)

depth = depth / 1000
depth_gt = depth_gt / 1000

res = inferencer.inference(rgb, depth)
res, depth = inferencer.inference(rgb, depth, depth_coefficient = 3, inpainting = True)

cam_intrinsics = np.load('data/camera_intrinsics/1-camIntrinsics-D435.npy')

res = np.clip(res, 0.1, 1.5)
depth = np.clip(depth, 0.1, 1.5)
res = np.clip(res, 0.3, 1.0)
depth = np.clip(depth, 0.3, 1.0)

cloud = draw_point_cloud(rgb, res, cam_intrinsics, scale = 1.0)
cloud_gt = draw_point_cloud(rgb, depth_gt, cam_intrinsics, scale = 1.0)
2 changes: 2 additions & 0 deletions test.py
@@ -73,6 +73,8 @@ def test():
time_start = perf_counter()
res = model(data_dict['rgb'], data_dict['depth'])
time_end = perf_counter()
depth_scale = data_dict['depth_max'] - data_dict['depth_min']
res = res * depth_scale.reshape(-1, 1, 1) + data_dict['depth_min'].reshape(-1, 1, 1)
data_dict['pred'] = res
_ = metrics.evaluate_batch(data_dict, record = True)
duration = time_end - time_start
6 changes: 5 additions & 1 deletion train.py
@@ -90,7 +90,9 @@ def train_one_epoch(epoch):
for data_dict in pbar:
optimizer.zero_grad()
data_dict = to_device(data_dict, device)
res = model(data_dict['rgb'], data_dict['depth'])
res = model(data_dict['rgb'], data_dict['depth'])
depth_scale = data_dict['depth_max'] - data_dict['depth_min']
res = res * depth_scale.reshape(-1, 1, 1) + data_dict['depth_min'].reshape(-1, 1, 1)
data_dict['pred'] = res
loss_dict = criterion(data_dict)
loss = loss_dict['loss']
@@ -118,6 +120,8 @@ def test_one_epoch(epoch):
time_start = perf_counter()
res = model(data_dict['rgb'], data_dict['depth'])
time_end = perf_counter()
depth_scale = data_dict['depth_max'] - data_dict['depth_min']
res = res * depth_scale.reshape(-1, 1, 1) + data_dict['depth_min'].reshape(-1, 1, 1)
data_dict['pred'] = res
loss_dict = criterion(data_dict)
loss = loss_dict['loss']
2 changes: 1 addition & 1 deletion utils/builder.py
@@ -479,7 +479,7 @@ def get_inference_depth_norm(self, inference_params = None):
Parameters
----------

inference_params: dict, optional, default: None. If inference_params is provided, then use the parameters specified in the inference_params to get the inference depth range. Otherwise, the inference parameters in the self.params will be used to get the inference depth range.
inference_params: dict, optional, default: None. If inference_params is provided, then use the parameters specified in the inference_params to get the inference depth normalization coefficient. Otherwise, the inference parameters in the self.params will be used to get the inference depth normalization coefficient.

Returns
-------