Content creation beyond visual outputs: We present an image-to-image method that synthesizes both the visual appearance and the tactile geometry of different materials from a handcrafted or DALL⋅E 2 sketch. We then render the outputs on a surface haptic device such as TanvasTouch®, where users can slide their fingers across the screen to feel the rendered textures.
Controllable Visual-Tactile Synthesis
Ruihan Gao, Wenzhen Yuan, Jun-Yan Zhu
Carnegie Mellon University
ICCV, 2023
We show an example of our visual-tactile synthesis. The tactile output is shown as a 3D height map. The patches below correspond to the bounding boxes marked on the sketch input. Please see our paper for more results.
We can also render the synthesized results as a colored 3D mesh. The meshes are exaggerated in the z direction to show fine textures.
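The mesh export step is not described here, so the following is only a minimal sketch of one way to turn a synthesized height map and its color image into a colored mesh with an exaggerated z axis. The function name height_map_to_colored_obj and the z_scale default are hypothetical; vertex colors use the common "v x y z r g b" OBJ extension that viewers such as MeshLab read.

```python
from typing import List, Sequence, Tuple

RGB = Tuple[float, float, float]


def height_map_to_colored_obj(
    height: Sequence[Sequence[float]],
    colors: Sequence[Sequence[RGB]],
    z_scale: float = 10.0,  # hypothetical exaggeration factor
) -> str:
    """Triangulate an H x W height map into a vertex-colored OBJ string.

    height[i][j] is the surface height at pixel (i, j) and colors[i][j] is an
    (r, g, b) triple in [0, 1]. z_scale exaggerates the heights so that fine
    textures stay visible when the mesh is viewed.
    """
    rows, cols = len(height), len(height[0])
    lines: List[str] = []
    # One vertex per pixel: x = column, y = -row (image rows grow downward).
    for i in range(rows):
        for j in range(cols):
            r, g, b = colors[i][j]
            z = z_scale * height[i][j]
            lines.append(f"v {j} {-i} {z:.6f} {r:.4f} {g:.4f} {b:.4f}")
    # Two counter-clockwise triangles per grid cell; OBJ indices are 1-based.
    for i in range(rows - 1):
        for j in range(cols - 1):
            top_left = i * cols + j + 1
            top_right = top_left + 1
            bottom_left = top_left + cols
            bottom_right = bottom_left + 1
            lines.append(f"f {top_left} {bottom_left} {top_right}")
            lines.append(f"f {top_right} {bottom_left} {bottom_right}")
    return "\n".join(lines) + "\n"
```

Writing the returned string to a file such as mesh.obj gives a mesh that can be opened in a standard 3D viewer; a larger z_scale makes fine textures easier to see at the cost of geometric fidelity.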
Please see our website and paper for more interactive and comprehensive results.
We plan to release our code and dataset in the following steps:
- Inference and Evaluation code [05/04].
- Preprocessed data of all 20 garments in our TouchClothing dataset [05/04].
- Pretrained model (ours & baselines) on the TouchClothing dataset [05/04].
- Training code [05/04].
- Data preprocessing code for camera and GelSight R1.5 data.
- Rendering code to generate friction maps for TanvasTouch.
- Instructions on how to create new test data.
We tested our code with Python 3.8 and PyTorch 1.11.0. (We recommend installing PyTorch separately to avoid package conflicts.)
git clone https://github.com/RuihanGao/visual-tactile-synthesis.git
cd visual-tactile-synthesis
conda create -n VTS python=3.8
conda activate VTS
pip install torch==1.11.0+cu113 torchvision==0.12.0+cu113 torchaudio==0.11.0 --extra-index-url https://download.pytorch.org/whl/cu113
pip install -r requirements.txt
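After installation, a quick sanity check can confirm that the environment matches the tested versions. This is a minimal sketch using only the standard library; the function names base_version and check_environment are hypothetical and not part of the repository.

```python
import sys
from importlib import metadata


def base_version(version: str) -> str:
    """Drop a local build suffix, e.g. '1.11.0+cu113' -> '1.11.0'."""
    return version.split("+", 1)[0]


def check_environment(expected_python=(3, 8), expected_torch="1.11.0") -> list:
    """Return a list of human-readable mismatches (empty if everything matches)."""
    problems = []
    major, minor = sys.version_info[:2]
    if (major, minor) != tuple(expected_python):
        problems.append(f"Python {major}.{minor} (expected {expected_python[0]}.{expected_python[1]})")
    try:
        torch_version = metadata.version("torch")  # reads installed package metadata
    except metadata.PackageNotFoundError:
        problems.append("torch is not installed")
    else:
        if base_version(torch_version) != expected_torch:
            problems.append(f"torch {torch_version} (expected {expected_torch})")
    return problems


if __name__ == "__main__":
    issues = check_environment()
    print("OK" if not issues else "\n".join(issues))
```

To also confirm that the CUDA 11.3 build can see a GPU, run python -c "import torch; print(torch.cuda.is_available())" inside the VTS environment.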
We provide the preprocessed data for our TouchClothing dataset, which contains 20 garments of various shapes and textures. Here are the 20 objects in the TouchClothing dataset:
Example of preprocessed data:
Use the following commands to download and unzip the dataset.
(0) Install gdown and unzip as follows if you haven't done so.
pip install gdown
sudo apt install unzip
(1) Download the preprocessed data from Google Drive via the following command:
Total size: 580M.
bash scripts/download_TouchClothing_dataset.sh
(2) Put the unzipped datasets folder in the code repo; the training and testing commands below read from ./datasets/.
Note:
- In case of an "access denied" error, try pip install -U --no-cache-dir gdown --pre and run the gdown command again (ref here).
- Use the -q flag with unzip to suppress the log, as it can be quite long.
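Once the data is in place, it can help to list which materials are available. The sketch below assumes the unzipped datasets folder contains one singleskit_<material>_padded_1800_x1 subfolder per garment, as the --dataroot in the training command suggests; parse_material and list_materials are hypothetical helpers, not repository code.

```python
import os
import re
from typing import List, Optional

# Folder naming copied from the training command:
#   ./datasets/singleskit_<material>_padded_1800_x1/
_FOLDER_PATTERN = re.compile(r"^singleskit_(?P<material>.+)_padded_1800_x1$")


def parse_material(folder_name: str) -> Optional[str]:
    """Return the material name encoded in a dataset folder name, or None."""
    match = _FOLDER_PATTERN.match(folder_name)
    return match.group("material") if match else None


def list_materials(dataset_root: str = "./datasets") -> List[str]:
    """Return the sorted material names of all dataset folders under dataset_root."""
    materials = []
    for entry in os.listdir(dataset_root):
        material = parse_material(entry)
        # Only count directories; stray files with a matching name are skipped.
        if material is not None and os.path.isdir(os.path.join(dataset_root, entry)):
            materials.append(material)
    return sorted(materials)
```

For example, list_materials() run from the repo root should return names like BlackJeans that can be passed as the material variable in the training and testing commands.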
We provide the pretrained models for our method and several baselines included in our paper. For each method, we provide 20 models, one for each object in our TouchClothing dataset.
See the Google Drive folder here. To use them:
(1) Download the checkpoints:
- checkpoints for our method (124M):
gdown "https://drive.google.com/uc?export=download&id=11y2jP2vT7CtBIaEDcjROZ5hupHsYWG8D"
- checkpoints for baselines (21.5G):
gdown "https://drive.google.com/uc?export=download&id=16NNU1GuOWWtarzEJkLSYbeSqQVaX-943"
(2) After unzipping the files, put all pretrained models in the checkpoints folder so that the testing code can load them properly.
(3) See the testing section for more examples of how to evaluate the pretrained models.
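Before testing, a small check can report which pretrained models are missing. This sketch assumes the code base keeps the Contrastive Unpaired Translation (CUT) convention of one subfolder per experiment name under ./checkpoints, with names of the form <material>_sinskitG_baseline_ours as in the commands below; missing_checkpoints is a hypothetical helper.

```python
import os
from typing import Iterable, List


def missing_checkpoints(
    materials: Iterable[str],
    checkpoints_dir: str = "./checkpoints",
    suffix: str = "_sinskitG_baseline_ours",  # experiment-name suffix used for our method
) -> List[str]:
    """Return experiment names whose checkpoint folder is absent or empty."""
    missing = []
    for material in materials:
        name = f"{material}{suffix}"
        folder = os.path.join(checkpoints_dir, name)
        if not os.path.isdir(folder) or not os.listdir(folder):
            missing.append(name)
    return missing
```

For the baselines, pass the corresponding experiment-name suffix instead of the default.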
In general, our pipeline contains two steps. We first feed the sketch input to our model to synthesize synchronized visual and tactile output. Then we convert the tactile output to a friction map required by TanvasTouch and render the multi-modal output on the surface haptic device, where you can see and feel the object simultaneously.
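The rendering code that generates friction maps for TanvasTouch has not been released yet, so the following is only a hypothetical illustration of the second step, not our method: it maps a height map to an 8-bit grayscale image using the normalized gradient magnitude, assuming that brighter pixels mean stronger friction on the device.

```python
import math
from typing import List, Sequence


def height_to_friction_map(height: Sequence[Sequence[float]]) -> List[List[int]]:
    """Map a height map to an 8-bit grayscale image (hypothetical mapping).

    Uses forward differences (zero gradient at the last row/column) and scales
    the gradient magnitude linearly to [0, 255], so steeper regions get higher values.
    """
    rows, cols = len(height), len(height[0])
    magnitude = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            gx = height[i][j + 1] - height[i][j] if j + 1 < cols else 0.0
            gy = height[i + 1][j] - height[i][j] if i + 1 < rows else 0.0
            magnitude[i][j] = math.hypot(gx, gy)
    peak = max(max(row) for row in magnitude)
    if peak == 0:
        # A perfectly flat surface yields a uniform, zero-friction map.
        return [[0] * cols for _ in range(rows)]
    return [[round(255 * m / peak) for m in row] for row in magnitude]
```

Gradient magnitude is used here only because it highlights edges and bumps that a finger would feel; the released rendering code may use a different mapping.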
material=BlackJeans
CUDA_VISIBLE_DEVICES=0 python train.py --gpu_ids 0 --name "${material}_sinskitG_baseline_ours" --model sinskitG --dataroot ./datasets/"singleskit_${material}_padded_1800_x1/"
where you can set the variable material to any object from our TouchClothing dataset or your own custom dataset.
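To script runs for several materials without the tmux launcher, the commands can be assembled programmatically. This sketch mirrors the training command above and the testing command given later (which adds --epoch best --eval); build_command is a hypothetical helper, and it only builds the arguments without running anything.

```python
import os
import shlex
from typing import Dict, List, Tuple


def build_command(material: str, gpu: int = 0, phase: str = "train") -> Tuple[List[str], Dict[str, str]]:
    """Return (argv, env) for train.py or test.py for one material."""
    if phase not in ("train", "test"):
        raise ValueError(f"phase must be 'train' or 'test', got {phase!r}")
    argv = [
        "python", f"{phase}.py",
        # CUDA_VISIBLE_DEVICES exposes a single GPU, which is then index 0.
        "--gpu_ids", "0",
        "--name", f"{material}_sinskitG_baseline_ours",
        "--model", "sinskitG",
        "--dataroot", f"./datasets/singleskit_{material}_padded_1800_x1/",
    ]
    if phase == "test":
        argv += ["--epoch", "best", "--eval"]
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu))
    return argv, env


if __name__ == "__main__":
    argv, env = build_command("BlackJeans")
    print(shlex.join(argv))  # shlex.join requires Python 3.8+
```

To actually launch a run, pass the result to subprocess.run(argv, env=env, check=True) from the repo root.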
To use our launcher scripts to run multiple experiments in tmux windows, use the following command:
(See here for more examples and explanations of the tmux launcher.)
material_idx=0
python -m experiments SingleG_AllMaterials_baseline_ours launch $material_idx
where material_idx sets which object in the dataset to use. Choose a single material_idx, or use 'all' to run multiple experiments at once.
The list of materials can be found in the launcher file experiments/SingleG_AllMaterials_baseline_ours_launcher.py.
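The behavior of material_idx described above can be sketched as follows. This only illustrates the documented semantics (an index selects one material, 'all' selects every material); select_materials is hypothetical, and the actual launcher logic and material order live in the launcher file.

```python
from typing import List, Sequence, Union


def select_materials(materials: Sequence[str], material_idx: Union[int, str]) -> List[str]:
    """Resolve a material_idx argument ('all' or an integer) to material names."""
    if material_idx == "all":
        return list(materials)
    idx = int(material_idx)  # accepts both 0 and "0" (e.g. from the command line)
    if not 0 <= idx < len(materials):
        raise IndexError(f"material_idx {idx} out of range for {len(materials)} materials")
    return [materials[idx]]
```

Each returned material would then correspond to one experiment, for example one tmux window.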
Note: Loading the dataset into the cache before training may take 20 to 30 minutes, and training takes about 16 hours on a single A5000 GPU. Please be patient.
For a quick proof-of-concept training run, set data_len in SingleG_AllMaterials_baseline_ours_launcher and verbose_freq in models/sinskitG_model.py to a smaller number (e.g., 3 or 10).
material=BlackJeans
CUDA_VISIBLE_DEVICES=0 python test.py --gpu_ids 0 --name "${material}_sinskitG_baseline_ours" --model sinskitG --dataroot ./datasets/"singleskit_${material}_padded_1800_x1/" --epoch best --eval
Or, if you are using the tmux_launcher, use the following command:
material_idx=0
python -m experiments SingleG_AllMaterials_baseline_ours test $material_idx
The results will be stored in the results directory.
To compile the quantitative metrics of the tested method into a table, run bash scripts/compile_eval_metrics_sinskitG.sh. For each method, it retrieves the eval_metrics.pkl file of every material and takes the average. Modify the materials list in util/compile_eval_metrics_sinskitG.py and the bash script accordingly.
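The averaging step can be sketched as below, assuming each eval_metrics.pkl holds a flat dictionary mapping metric names to numbers. The helper names average_metrics and format_table are hypothetical, and the actual file layout and table format are defined by util/compile_eval_metrics_sinskitG.py.

```python
import pickle
from collections import defaultdict
from typing import Dict, Iterable


def average_metrics(pkl_paths: Iterable[str]) -> Dict[str, float]:
    """Average each metric over all materials (one pickle file per material)."""
    sums: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for path in pkl_paths:
        with open(path, "rb") as f:
            metrics = pickle.load(f)  # assumed: {metric_name: value}
        for name, value in metrics.items():
            sums[name] += float(value)
            counts[name] += 1  # per-metric count tolerates files missing a metric
    return {name: sums[name] / counts[name] for name in sums}


def format_table(averages: Dict[str, float]) -> str:
    """Render 'name  value' rows, sorted by metric name."""
    if not averages:
        return ""
    width = max(len(name) for name in averages)
    return "\n".join(f"{name:<{width}}  {value:.4f}" for name, value in sorted(averages.items()))
```

Averaging per metric, rather than per file, keeps the result well defined even if some materials were evaluated with a different set of metrics.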
@inproceedings{gao2023controllable,
title={Controllable Visual-Tactile Synthesis},
author={Gao, Ruihan and Yuan, Wenzhen and Zhu, Jun-Yan},
booktitle={IEEE International Conference on Computer Vision (ICCV)},
year={2023},
}
We thank Sheng-Yu Wang, Kangle Deng, Muyang Li, Aniruddha Mahapatra, and Daohan Lu for proofreading the draft. We are also grateful to Sheng-Yu Wang, Nupur Kumari, Gaurav Parmar, George Cazenavette, and Arpit Agrawal for their helpful comments and discussion. Additionally, we thank Yichen Li, Xiaofeng Guo, and Fujun Ruan for their help with the hardware setup. Ruihan Gao is supported by A*STAR National Science Scholarship (Ph.D.).
Our code base is built upon Contrastive Unpaired Translation (CUT).