A sim2real panoptic segmentation model for driving scene understanding.
To achieve this objective, a panoptic segmentation dataset composed of synthetic and real driving scene images has been developed.
The former has been obtained through simulation with the CARLA simulator. Specifically, it has been generated with the open-source tool that I developed, which is available at the following GitHub repository.
It allows the creation of a multi-camera setup with semantic and instance segmentation ground truth that can be used for a wide variety of tasks, including simpler ones such as object detection. For more details, visit that repository.
The real-world component has been integrated using the well-known Cityscapes and BDD100K datasets.
To merge the three into a common sim2real dataset, a shared subset of labels relevant to driving scene understanding has been defined and used to re-map the ground truth of each dataset to a common format.
The sim2real setting proposed in this work trains a panoptic segmentation model that relies exclusively on synthetic data to learn its main task, while real-world data is exploited only through a self-supervised auxiliary task.
Nevertheless, this multi-task setting allows the model to transfer its panoptic segmentation capability across domains, improving upon models trained solely on synthetic data and tested on real driving scenes.
As such, only synthetic annotations have been used to train the model, while real data has been employed solely for a self-supervised domain classification task and to evaluate the model's performance on the main task.
Finally, the model architecture is optimized to find the configuration that maximizes the panoptic segmentation performance transferred from the synthetic to the real domain. Further details are provided in the publication: https://webthesis.biblio.polito.it/22581/1/tesi.pdf
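For intuition only, here is a minimal sketch (not the thesis implementation) of how the two objectives described above could be combined: the supervised panoptic losses are computed on synthetic batches only, while a synthetic-vs-real domain classification loss provides the self-supervised signal on both domains. All module names and the weighting factor are assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DomainClassificationHead(nn.Module):
    """Predicts whether pooled backbone features come from the synthetic
    (CARLA) domain or the real (Cityscapes / BDD100K) domain."""

    def __init__(self, in_channels: int):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(in_channels, 2)  # 0 = synthetic, 1 = real

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        return self.fc(self.pool(features).flatten(1))


def total_loss(panoptic_losses: dict,
               domain_logits: torch.Tensor,
               domain_labels: torch.Tensor,
               domain_weight: float = 0.1) -> torch.Tensor:
    # Panoptic terms (semantic, box, mask losses, ...) come from synthetic
    # ground truth only; the domain term needs no manual labels on real images.
    supervised = sum(panoptic_losses.values())
    domain = F.cross_entropy(domain_logits, domain_labels)
    return supervised + domain_weight * domain
```

Because the domain head only needs to know which dataset an image comes from, no manual annotation of the real images is required.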
All three datasets have to be stored in a common DATA_ROOT folder, which can be configured by modifying the script set_env.sh with the appropriate path. Then, to set the environment variable for the current shell, simply run:
source set_env.sh
The DATA_ROOT folder must contain three main folders named as follows (see the sketch after the list):
- cityscapes
- bdd
- coco_carla
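For reference, here is a minimal sketch of how these paths can be resolved from the DATA_ROOT environment variable set by set_env.sh; only standard-library calls are used and no project-specific code is assumed:

```python
import os
from pathlib import Path

# Resolve the three dataset folders relative to the DATA_ROOT exported by `source set_env.sh`.
DATA_ROOT = Path(os.environ["DATA_ROOT"])
CITYSCAPES_DIR = DATA_ROOT / "cityscapes"
BDD_DIR = DATA_ROOT / "bdd"
CARLA_DIR = DATA_ROOT / "coco_carla"
```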
Download the following Cityscapes subsets from the official [Cityscapes website](https://www.cityscapes-dataset.com/downloads/):
- gtFine_trainvaltest.zip
- leftImg8bit_trainvaltest.zip

Each must be put within the cityscapes folder as described above and unzipped there.
Download the following subsets from the official [BDD100K website](https://bdd-data.berkeley.edu/portal.html#download):
- panoptic segmentation annotations
- bdd10k images (note that it is the 10k subset, not the 100k one)
Each must be put within the bdd folder as described above and unzipped there.
Download the synthetic CARLA data from the Google Cloud bucket with the command below (a quick check of the downloaded contents is sketched right after it). Note that DATA_ROOT is an environment variable initialized by executing set_env.sh, which you have to modify according to your preferred data storage path:
gsutil -m cp -r \
"gs://sim-carla/categories_stuff_panfpn.json" \
"gs://sim-carla/train_annotations" \
"gs://sim-carla/train_images" \
"gs://sim-carla/train_panoptic" \
"gs://sim-carla/train_sem_stuff" \
"gs://sim-carla/val_annotations" \
"gs://sim-carla/val_images" \
"gs://sim-carla/val_panoptic" \
"gs://sim-carla/val_sem_stuff" \
${DATA_ROOT}/coco_carla
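Once the copy has finished, the downloaded contents can be verified against the bucket listing above. This is an optional check, not part of the original setup instructions:

```python
import os
from pathlib import Path

# Entries expected under ${DATA_ROOT}/coco_carla, taken from the bucket paths listed above.
carla_dir = Path(os.environ["DATA_ROOT"]) / "coco_carla"
expected = [
    "categories_stuff_panfpn.json",
    "train_annotations", "train_images", "train_panoptic", "train_sem_stuff",
    "val_annotations", "val_images", "val_panoptic", "val_sem_stuff",
]
missing = [name for name in expected if not (carla_dir / name).exists()]
print("Missing entries:", missing or "none")
```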
Alternatively, you can generate your own version with the open-source tool that I created to produce this synthetic dataset, available at the GitHub repository mentioned above.
To merge the datasets into a common format, the [COCO format](https://cocodataset.org/#format-results) serves as the base protocol to gather and store data uniformly across the subsets. Hence, each dataset has been mapped to the common format following the same data pipeline.
The common steps are as follows:
- (dataset-specific labels -> common label set) map each dataset-specific category label id to the corresponding id in the common label set (see the sketch after this list)
- (common label set -> COCO-detection) map instance and semantic segmentation masks to the COCO-detection format (for instance masks)
- (COCO-detection -> COCO-panoptic) instance masks and semantic segmentation masks are mapped to panoptic segmentation masks according to the COCO-panoptic format
- (COCO-panoptic -> Detectron2 PanopticFPN) COCO-panoptic masks are mapped to the panoptic segmentation format required by the [PanopticFPN model](https://detectron2.readthedocs.io/en/latest/tutorials/datasets.html#standard-dataset-dicts) (read the note about the PanopticFPN architecture on the linked page)
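As a concrete illustration of the first step, here is a hedged sketch of re-mapping a per-pixel label-id mask to a common label set. The ID values are purely illustrative and are not the mapping actually used in this project:

```python
import numpy as np

# Example only: a few Cityscapes "labelIds" mapped to hypothetical common-set ids.
CITYSCAPES_TO_COMMON = {
    7: 0,    # road
    8: 1,    # sidewalk
    11: 2,   # building
    24: 10,  # person
    26: 11,  # car
}


def remap_semantic_mask(mask: np.ndarray, id_map: dict, ignore_id: int = 255) -> np.ndarray:
    """Remap a per-pixel label-id mask to the common label set; unmapped pixels get the ignore id."""
    remapped = np.full_like(mask, ignore_id)
    for source_id, common_id in id_map.items():
        remapped[mask == source_id] = common_id
    return remapped
```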
Both training and hyper-parameter optimization can be carried out from the provided Jupyter notebook. It offers a simple interface to track experiments that have already been executed, as well as to define new ones by changing the parameter lists in the grid-search dictionary.
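Training targets the Detectron2 PanopticFPN model mentioned above, which consumes datasets registered in the "separated" format. The sketch below shows how one split of the CARLA data could be registered with the standard register_coco_panoptic_separated helper; the annotation file names are assumptions (check the downloaded train_annotations folder for the actual ones), and the notebook may handle registration differently:

```python
import os
from pathlib import Path

from detectron2.data.datasets import register_coco_panoptic_separated

carla = Path(os.environ["DATA_ROOT"]) / "coco_carla"

register_coco_panoptic_separated(
    name="coco_carla_train",
    metadata={},
    image_root=str(carla / "train_images"),
    panoptic_root=str(carla / "train_panoptic"),
    panoptic_json=str(carla / "train_annotations" / "panoptic_train.json"),    # assumed file name
    sem_seg_root=str(carla / "train_sem_stuff"),
    instances_json=str(carla / "train_annotations" / "instances_train.json"),  # assumed file name
)
# The helper registers the dataset under name + "_separated"
# (here "coco_carla_train_separated"), which is the name to use in cfg.DATASETS.TRAIN.
```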
After training or hyper-parameter optimization has been carried out, all metrics can be visualized by running the following command from the top-level folder of this project:
tensorboard --logdir .