Evaluation code for nuScenes-lidarseg challenge #480

Merged
47 commits, merged Oct 26, 2020

Commits
cd3650a
Class to map classes from nuScenes-lidarseg to challenge
Oct 19, 2020
c74bd14
Add method to convert a label
Oct 19, 2020
05cd0f6
Add methods to ensure labels are converted correctly
Oct 19, 2020
09d5023
Class for lidarseg evaluation
Oct 19, 2020
1ba0131
Add docstrings for evaluate.py
Oct 19, 2020
71eab3a
Add argparse to evaluate.py
Oct 19, 2020
e46feaa
Improve verbose behavior in evaluate.py
Oct 19, 2020
c72b5d3
Tidy up some docstrings in evaluate.py
Oct 19, 2020
0a690ee
Tidy up some docstrings for utils.py
Oct 19, 2020
669dd4c
More docstrings for utils.py
Oct 19, 2020
d1eae79
Add init file for unit tests
Oct 20, 2020
e86a727
Add leaderboard header to TOC for detection and lidarseg challenges
Oct 20, 2020
d3ff642
Clarify definition of external data
Oct 20, 2020
561b847
Tidy up language for challenge tracks
Oct 20, 2020
ebabc55
Change from void_ignore to ignore
Oct 20, 2020
3ba7525
Add method to get sample tokens for evaluation set among those in split.
Oct 20, 2020
a06ad41
Reformat dict output
Oct 20, 2020
ee532a2
Deal with IOU case when there are no points belonging to a particular…
Oct 20, 2020
52eb198
Make confusion matrix an object instead
Oct 21, 2020
dd33008
Tidy up some docstrings in utils.py
Oct 21, 2020
9544142
Add method to get frequency-weighted IOU
Oct 21, 2020
667defd
Add in readme that predictions should be saved as uint8
Oct 21, 2020
fcf4a5d
Add docstring for ConfusionMatrix class
Oct 21, 2020
c100825
Add FWIOU to readme
Oct 21, 2020
067ecdb
Tidy up calculation for confusion matrix
Oct 21, 2020
a9d700f
Add banner link to readme
Oct 21, 2020
45e9dda
Shift miou method into ConfusionMatrix class
Oct 21, 2020
2422db5
Remove need to specify ignore idx in both eval and adaptor class; do …
Oct 21, 2020
67bb2cb
Shift get_samples_in_eval_set method to utils.py
Oct 21, 2020
58f5230
Disable progress bar if verbose=False
Oct 21, 2020
143b2bb
Add script to let user validate results folder
Oct 21, 2020
9e3c8c7
Tidy up docstring in validate_submission.py
Oct 21, 2020
0f21cad
Improve error msg when number of preds do not match number of points …
Oct 21, 2020
41ce9af
Add verbose param for validate_submission.py, remove redundant import…
Oct 21, 2020
54a12fa
Check len of preds against pcl rather than labels
Oct 22, 2020
07e0dc6
Amend docstring for get_per_class_iou method
Oct 22, 2020
e115e17
Print class even if metric is NaN
Oct 22, 2020
0a345ff
Specify in readme that preds should not contain the ignored class
Oct 22, 2020
e7f2a66
Zero out row and col for ignored class, assert >0 for preds, better e…
Oct 22, 2020
e19ac34
Address comments for readme
Oct 22, 2020
02717b6
Address comments for evaluate.py
Oct 22, 2020
616ec84
Address comments for validate_submission.py
Oct 22, 2020
5ecc55e
Address comments for utils.py
Oct 22, 2020
9f87bb4
Address comments for readme
Oct 24, 2020
8747c4d
Check if preds are between 1 and num_classes-1 in validate_submission
Oct 24, 2020
9c9fd01
Change name to fine2coarse
Oct 24, 2020
2f69668
Address typos in readme
Oct 26, 2020
1 change: 1 addition & 0 deletions python-sdk/nuscenes/eval/detection/README.md
@@ -9,6 +9,7 @@
- [Results format](#results-format)
- [Classes and attributes](#classes-attributes-and-detection-ranges)
- [Evaluation metrics](#evaluation-metrics)
- [Leaderboard](#leaderboard)

## Introduction
Here we define the 3D object detection task on nuScenes.
45 changes: 31 additions & 14 deletions python-sdk/nuscenes/eval/lidarseg/README.md
@@ -1,5 +1,5 @@
# nuScenes lidar segmentation task
![nuScenes lidar segementation logo](https://www.nuscenes.org/public/images/tasks.png)
![nuScenes lidar segmentation logo](https://www.nuscenes.org/public/images/lidarseg_challenge.jpg)

## Overview
- [Introduction](#introduction)
@@ -9,6 +9,7 @@
- [Results format](#results-format)
- [Classes](#classes)
- [Evaluation metrics](#evaluation-metrics)
- [Leaderboard](#leaderboard)

## Introduction
Here we define the lidar segmentation task on nuScenes.
@@ -27,7 +28,7 @@ Additionally we organize a number of challenges at leading Computer Vision confe
Users that submit their results during the challenge period are eligible for awards.
Any user that cannot attend the workshop (direct or via a representative) will be excluded from the challenge, but will still be listed on the leaderboard.

Click [here](http://evalai.cloudcv.org/web/challenges/challenge-page/) for the **EvalAI lidar segementation evaluation server** (coming soon).
Click [here](http://evalai.cloudcv.org/web/challenges/challenge-page/) for the **EvalAI lidar segmentation evaluation server** (coming soon).

### 5th AI Driving Olympics, NeurIPS 2020
The first nuScenes lidar segmentation challenge will be held at NeurIPS 2020.
@@ -61,15 +62,15 @@ The folder structure of the results should be as follows:
```
└── results_folder
    ├── lidarseg
    │   └── v1.0-test          <- Contains the .bin files; a .bin file
    │                             contains the labels of the points in a
    │                             point cloud
    └── v1.0-test
    │   └── {test, train, val} <- Contains the .bin files; a .bin file
    │                             contains the labels of the points in a
    │                             point cloud
    └── {test, train, val}
        └── submission.json    <- contains certain information about
                                  the submission
```

The contents of the `submision.json` file and `v1.0-test` folder are defined below:
The contents of the `submission.json` file and `test` folder are defined below:
* The `submission.json` file includes meta data `meta` on the type of inputs used for this method.
```
"meta": {
"use_camera": <bool> -- Whether this submission uses camera data as an input.
"use_lidar": <bool> -- Whether this submission uses lidar data as an input.
"use_radar": <bool> -- Whether this submission uses radar data as an input.
"use_map": <bool> -- Whether this submission uses map data as an input.
"use_external": <bool> -- Whether this submission uses external data as an input.
},
```
* The `v1.0-test` folder contains .bin files, where each .bin file contains the labels of the points for the point cloud.
* The `test` folder contains .bin files, where each .bin file contains the labels of the points for the point cloud.
Pay special attention that each set of predictions in the folder must be a .bin file named **<lidar_sample_data_token>_lidarseg.bin**.
A .bin file contains an array of `int` in which each value is the predicted [class index](#classes) of the corresponding point in the point cloud, e.g.:
A .bin file contains an array of `uint8` values in which each value is the predicted [class index](#classes) of the corresponding point in the point cloud, e.g.:
```
[1, 5, 4, 1, ...]
```
Each `lidar_sample_data_token` from the current evaluation set must be included in the `v1.0-test` folder.
Below is an example of how to save the predictions for a single point cloud:
```
bin_file_path = lidar_sample_data_token + '_lidarseg.bin'
np.array(predicted_labels).astype(np.uint8).tofile(bin_file_path)
```
Note that the arrays should **not** contain the `ignore` class (i.e. class index 0).
Each `lidar_sample_data_token` from the current evaluation set must be included in the `test` folder.

For the train and val sets, the evaluation can be performed by the user on their local machine.
For the test set, the user needs to zip the results folder and submit it to the official evaluation server.

For convenience, a `validate_submission.py` script has been provided to check that a given results folder is of the correct format.
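
For illustration, below is a minimal sketch of how a complete results folder could be assembled before validation. The `write_results_folder` helper and its `predictions` argument (a dict mapping each `lidar_sample_data_token` to an array of per-point class indices) are hypothetical names used only for this example, and the `meta` flags should be set to match the inputs actually used by your method:
```
import json
import os

import numpy as np


def write_results_folder(predictions: dict, results_folder: str, eval_set: str = 'test') -> None:
    """
    Write predictions in the submission format described above.
    :param predictions: Maps each lidar_sample_data_token to an array of per-point class indices (1 to 16).
    :param results_folder: Output folder to be zipped and submitted.
    :param eval_set: The split being evaluated, e.g. test, train or val.
    """
    # One .bin file of uint8 labels per point cloud.
    bin_dir = os.path.join(results_folder, 'lidarseg', eval_set)
    os.makedirs(bin_dir, exist_ok=True)
    for lidar_sample_data_token, labels in predictions.items():
        labels = np.asarray(labels).astype(np.uint8)
        assert labels.min() >= 1, 'Error: Predictions must not contain the ignored class (index 0).'
        labels.tofile(os.path.join(bin_dir, lidar_sample_data_token + '_lidarseg.bin'))

    # submission.json with the meta data; set the flags to match the inputs actually used.
    meta_dir = os.path.join(results_folder, eval_set)
    os.makedirs(meta_dir, exist_ok=True)
    submission = {'meta': {'use_camera': False, 'use_lidar': True, 'use_radar': False,
                           'use_map': False, 'use_external': False}}
    with open(os.path.join(meta_dir, 'submission.json'), 'w') as f:
        json.dump(submission, f, indent=2)
```
The resulting folder can then be checked with `validate_submission.py`, zipped and uploaded to the evaluation server.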

Note that the lidar segmentation classes differ from the general nuScenes classes, as detailed below.

## Classes
Expand Down Expand Up @@ -139,16 +148,22 @@ For more information on the classes and their frequencies, see [this page](https

## Evaluation metrics
Below we define the metrics for the nuScenes lidar segmentation task.
Our final score is a weighted sum of mean intersection-over-union (mIOU).
The challenge winners and leaderboard ranking will be determined by the mean intersection-over-union (mIOU) score.

### Preprocessing
Contrary to the [nuScenes detection task](https://github.com/nutonomy/nuscenes-devkit/blob/master/python-sdk/nuscenes/eval/detection/README.md),
we do not perform any preprocessing, such as removing GT / predictions if they exceed the class-specific detection range
or if they full inside a bike-rack.
or if they fall inside a bike-rack.

### Mean IOU (mIOU)
We use the well-known IOU metric, which is defined as TP / (TP + FP + FN).
The IOU score is calculated separately for each class, and then the mean is computed across classes.
Note that lidar segmentation index 0 is ignored in the calculation.

### Frequency-weighted IOU (FWIOU)
Instead of taking the mean of the IOUs across all the classes, each IOU is weighted by the point-level frequency of its class.
Note that lidar segmentation index 0 is ignored in the calculation.
FWIOU is not used for the challenge.
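
As a concrete illustration of the two metrics, the following sketch computes the per-class IOU, mIOU and FWIOU from a point-count confusion matrix with NumPy. It mirrors the definitions above (including dropping the ignored index 0), but it is only an illustrative reimplementation under those assumptions, not the devkit's `ConfusionMatrix` class:
```
import numpy as np


def ious_from_confusion_matrix(cm: np.ndarray, ignore_idx: int = 0):
    """
    :param cm: Confusion matrix of shape (num_classes, num_classes), where cm[gt, pred] is a point count.
    :param ignore_idx: Class index excluded from the metrics (0 for nuScenes-lidarseg).
    :return: (IOU per class, mIOU, FWIOU).
    """
    cm = cm.astype(np.float64)
    cm[ignore_idx, :] = 0  # Drop points whose ground truth is the ignored class.
    cm[:, ignore_idx] = 0  # Drop predictions of the ignored class.

    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp
    fn = cm.sum(axis=1) - tp

    with np.errstate(divide='ignore', invalid='ignore'):
        iou_per_class = tp / (tp + fp + fn)  # NaN for classes with no points at all.
        freq = cm.sum(axis=1) / cm.sum()     # Point-level frequency of each ground truth class.

    valid = ~np.isnan(iou_per_class)
    valid[ignore_idx] = False

    miou = float(np.mean(iou_per_class[valid]))
    fwiou = float(np.sum(freq[valid] * iou_per_class[valid]))

    return iou_per_class, miou, fwiou
```
With the confusion matrix accumulated over all samples in the evaluation set, `miou` is the leaderboard metric, while `fwiou` is reported for reference only.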

## Leaderboard
nuScenes will maintain a single leaderboard for the lidar segmentation task.
@@ -161,12 +176,14 @@ Furthermore, there will also be an award for novel ideas, as well as the best st

**Lidar track**:
* Only lidar input allowed.
* Only lidar segmentation annotations from nuScenes-lidarseg are allowed.
* External data or map data <u>not allowed</u>.
* May use pre-training.

**Open track**:
* Any sensor input allowed.
* External data and map data allowed.
* All nuScenes, nuScenes-lidarseg and nuImages annotations are allowed.
* External data and map data allowed.
* May use pre-training.

**Details**:
158 changes: 158 additions & 0 deletions python-sdk/nuscenes/eval/lidarseg/evaluate.py
@@ -0,0 +1,158 @@
import argparse
import json
import os
from typing import Dict

import numpy as np
from tqdm import tqdm

from nuscenes import NuScenes
from nuscenes.eval.lidarseg.utils import LidarsegClassMapper, ConfusionMatrix, get_samples_in_eval_set


class LidarSegEval:
"""
This is the official nuScenes-lidarseg evaluation code.
Results are written to the provided output_dir.

nuScenes-lidarseg uses the following metrics:
- Mean Intersection-over-Union (mIOU): We use the well-known IOU metric, which is defined as TP / (TP + FP + FN).
The IOU score is calculated separately for each class, and then the mean is
computed across classes. Note that in the challenge, index 0 is ignored in
the calculation.
- Frequency-weighted IOU (FWIOU): Instead of taking the mean of the IOUs across all the classes, each IOU is
weighted by the point-level frequency of its class. Note that in the challenge,
index 0 is ignored in the calculation. FWIOU is not used for the challenge.

We assume that:
- For each pointcloud, the prediction for every point is present in a .bin file, in the same order as that of the
points stored in the corresponding .bin file.
- The naming convention of the .bin files containing the predictions for a single point cloud is:
<lidar_sample_data_token>_lidarseg.bin
- The predictions are between 1 and 16 (inclusive); 0 is the index of the ignored class.

Please see https://www.nuscenes.org/lidar-segmentation for more details.
"""
def __init__(self,
nusc: NuScenes,
results_folder: str,
eval_set: str,
verbose: bool = False):
"""
Initialize a LidarSegEval object.
:param nusc: A NuScenes object.
:param results_folder: Path to the folder which contains the results.
:param eval_set: The dataset split to evaluate on, e.g. train, val or test.
:param verbose: Whether to print messages during the evaluation.
"""
# Check there are ground truth annotations.
assert len(nusc.lidarseg) > 0, 'Error: No ground truth annotations found in {}.'.format(nusc.version)

# Check results folder exists.
self.results_folder = results_folder
self.results_bin_folder = os.path.join(results_folder, 'lidarseg', eval_set)
assert os.path.exists(self.results_bin_folder), \
'Error: The folder containing the .bin files ({}) does not exist.'.format(self.results_bin_folder)

self.nusc = nusc
self.results_folder = results_folder
self.eval_set = eval_set
self.verbose = verbose

self.mapper = LidarsegClassMapper(self.nusc)
self.ignore_idx = self.mapper.ignore_class['index']
self.id2name = {idx: name for name, idx in self.mapper.coarse_name_2_coarse_idx_mapping.items()}
self.num_classes = len(self.mapper.coarse_name_2_coarse_idx_mapping)

if self.verbose:
print('There are {} classes.'.format(self.num_classes))

self.global_cm = ConfusionMatrix(self.num_classes, self.ignore_idx)

self.sample_tokens = get_samples_in_eval_set(self.nusc, self.eval_set)
if self.verbose:
print('There are {} samples.'.format(len(self.sample_tokens)))

def evaluate(self) -> Dict:
"""
Performs the actual evaluation.
:return: A dictionary containing the evaluated metrics.
"""
for sample_token in tqdm(self.sample_tokens, disable=not self.verbose):
sample = self.nusc.get('sample', sample_token)

# Get the sample data token of the point cloud.
sd_token = sample['data']['LIDAR_TOP']

# Load the ground truth labels for the point cloud.
lidarseg_label_filename = os.path.join(self.nusc.dataroot,
self.nusc.get('lidarseg', sd_token)['filename'])
lidarseg_label = self.load_bin_file(lidarseg_label_filename)

lidarseg_label = self.mapper.convert_label(lidarseg_label)

# Load the predictions for the point cloud.
lidarseg_pred_filename = os.path.join(self.results_folder, 'lidarseg',
self.eval_set, sd_token + '_lidarseg.bin')
lidarseg_pred = self.load_bin_file(lidarseg_pred_filename)

# Get the confusion matrix between the ground truth and predictions.
# Update the confusion matrix for the sample data into the confusion matrix for the eval set.
self.global_cm.update(lidarseg_label, lidarseg_pred)

iou_per_class = self.global_cm.get_per_class_iou()
miou = self.global_cm.get_mean_iou()
freqweighted_iou = self.global_cm.get_freqweighted_iou()

# Put everything nicely into a dict.
results = {'iou_per_class': {self.id2name[i]: class_iou for i, class_iou in enumerate(iou_per_class)},
'miou': miou,
'freq_weighted_iou': freqweighted_iou}

# Print the results if desired.
if self.verbose:
print("======\nnuScenes-lidarseg evaluation for {}".format(self.eval_set))
print(json.dumps(results, indent=4, sort_keys=False))
print("======")

return results

@staticmethod
def load_bin_file(bin_path: str) -> np.ndarray:
"""
Loads a .bin file containing the labels.
:param bin_path: Path to the .bin file.
:return: An array containing the labels.
"""
assert os.path.exists(bin_path), 'Error: Unable to find {}.'.format(bin_path)
bin_content = np.fromfile(bin_path, dtype=np.uint8)
assert len(bin_content) > 0, 'Error: {} is empty.'.format(bin_path)

return bin_content


if __name__ == '__main__':
# Settings.
parser = argparse.ArgumentParser(description='Evaluate nuScenes-lidarseg results.')
parser.add_argument('--result_path', type=str,
help='The path to the results folder.')
parser.add_argument('--eval_set', type=str, default='val',
help='Which dataset split to evaluate on, train, val or test.')
parser.add_argument('--dataroot', type=str, default='/data/sets/nuscenes',
help='Default nuScenes data directory.')
parser.add_argument('--version', type=str, default='v1.0-trainval',
help='Which version of the nuScenes dataset to evaluate on, e.g. v1.0-trainval.')
parser.add_argument('--verbose', type=bool, default=False,
help='Whether to print to stdout.')
args = parser.parse_args()

result_path_ = args.result_path
eval_set_ = args.eval_set
dataroot_ = args.dataroot
version_ = args.version
verbose_ = args.verbose

nusc_ = NuScenes(version=version_, dataroot=dataroot_, verbose=verbose_)

evaluator = LidarSegEval(nusc_, result_path_, eval_set=eval_set_, verbose=verbose_)
evaluator.evaluate()