Commit 0bd490c, committed by adiprasad on Sep 9, 2018 (0 parents). 23 changed files with 2,495 additions and 0 deletions.
.gitmodules (new file, 3 additions)

[submodule "py-faster-rcnn-ft"]
	path = py-faster-rcnn-ft
	url = https://github.com/adiprasad/py-faster-rcnn-ft.git
README.md (new file, 96 additions)
# unsup-hard-negative-mining-mscoco

This is the repository for the experiments on the MSCOCO classes discussed in Section 5 (Discussion) of the paper [Unsupervised Hard Example Mining from Videos for Improved Object Detection](https://arxiv.org/abs/1808.04285).

We used the original version of [py-faster-rcnn-ft](https://github.com/DFKI-Interactive-Machine-Learning/py-faster-rcnn-ft) to fine-tune a VGG16 network pretrained on ImageNet into a binary classifier for a single MSCOCO category. With this classifier as the backbone of the Faster R-CNN, we labelled every frame of each video for the presence of that category. From the labelled frames, our algorithm identified the frames containing hard negatives. Finally, we fine-tuned the network again with the hard-negative frames included and evaluated it for improvements on held-out validation and test sets.
For our research, we carried out experiments on two MSCOCO categories: Dog and Train.

## Steps
### 1. Prepare a Faster R-CNN object detector for an MSCOCO category

Follow the steps in the [py-faster-rcnn-ft](https://github.com/DFKI-Interactive-Machine-Learning/py-faster-rcnn-ft) repository to prepare a VGG16 Faster R-CNN network trained on an MSCOCO category of your choice.
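py-faster-rcnn-ft handles the category-restricted fine-tuning itself; consult its README for the exact configuration. Purely as an illustration (not part of that repository), the COCO IDs for a single class can be looked up with pycocotools, assuming the standard `instances_train2014.json` annotation file is available:

```python
# Look up the MSCOCO category ID and the training images containing it.
# The annotation path and the chosen class ('dog') are only examples.
from pycocotools.coco import COCO

coco = COCO('annotations/instances_train2014.json')
cat_ids = coco.getCatIds(catNms=['dog'])   # e.g. [18] for 'dog'
img_ids = coco.getImgIds(catIds=cat_ids)   # all training images with a dog

print('category ids:', cat_ids)
print('training images containing the category:', len(img_ids))
```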
### 2. Label the videos with detections

Scrape the web and download videos that are likely to contain many instances of your chosen category. Helper code to download YouTube videos can be found [here](utils/scrape-youtube/scrape_videos.py). Once the videos have been downloaded, run the detections code to label each frame of every video with bounding boxes and confidence scores for that category. See [Usage](detections_code/README.txt).
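The repository's `scrape_videos.py` is the reference helper; as a rough, hedged sketch of the download step, the `youtube_dl` package can place each video into the per-video folder layout expected by the detections code (URLs and options below are placeholders):

```python
# Minimal sketch: download videos into downloaded_videos/videoN/videoN.<ext>.
# URLs are hypothetical; the chosen format/container may need adjusting.
import youtube_dl

urls = ['https://www.youtube.com/watch?v=XXXXXXXXXXX']  # placeholder list

for i, url in enumerate(urls, start=1):
    ydl_opts = {
        'outtmpl': 'downloaded_videos/video{0}/video{0}.%(ext)s'.format(i),
        'format': 'best',
    }
    with youtube_dl.YoutubeDL(ydl_opts) as ydl:
        ydl.download([url])
```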
The lists of videos we used are given below:

1. [Dog videos](https://docs.google.com/spreadsheets/d/1q9EeOHVYXugtmR1batdDDsb5wzWnwiQc-egLDmdWk78/#gid=1264294087)
2. [Train videos](https://docs.google.com/spreadsheets/d/1q9EeOHVYXugtmR1batdDDsb5wzWnwiQc-egLDmdWk78/#gid=994319682)
### 3. Hard negative mining

The detections code outputs a txt file containing frame-wise labels and bounding-box information. Run the hard negative mining code on this detections txt file to obtain the frames containing hard negatives, along with a txt file containing the bounding-box information for those frames. See [Usage](hn_mining_code/README.txt).
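The hn_mining_code implements the actual mining; the snippet below is only a simplified illustration of the underlying "flicker" intuition (a detection that appears in a frame but is not supported by detections in neighbouring frames is likely a false positive, and therefore a hard negative). The data structure and window size are assumptions, not the repository's format.

```python
# Simplified sketch of flicker-based hard-negative selection.
# dets_by_frame: dict mapping frame index -> list of (box, score) detections
# above the confidence threshold. NOT the repository's algorithm.
def find_flicker_frames(dets_by_frame, window=1):
    flicker_frames = []
    for frame, dets in sorted(dets_by_frame.items()):
        if not dets:
            continue
        neighbours = [frame + d for d in range(-window, window + 1) if d != 0]
        supported = any(dets_by_frame.get(n) for n in neighbours)
        if not supported:
            # The detection exists only in this isolated frame -> likely a flicker
            flicker_frames.append(frame)
    return flicker_frames

# Example: frame 12 fires but frames 11 and 13 do not -> flicker (hard negative)
example = {10: [], 11: [], 12: [((40, 40, 120, 160), 0.83)], 13: [], 14: []}
print(find_flicker_frames(example))  # -> [12]
```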
### 4. Include the video frames containing hard negatives in the COCO dataset and fine-tune

Use the COCO annotations editor located inside utils to include the frames containing hard negatives in the MSCOCO dataset. Once the frames have been included in the COCO dataset, fine-tune again to obtain an improved network. See [Usage](utils/edit-coco-annotations/README.txt).
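Conceptually, adding a hard-negative frame amounts to appending a new image entry, with no object annotations for the target category, to the COCO annotation JSON. The snippet below is a hedged sketch of that idea only, not the repository's annotations editor; file names and image sizes are placeholders.

```python
# Sketch: append hard-negative frames as background images to a COCO annotation file.
# Hard negatives carry no annotations for the category, so only "images" grows.
import json

def add_hard_negative_frames(coco_json_in, coco_json_out, frame_paths,
                             width=640, height=480):
    with open(coco_json_in) as f:
        coco = json.load(f)

    next_id = max(img['id'] for img in coco['images']) + 1
    for path in frame_paths:
        coco['images'].append({
            'id': next_id,
            'file_name': path,
            'width': width,
            'height': height,
        })
        next_id += 1

    with open(coco_json_out, 'w') as f:
        json.dump(coco, f)

# Example call with placeholder paths
add_hard_negative_frames('annotations/instances_train2014.json',
                         'annotations/instances_train2014_hn.json',
                         ['video1/frame330.jpg', 'video1/frame1548.jpg'])
```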
## Results

A summary of the results is given below:
<table>
  <thead>
    <tr>
      <th><b>Category</b></th>
      <th><b>Model</b></th>
      <th><b>Training Iterations</b></th>
      <th><b>Training Hyperparams</b></th>
      <th><b>Validation set AP</b></th>
      <th><b>Test set AP</b></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td rowspan=2>Dog</td>
      <td>Baseline</td>
      <td>29000</td>
      <td>LR: 1e-3 for 0-10k,<br>1e-4 for 10k-20k,<br>1e-5 for 20k-29k</td>
      <td>26.9</td>
      <td>25.3</td>
    </tr>
    <tr>
      <td>Flickers as HN</td>
      <td>22000</td>
      <td>LR: 1e-4 for 0-15k,<br>1e-5 for 15k-22k</td>
      <td>28.1</td>
      <td>26.4</td>
    </tr>
    <tr>
      <td rowspan=2>Train</td>
      <td>Baseline</td>
      <td>26000</td>
      <td>LR: 1e-3,<br>stepsize: 10k,<br>lr decay: 0.1</td>
      <td>33.9</td>
      <td>33.2</td>
    </tr>
    <tr>
      <td>Flickers as HN</td>
      <td>24000</td>
      <td>LR: 1e-3,<br>stepsize: 10k,<br>lr decay: 0.1</td>
      <td>35.4</td>
      <td>33.7</td>
    </tr>
  </tbody>
</table>
A few examples of the reduction in false positives achieved for the 'Dog' category are shown below:
Baseline | Flickers as HN
:-------------------:|:--------------------:
![](https://people.cs.umass.edu/~aprasad/Detector_Results/dog_detector/images_iter_1/video1/hns_shown/frame330_before.jpg) | ![](https://people.cs.umass.edu/~aprasad/Detector_Results/dog_detector/images_iter_1/video1/hns_shown/frame330_after.jpg)
![](https://people.cs.umass.edu/~aprasad/Detector_Results/dog_detector/images_iter_1/video1/hns_shown/frame1548_before.jpg) | ![](https://people.cs.umass.edu/~aprasad/Detector_Results/dog_detector/images_iter_1/video1/hns_shown/frame1548_after.jpg)
![](https://people.cs.umass.edu/~aprasad/Detector_Results/dog_detector/images_iter_1/video1/hns_shown/frame3156_before.jpg) | ![](https://people.cs.umass.edu/~aprasad/Detector_Results/dog_detector/images_iter_1/video1/hns_shown/frame3156_after.jpg)
![](https://people.cs.umass.edu/~aprasad/Detector_Results/dog_detector/images_iter_1/video1/hns_shown/frame9195_before.jpg) | ![](https://people.cs.umass.edu/~aprasad/Detector_Results/dog_detector/images_iter_1/video1/hns_shown/frame9195_after.jpg)
![](https://people.cs.umass.edu/~aprasad/Detector_Results/dog_detector/images_iter_1/video1/hns_shown/frame43837_before.jpg) | ![](https://people.cs.umass.edu/~aprasad/Detector_Results/dog_detector/images_iter_1/video1/hns_shown/frame43837_after.jpg)
detections_code/README.txt (new file, 32 additions)
To use the detections code, download all your videos inside a parent folder, say 'downloaded_videos', with the following structure:

downloaded_videos
|
|--video1
|   |--video1.mkv
|
|--video2
|   |--video2.mkv
|
..
..
Helper code (makeVideoFolderStructure/change_filenames_in_folder.py) is available to convert the videos present inside a folder to the above folder structure.
Steps:

1. Specify, on line 28, the path of the model weights to be used for running the network on the videos.
2. Decide an output_folder where the detection txt files will be placed.
3. Decide a confidence_threshold above which detections will be reported.

4. Usage:

    python2 tools/test_dets_dog_detector.py \
        --downloaded_videos $1 \
        --video ID \
        --out_folder output_folder \
        --conf_thresh confidence_threshold

where $1 is the path to the downloaded_videos parent folder and ID is 1, 2, 3, etc., matching how the videos have been numbered inside the downloaded_videos folder hierarchy. A concrete example invocation is given below.
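For example, a hypothetical invocation for the first video (paths are placeholders; 0.6 mirrors the MIN_SCORE used in the batch scripts) could look like:

    python2 tools/test_dets_dog_detector.py \
        --downloaded_videos /path/to/downloaded_videos \
        --video 1 \
        --out_folder /path/to/detection_outputs \
        --conf_thresh 0.6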
SLURM batch script for running the dog detector over the downloaded videos (new file, 26 additions)
#!/bin/bash
#SBATCH --job-name=detectron
#SBATCH -N1                              # Ensure that all cores are on one machine
#SBATCH --partition=m40-long             # Partition to submit to (serial_requeue)
#SBATCH --mem=50000                      # Memory pool for all cores (see also --mem-per-cpu)
#SBATCH --output=logs/result_%A_%a.out   # File to which STDOUT will be written
#SBATCH --error=logs/result_%A_%a.err    # File to which STDERR will be written
#SBATCH --gres=gpu:1
#SBATCH --array=1-15                     # One array task per videoN folder
## Usage: sbatch scripts/run_video_cluster. ${VIDEO_PATH} ${DATASET_NAME}
echo `pwd`
echo $1
echo "SLURM task ID: "$SLURM_ARRAY_TASK_ID
##### Experiment settings #####
VIDEO_PATH=$1/video${SLURM_ARRAY_TASK_ID}   # $1 is the parent folder containing the videoN sub-folders
OUTPUT_NAME=/mnt/nfs/scratch1/aprasad/dog_detection_outputs/${2}
MIN_SCORE=0.6
echo "Chunk path: "${VIDEO_PATH}

python2 tools/test_dets_dog_detector.py \
    --video_folder $1 \
    --video ${SLURM_ARRAY_TASK_ID} \
    --out_folder ${OUTPUT_NAME} \
    --conf_thresh ${MIN_SCORE}   # confidence threshold above which detections are kept
SLURM batch script for running the train detector over the downloaded videos (new file, 26 additions)
#!/bin/bash
#SBATCH --job-name=detectron
#SBATCH -N1                              # Ensure that all cores are on one machine
#SBATCH --partition=titanx-long          # Partition to submit to (serial_requeue)
#SBATCH --mem=50000                      # Memory pool for all cores (see also --mem-per-cpu)
#SBATCH --output=logs/result_%A_%a.out   # File to which STDOUT will be written
#SBATCH --error=logs/result_%A_%a.err    # File to which STDERR will be written
#SBATCH --gres=gpu:1
#SBATCH --array=1-26                     # One array task per videoN folder
## Usage: sbatch scripts/run_video_cluster. ${VIDEO_PATH} ${DATASET_NAME}
echo `pwd`
echo $1
echo "SLURM task ID: "$SLURM_ARRAY_TASK_ID
##### Experiment settings #####
VIDEO_PATH=$1/video${SLURM_ARRAY_TASK_ID}   # $1 is the parent folder containing the videoN sub-folders
OUTPUT_NAME=/mnt/nfs/scratch1/aprasad/TRAIN_ATTEMPT_3/detections/set4/${2}
MIN_SCORE=0.6
echo "Chunk path: "${VIDEO_PATH}

python2 tools/test_dets_train_detector.py \
    --video_folder $1 \
    --video ${SLURM_ARRAY_TASK_ID} \
    --out_folder ${OUTPUT_NAME} \
    --conf_thresh ${MIN_SCORE}   # confidence threshold above which detections are kept
detections_code/makeVideoFolderStructure/change_filenames_in_folder.py (new file, 35 additions)
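# Rename every .mkv file in the current directory into the videoN/videoN.mkv
# layout expected by the detections code (see README.txt), and record the
# videoN -> original-path mapping in a text file and a pickle.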
import os
from os.path import join
import pickle

parent_dir = os.getcwd()  # Set the parent folder of all the chunks here

mapping_txt_file = open(join(parent_dir, "video_name_key_map.txt"), "a+")

video_name_cntr = 1
filename_mapping_dict = {}

files_in_chunk = os.listdir(parent_dir)
mkv_files = filter(lambda x: x.endswith(".mkv"), files_in_chunk)

for file_name in mkv_files:
    file_path = join(parent_dir, file_name)

    os.mkdir(join(parent_dir, "video{0}".format(video_name_cntr)))
    new_video_path = join(parent_dir, "video{0}".format(video_name_cntr))

    new_file_path = join(new_video_path, "video{0}.mkv".format(video_name_cntr))

    os.rename(file_path, new_file_path)

    filename_mapping_dict[file_path] = video_name_cntr
    mapping_txt_file.write("{0}\t{1}\n".format(video_name_cntr, file_path))

    video_name_cntr += 1

mapping_txt_file.close()
pickle.dump(filename_mapping_dict, open(join(parent_dir, "filename_mapping_dict.p"), "wb"))