Fast alternative to FiftyOne for creating a subset of the COCO dataset.

tikitong/minicoco

minicoco

This script is a quick alternative to FiftyOne for creating a subset of the COCO 2017 dataset. It generates a training set and a validation set, with a single images folder holding the images for both sets and a labels folder holding their annotations in COCO (JSON) format. It is mainly inspired by the pycocoDemo notebook and, for the download method, by this Stack Overflow solution.

Its execution creates the following directory tree:

data/
    images/ *.jpg
    labels/ train.json
            val.json
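Once the script has run, the generated label files can be sanity-checked with a few lines of Python. The sketch below is not part of minicoco; it assumes only that train.json and val.json follow the standard COCO layout with top-level "images", "annotations" and "categories" keys. The helper names summarize_coco and summarize_file are hypothetical.

```python
import json
from collections import Counter


def summarize_coco(labels: dict) -> dict:
    """Return basic counts for a COCO-format annotation dictionary."""
    # Map category ids to names so the per-category counts are readable.
    cat_names = {c["id"]: c["name"] for c in labels.get("categories", [])}
    per_cat = Counter(
        cat_names.get(a["category_id"], "unknown")
        for a in labels.get("annotations", [])
    )
    return {
        "images": len(labels.get("images", [])),
        "annotations": len(labels.get("annotations", [])),
        "per_category": dict(per_cat),
    }


def summarize_file(path: str) -> dict:
    """Load a COCO JSON file from disk and summarize it."""
    with open(path) as f:
        return summarize_coco(json.load(f))


# Example call (assumes the directory tree shown above exists):
# print(summarize_file("data/labels/train.json"))
```

Comparing the "images" count against the -t and -v values passed to the script is a quick way to confirm that the subset has the expected size.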

Installation

The use of conda is recommended. The following steps are required in order to run the script:

conda create -n minicoco python=3.9
conda activate minicoco
git clone https://github.com/tikitong/minicoco.git 
cd minicoco
wget http://images.cocodataset.org/annotations/annotations_trainval2017.zip
unzip ./annotations_trainval2017.zip
pip install -r requirements.txt

Usage

usage: script.py [-h] [-t TRAINING] [-v VALIDATION] [-cat NARGS [NARGS ...]] annotation_file

positional arguments:
  annotation_file       annotations/instances_train2017.json path file.

optional arguments:
  -h, --help            show this help message and exit
  -t TRAINING, --training TRAINING
                        number of images in the training set.
  -v VALIDATION, --validation VALIDATION
                        number of images in the validation set.
  -cat NARGS [NARGS ...], --nargs NARGS [NARGS ...]
                        category names.
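The help text above looks like standard argparse output. The following is a reconstruction of a parser that would accept the same arguments, not the script's actual source; the int types and the absence of default values are assumptions. It also illustrates that category names containing a space, such as traffic light, have to reach the parser as a single argument (that is, quoted in the shell).

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the usage text of script.py."""
    parser = argparse.ArgumentParser(prog="script.py")
    parser.add_argument("annotation_file", type=str,
                        help="annotations/instances_train2017.json path file.")
    # type=int is an assumption; the help text only says "number of images".
    parser.add_argument("-t", "--training", type=int,
                        help="number of images in the training set.")
    parser.add_argument("-v", "--validation", type=int,
                        help="number of images in the validation set.")
    # nargs="+" collects one or more category names into a list.
    parser.add_argument("-cat", "--nargs", nargs="+",
                        help="category names.")
    return parser


# Equivalent to the shell command:
#   python script.py annotations/instances_train2017.json -t 30 -v 10 -cat car "traffic light"
args = build_parser().parse_args(
    ["annotations/instances_train2017.json", "-t", "30", "-v", "10",
     "-cat", "car", "traffic light"]
)
print(args.training, args.validation, args.nargs)
# prints: 30 10 ['car', 'traffic light']
```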

The 80 categories that can be used with the -cat argument are the following:

person, bicycle, car, motorcycle, airplane, bus, train, truck, boat, traffic light, fire hydrant, stop sign, parking meter, bench, bird, cat, dog, horse, sheep, cow, elephant, bear, zebra, giraffe, backpack, umbrella, handbag, tie, suitcase, frisbee, skis, snowboard, sports ball, kite, baseball bat, baseball glove, skateboard, surfboard, tennis racket, bottle, wine glass, cup, fork, knife, spoon, bowl, banana, apple, sandwich, orange, broccoli, carrot, hot dog, pizza, donut, cake, chair, couch, potted plant, bed, dining table, toilet, tv, laptop, mouse, remote, keyboard, cell phone, microwave, oven, toaster, sink, refrigerator, book, clock, vase, scissors, teddy bear, hair drier, toothbrush
The category names can be printed with pycocotools:

# From https://github.com/cocodataset/cocoapi/blob/master/PythonAPI/pycocoDemo.ipynb
from pycocotools.coco import COCO
coco = COCO("annotations/instances_train2017.json")
cats = coco.loadCats(coco.getCatIds())
nms = [cat['name'] for cat in cats]
print('COCO categories: \n{}\n'.format(' '.join(nms)))

For example, you can run:

python script.py annotations/instances_train2017.json -t 30 -v 10 -cat car airplane person
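To make the idea of a category-based subset concrete, here is a minimal sketch that filters a COCO-format dictionary in pure Python. It is an illustration only, not the algorithm used by script.py. It assumes that an image qualifies if it contains at least one of the requested categories and that images are sampled at random; the README does not state either point. The function name subset_coco and its seed parameter are hypothetical.

```python
import random


def subset_coco(coco: dict, category_names, n_images: int, seed: int = 0) -> dict:
    """Keep up to n_images images that contain any of the named categories."""
    wanted = {c["id"] for c in coco["categories"] if c["name"] in category_names}
    # Images with at least one annotation of a wanted category (assumed rule).
    candidates = sorted(
        {a["image_id"] for a in coco["annotations"] if a["category_id"] in wanted}
    )
    rng = random.Random(seed)  # fixed seed so the selection is reproducible
    chosen = set(rng.sample(candidates, min(n_images, len(candidates))))
    return {
        "images": [im for im in coco["images"] if im["id"] in chosen],
        "annotations": [
            a for a in coco["annotations"]
            if a["image_id"] in chosen and a["category_id"] in wanted
        ],
        "categories": [c for c in coco["categories"] if c["id"] in wanted],
    }


# Toy data in COCO layout, only to show the call.
toy = {
    "images": [{"id": 1}, {"id": 2}, {"id": 3}],
    "annotations": [
        {"image_id": 1, "category_id": 3},
        {"image_id": 2, "category_id": 18},
        {"image_id": 3, "category_id": 1},
        {"image_id": 3, "category_id": 18},
    ],
    "categories": [
        {"id": 1, "name": "person"},
        {"id": 3, "name": "car"},
        {"id": 18, "name": "dog"},
    ],
}
small = subset_coco(toy, ["car", "person"], n_images=5)
print(sorted(im["id"] for im in small["images"]))
```

With real data, the input dictionary would come from json.load on annotations/instances_train2017.json, and the result could be written out with json.dump.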