Open Image Downloader Toolkit

This repo is an improved wrapper around the standard Open-Image-Toolkit, created solely to make the following changes:

Added **Resumable** features to the standard toolkit. This is useful if the user has connectivity issues or power outages.
Employed version switching in the code base.
The repo also contains a txt2xml.py file that converts the labels to XML format, which is useful when working with Darknet - Yolo. Simply run the file after changing the input and output folder paths.
Updated the README to provide comprehensive usage instructions.
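
To illustrate the kind of label conversion described above, here is a minimal sketch. It is not the code in txt2xml.py. It assumes each line of a label file has the form `<class name> <xmin> <ymin> <xmax> <ymax>` and that a Pascal VOC style XML layout is wanted; the function name `label_txt_to_voc_xml` is hypothetical.

```python
import xml.etree.ElementTree as ET


def label_txt_to_voc_xml(txt_content, image_name, width, height):
    """Build a Pascal VOC style XML string from the text of one label file.

    Assumption: every non-empty line is "<class name> <xmin> <ymin> <xmax> <ymax>".
    """
    root = ET.Element("annotation")
    ET.SubElement(root, "filename").text = image_name
    size = ET.SubElement(root, "size")
    ET.SubElement(size, "width").text = str(width)
    ET.SubElement(size, "height").text = str(height)

    for line in txt_content.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split from the right so class names containing spaces stay intact.
        name, xmin, ymin, xmax, ymax = line.rsplit(maxsplit=4)
        obj = ET.SubElement(root, "object")
        ET.SubElement(obj, "name").text = name
        box = ET.SubElement(obj, "bndbox")
        for tag, value in zip(("xmin", "ymin", "xmax", "ymax"),
                              (xmin, ymin, xmax, ymax)):
            # Coordinates are truncated to whole pixels.
            ET.SubElement(box, tag).text = str(int(float(value)))

    return ET.tostring(root, encoding="unicode")
```

The image width and height are passed in by the caller because the assumed label format does not carry them.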

For more information about the Open Images Dataset, see the Info.md file.

Downloading required files

Automatic Download Option

The required modules are present in requirements.txt and can be installed using

    pip install -r requirements.txt

For downloading the bounding box csv files

Follow the instructions provided at the official link of the Open-Images Dataset. You need to download the 4 csv files as explained there.

Alternatively, you can run the following commands:

    # get the training data
    wget https://requestor-proxy.figure-eight.com/figure_eight_datasets/open-images/train-images-boxable.csv
    wget https://requestor-proxy.figure-eight.com/figure_eight_datasets/open-images/train-annotations-bbox.csv

    # get the test data
    wget https://requestor-proxy.figure-eight.com/figure_eight_datasets/open-images/test-annotations-bbox.csv
    wget https://requestor-proxy.figure-eight.com/figure_eight_datasets/open-images/test-images.csv
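
If wget is not available, the same downloads can be sketched in Python. This is an illustrative sketch, not the code used by the toolkit: the helper names `missing_csv_files` and `download_missing` are hypothetical, and whether the URLs above are still reachable is not verified here.

```python
import os
import urllib.request

# Base URL and file names are taken from the wget commands above.
BASE_URL = "https://requestor-proxy.figure-eight.com/figure_eight_datasets/open-images/"
CSV_FILES = [
    "train-images-boxable.csv",
    "train-annotations-bbox.csv",
    "test-annotations-bbox.csv",
    "test-images.csv",
]


def missing_csv_files(csv_dir, names=CSV_FILES):
    """Return the file names from `names` that are not yet present in `csv_dir`."""
    return [n for n in names if not os.path.isfile(os.path.join(csv_dir, n))]


def download_missing(csv_dir):
    """Download only the csv files that are absent from `csv_dir`."""
    os.makedirs(csv_dir, exist_ok=True)
    for name in missing_csv_files(csv_dir):
        urllib.request.urlretrieve(BASE_URL + name, os.path.join(csv_dir, name))
```

Calling `download_missing("OID/csv_folder")` would fetch whichever of the four files are not already there, so a second call does not download them again.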

If you already know the URL of a specific version's files, you can download them directly from there. For example, to use the latest version, download from the following URLs

Using the automated script files

The file modules/bounding_boxes.py already contains the code mentioned above, so it will prompt you to download all the required files the first time you run the code.

P.S. - If you clone this repo, you will already have the csv files for V4 present in /OID/csv_folder. If you want to work with other versions, delete the existing files and replace them with the new files.

Parser Arguments

type_csv - Which set of images you want to download [train, test, val or all]
classes - Space-separated names of the classes you want to download
limit - Maximum number of images you want to download
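
As an illustration of how these three arguments fit together, here is a minimal argparse sketch. It is not the repo's parser: the function name `build_parser` is hypothetical, and the choices for `--type_csv` are simply the values listed above.

```python
import argparse


def build_parser():
    """Build a small parser that mirrors the three documented arguments."""
    parser = argparse.ArgumentParser(description="Sketch of the downloader options")
    parser.add_argument("command", choices=["downloader"])
    parser.add_argument("--type_csv", choices=["train", "test", "val", "all"],
                        help="which set of images to download")
    parser.add_argument("--classes", nargs="+",
                        help="space-separated class names")
    parser.add_argument("--limit", type=int,
                        help="maximum number of images to download")
    return parser


args = build_parser().parse_args(
    ["downloader", "--classes", "Apple", "Orange", "--type_csv", "all", "--limit", "400"]
)
print(args.classes, args.type_csv, args.limit)
```

Using `nargs="+"` is what lets several class names follow a single `--classes` flag.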

Steps to be followed if you are resuming the paused download

Python3 is required

Run the following command to use the simple resume feature

    python main.py downloader --resume

In case you want to add additional items when resuming, run the following script

    python main.py downloader --classes Apple Orange --type_csv all --limit 400

By default, the script is configured to resume downloading from the previous state. If you do not want this, delete the log.csv file present in the master folder.
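
The idea behind a log-based resume can be sketched as follows. This is only an illustration of the technique: the actual layout of log.csv is not documented here, so the sketch assumes one downloaded image ID per row, and the function names are hypothetical.

```python
import csv
import os


def load_done_ids(log_path):
    """Return the set of image IDs already recorded in the log file."""
    if not os.path.isfile(log_path):
        return set()  # no log means nothing has been downloaded yet
    with open(log_path, newline="") as f:
        return {row[0] for row in csv.reader(f) if row}


def pending_ids(all_ids, log_path):
    """Keep only the IDs that are not in the log, preserving their order."""
    done = load_done_ids(log_path)
    return [i for i in all_ids if i not in done]


def mark_done(image_id, log_path):
    """Append one finished image ID so that a later run can skip it."""
    with open(log_path, "a", newline="") as f:
        csv.writer(f).writerow([image_id])
```

Under this assumption, deleting the log file makes `pending_ids` return every ID again, which matches the advice above for starting from scratch.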

Steps to be followed if you are downloading the images.

Python3 is required.

Clone this repository

    git clone https://github.com/Jash-2000/open_image_dataset_downloader.git

Install the required modules
```
    pip install -r requirements.txt
```
If you want a quick reminder of all the options the script provides, launch main.py from your console of choice. Remember to always run it from the main directory of the project.
```
    python3 main.py
```
or in the following way to get more information
```
    python3 main.py --help
```
The full list of parser arguments is in modules/parser.py.

Download different classes in separate folders

    python main.py downloader --classes Balloon Airplane --type_csv train --limit 400

Download all classes in the same folder

    python main.py downloader --classes Balloon Airplane --type_csv train --limit 1000 --multiclasses 1

Sources of error

If you are running the code for the first time (i.e. you are not resuming a download), make sure to delete the log.csv file. Different versions of Pandas have different rules for overwriting.
Make sure you have downloaded the correct bounding box csv file.
Make sure that the class labels are spelled correctly when running the script.
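
One way to catch misspelled class labels before starting a download is sketched below. It assumes class-descriptions-boxable.csv has no header row and holds the label ID in the first column and the display name in the second; the function name `unknown_classes` is hypothetical and not part of the toolkit.

```python
import csv


def unknown_classes(requested, class_csv_path):
    """Return the requested class names that do not appear in the class list.

    Assumption: the second column of each row holds the class display name.
    """
    with open(class_csv_path, newline="", encoding="utf-8") as f:
        known = {row[1] for row in csv.reader(f) if len(row) >= 2}
    # The comparison is case-sensitive, so "apple" does not match "Apple".
    return [name for name in requested if name not in known]
```

An empty result means every requested name was found in the file.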

Downloaded file's structure

The algorithm takes care of downloading all the necessary files and builds a directory structure like this:

```
main_folder
│   main.py
│
└───OID
    │   file011.txt
    │   file012.txt
    │
    └───csv_folder
    |    |
    |    └───v4
    |    |
    |    └───v5
    |    |
    |    └───v6
    |         │   class-descriptions-boxable.csv
    |         │   validation-annotations-bbox.csv
    |
    └───Dataset
        |
        └─── test
        |
        └─── train
        |
        └─── validation
             |
             └─── v4
             |
             └─── v5
             |
             └─── v6
                  |
                  └───Apple
                  |     |
                  |     |0fdea8a716155a8e.jpg
                  |     |2fe4f21e409f0a56.jpg
                  |     |...
                  |     └───Labels
                  |            |
                  |            |0fdea8a716155a8e.txt
                  |            |2fe4f21e409f0a56.txt
                  |            |...
                  |
                  └───Orange
                        |
                        |0b6f22bf3b586889.jpg
                        |0baea327f06f8afb.jpg
                        |...
                        └───Labels
                              |
                              |0b6f22bf3b586889.txt
                              |0baea327f06f8afb.txt
                              |...
```
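
Given the layout above, where each class folder holds .jpg images and a Labels subfolder with matching .txt files, a quick consistency check can be sketched like this. The function name `images_without_labels` is hypothetical and not part of the toolkit.

```python
from pathlib import Path


def images_without_labels(class_dir):
    """List the image files in `class_dir` that have no matching file in Labels/."""
    class_dir = Path(class_dir)
    # Collect label file names without their .txt extension.
    labels = {p.stem for p in (class_dir / "Labels").glob("*.txt")}
    return sorted(p.name for p in class_dir.glob("*.jpg") if p.stem not in labels)
```

Pointing it at a class folder such as the Apple folder in the tree returns the images whose label file is missing, which can help spot a download that was cut short.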

If you have already downloaded the csv files, you can simply put them in the csv_folder. The script automatically takes care of downloading these files, but if you want to download them manually for whatever reason, here you can find them.

If you interrupt the downloading script with ctrl+d, you can always restart it from the last image downloaded.
