
Images Classification Recipe



Convolutional Neural Networks (CNN) and Transfer Learning for Heritage Images Classification

This recipe presents an image classification scenario that aims to infer the technique or genre of heritage images (picture, drawing, map...) using a pretrained Convolutional Neural Network (CNN) model, i.e. a supervised approach with transfer learning. We leverage the classification ability of these models, originally trained to detect objects, and apply them to our document classification scenario.

image classification principle

This recipe may have various library use cases, particularly for cataloguing and information retrieval systems.

This recipe is based on BnF materials

Goals

This recipe includes a basic introduction to neural networks and deep learning (formal neuron model, neural networks, convolutional neural networks, transfer learning) and a hands-on session:

  1. Creation of an image dataset for training
  2. Training of a classification model with commercial AI platforms or open-source AI frameworks
  3. Application of the model to the heritage images to be classified

The IIIF standard API is used to extract images from digital repositories, but raw files may also be processed.
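For instance, here is a minimal sketch of fetching one page image through the IIIF Image API, assuming the Python requests library is installed; the Gallica ARK identifier below is only an example:

# Minimal sketch: download one page image through the IIIF Image API.
# The ARK identifier is only an example; adapt it to your own documents.
import requests

ark = "btv1b10100491m"   # example Gallica document identifier
page = 1                 # folio number

# IIIF Image API pattern: {base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}
url = f"https://gallica.bnf.fr/iiif/ark:/12148/{ark}/f{page}/full/full/0/native.jpg"

response = requests.get(url, timeout=30)
response.raise_for_status()
with open(f"{ark}-f{page}.jpg", "wb") as f:
    f.write(response.content)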

Educational resources

Introduction to the core technology used

For the theory, see the BnF GitHub for a 45 min introductory course (FR and EN versions, direct link).

Implementation notes

A. Images classification using AI SaaS platforms

Prerequisites: IBM Watson Studio account or Google Cloud AutoML account (see the setup documents, FR and EN versions)

1. Use case definition: choice of the source images and the model classes

A dataset for a four-class scenario (picture/drawing/map/noise) can be downloaded here, but you are free to build your own use case.

The dataset illustrates this scenario:

  • filtering out "noisy" illustrations (blank pages, text pages)
  • classification of the illustrations into 3 categories (picture, drawing, map)

Sample images for each class: noise, picture, drawing, map

2. Choice of the SaaS platform

IBM Watson Studio and Google Cloud AutoML have been tested and this howto documents the setup of a new user account and the creation of a visual recognition project for both platforms.

The following steps suppose you are using Watson Studio, but the Google AutoML case is very similar. This howto is also documented in a presentation (FR and EN versions).

3. Creation of a Watson Studio Visual Recognition project

Once the Watson Studio web app is launched, choose the "Classify Images" custom model to create your new classification project, as described in the howto.

Classify images model

4. Uploading the training dataset

Now you can upload your image dataset, each class being a .zip archive.

image classification principle

Classes can be renamed and their content may be updated.
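If your ground truth is organised locally as one folder per class, the archives can be produced with a minimal standard-library sketch; the dataset/ layout below is an assumption:

# Minimal sketch: build one .zip archive per class folder, ready for upload.
# Assumes a layout such as dataset/picture, dataset/drawing, dataset/map, dataset/noise.
import shutil
from pathlib import Path

dataset_dir = Path("dataset")          # hypothetical root folder of the ground truth
for class_dir in sorted(dataset_dir.iterdir()):
    if class_dir.is_dir():
        # creates e.g. picture.zip in the current directory
        shutil.make_archive(class_dir.name, "zip", root_dir=str(class_dir))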

5. Training of the model

When all the classes are ingested, the training process can start ("Train Model" button).

Training process

6. Test of the classification model

On the Watson platform

Local images can be dropped on the test page to launch an inference and test the model's performance. Watson Studio outputs the confidence scores for all the model's classes.

Testing the model

At this point, the model could be deployed using SDKs or APIs. The next section demonstrates the API case.

Outside the platform, using API and code

Before implementing the model in your code, you need to obtain two pieces of information: the Watson API key and the model ID.

  1. Watson API key

This information is available in your resources list, under the Services category.

Access to the API key

After choosing the right service, the API key can then be copied/pasted in your code or downloaded.

  2. Model ID

The IBM Watson model ID can be found on your project page, under the Assets tab.

Access to the model ID

You can now use the Watson REST APIs or the corresponding SDKs to develop applications that interact with the service.

  • curl commands

These two basic curl command lines show a simple way to interact with the API. Open a Terminal window and type the following commands, taking care to replace the your_api_key and your_model_ID fields with the values you just obtained.

To classify an image through its URL:

> curl -u "apikey:your_api_key" "https://gateway.watsonplatform.net/visual-recognition/api/v3/classify?url=your_URL&version=2018-03-19&classifier_ids=your_model_ID"

This example displays the classification result of a Gallica IIIF image:

  • inferred class = "photo"
  • confidence score = 0.843

To classify a local image:

> curl -X POST -u "apikey:your_api_key" -F "images_file=@file_path" -F "classifier_ids=your_model_ID" "https://gateway.watsonplatform.net/visual-recognition/api/v3/classify?version=2018-03-19"
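The same requests can also be sent from Python, for instance with the requests library. Here is a minimal sketch; the API key, model ID, image URL and file path are placeholders to replace with your own values:

# Minimal sketch: the same classification calls as the curl commands above, using requests.
import requests

API_KEY = "your_api_key"
MODEL_ID = "your_model_ID"
ENDPOINT = "https://gateway.watsonplatform.net/visual-recognition/api/v3/classify"
PARAMS = {"version": "2018-03-19", "classifier_ids": MODEL_ID}

# Classify a remote image through its URL
r = requests.get(ENDPOINT, auth=("apikey", API_KEY),
                 params={**PARAMS, "url": "your_URL"})
print(r.json())

# Classify a local image file
with open("file_path", "rb") as f:
    r = requests.post(ENDPOINT, auth=("apikey", API_KEY), params=PARAMS,
                      files={"images_file": f})
print(r.json())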

  • Python script

The very same curl commands may be integrated in an application script similar to this one, which extracts some document metadata thanks to the digital library APIs (Gallica), then extracts the images from the digital repositories (Gallica and the Wellcome Collection) using the IIIF Image protocol, and finally calls a Watson model to classify the illustrations.
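As a rough illustration of the image-extraction part only, here is a minimal sketch that walks the IIIF Presentation manifest of a Gallica document and lists its page images; the requests library and the example ARK identifier are assumptions, and the classification call is left as a stub:

# Minimal sketch: enumerate the page images of a Gallica document via its IIIF manifest.
# The ARK identifier is only an example; plug the Watson classification call where indicated.
import requests

ark = "btv1b10100491m"  # example Gallica document identifier
manifest_url = f"https://gallica.bnf.fr/iiif/ark:/12148/{ark}/manifest.json"
manifest = requests.get(manifest_url, timeout=30).json()

for canvas in manifest["sequences"][0]["canvases"]:
    image_url = canvas["images"][0]["resource"]["@id"]
    # here one would send image_url to the classification service
    # (e.g. the Watson classify call shown above) and store the result
    print(canvas.get("label"), image_url)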

This associated Jupyter notebook demonstrates the whole process. (To learn how to launch a Jupyter notebook, look at this documentation.)

The notebook may also be run in your browser with Binder: launch the notebook in Binder

B. Images classification using AI frameworks

Prerequisites: basic scripting and command line skills (Python scripts are used)

An AI framework must be used: TensorFlow (Google), PyTorch (Facebook), CNTK (Microsoft), Caffe2, Keras, etc.

This implementation leverages the Inception-v3 model and applies a transfer learning method: the last layer of the Inception-v3 model is retrained on the ground truth image dataset (a minimal sketch of this last-layer retraining is given below, after the list of scripts).

First, TensorFlow must be installed.

Three Python scripts (within the TensorFlow framework) are used to train (and evaluate) a local model:

  • split.py: the GT dataset is split into a training set (e.g. 2/3) and an evaluation set (1/3). The GT local dataset directory and the training/evaluation ratio must be defined in the script.
  • retrain.py: the training set is used to retrain the last layer of the Inception-v3 model. The training dataset path and the generated model path must be defined. The Inception model is downloaded by the retrain.py script.
  • label_image.py: the evaluation set is labeled by the model. The model path and the input images path must be defined.
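To give an idea of what retraining the last layer means in practice, here is a minimal, independent sketch written with the tf.keras API of a recent TensorFlow 2 release. It is not the retrain.py script itself: the dataset path, image size and class names are assumptions.

# Minimal sketch of transfer learning with Inception-v3 in tf.keras (NOT retrain.py).
# Assumes a training set organised as one sub-folder per class under dataset/train.
import tensorflow as tf

train_ds = tf.keras.utils.image_dataset_from_directory(
    "dataset/train", image_size=(299, 299), batch_size=32)
num_classes = len(train_ds.class_names)

base = tf.keras.applications.InceptionV3(
    weights="imagenet", include_top=False, pooling="avg",
    input_shape=(299, 299, 3))
base.trainable = False                                   # freeze the pretrained layers

model = tf.keras.Sequential([
    tf.keras.layers.Rescaling(1.0 / 127.5, offset=-1),   # Inception-v3 expects inputs in [-1, 1]
    base,
    tf.keras.layers.Dense(num_classes, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(train_ds, epochs=5)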

To classify a set of images and output the results in a CSV file:

> python3 label_image.py > out.csv

Running the script outputs a line per classified image:

bd carte dessin filtrecouv filtretxt gravure photo foundClass realClass success imgTest

0.01 0.00 0.96 0.00 0.00 0.03 0.00 drawing OUT_img 0 btv1b10100491m-1-1.jpg

0.09 0.10 0.34 0.03 0.01 0.40 0.03 engraving OUT_img 0 btv1b10100495d-1-1.jpg ...

Each line gives the best-scoring class (according to its probability) as well as the probabilities for all the other classes.
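Assuming the output file is whitespace-separated as displayed above, with a one-line header and a 0/1 success column, the overall accuracy can be computed with a minimal sketch such as:

# Minimal sketch: overall accuracy from the label_image.py output shown above.
rows = []
with open("out.csv") as f:
    header = f.readline().split()          # column names, as in the header line above
    for line in f:
        fields = line.split()
        if len(fields) == len(header):
            rows.append(dict(zip(header, fields)))

correct = sum(int(row["success"]) for row in rows)
print(f"accuracy: {correct / len(rows):.2%} ({correct}/{len(rows)})")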

Other resources
