Image Classification Recipe
Convolutional Neural Networks (CNN) and Transfer Learning for Heritage Image Classification
This recipe presents an image classification scenario aiming to infer the technique or genre of heritage images (picture, drawing, map...) with a pretrained convolutional neural network model (i.e. a supervised approach with transfer learning). We leverage the classification ability of these models, originally trained to detect objects, and apply it to our document classification scenario.
This recipe may have various library use cases, particularly for cataloguing and information retrieval systems.
This recipe is based on BnF materials
This recipe includes a basic introduction to neural networks and deep learning (formal neuron model, neural networks, convolutional neural networks, transfer learning) and a hands-on session:
- Creation of an image dataset for training
- Training of a classification model with commercial AI platforms or open source AI frameworks
- Application of the model to the heritage images to be classified
The IIIF standard API is used to extract images from digital repositories, but raw files may also be processed.
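For instance, a IIIF Image API URL encodes the requested region, size, rotation, quality and format. Here is a minimal Python sketch building such a URL for a Gallica document (the helper name and default values are illustrative assumptions):

# Minimal sketch: build a IIIF Image API URL for a Gallica document.
# IIIF pattern: {server}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}
def gallica_iiif_url(ark, page=1, size="full"):
    return f"https://gallica.bnf.fr/iiif/ark:/12148/{ark}/f{page}/full/{size}/0/native.jpg"

print(gallica_iiif_url("btv1b10100491m"))
# -> https://gallica.bnf.fr/iiif/ark:/12148/btv1b10100491m/f1/full/full/0/native.jpg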
For the theory, see the BnF GitHub for a 45 min introduction course (FR and EN versions, direct link).
Prerequisites: IBM Watson Studio account or Google Cloud AutoML account (see the setup documents, FR and EN versions)
A four-class scenario dataset (picture/drawing/map/noise) can be downloaded here, but it's up to you to build your own use case.
The dataset illustrates this scenario:
- filtering of "noisy" illustrations (blank pages, text pages)
- classification of illustrations into 3 categories (picture, drawing, map)
IBM Watson Studio and Google Cloud AutoML have been tested and this howto documents the setup of a new user account and the creation of a visual recognition project for both platforms.
The following steps suppose you are using Watson Studio, but the Google AutoML case is very similar. This howto is also documented in a presentation (FR and EN versions).
Once Watson Studio web app is launched, choose the "Classify Images" custom model to create your new classification project, as described in the howto.
Now you can upload your image dataset, each class being a .zip archive.
Classes can be renamed and their content updated.
When all the classes are ingested, the training process can start ("Train Model" button).
On the Watson platform
Local images can be dropped on the test page to launch an inference and test the model performance. Watson Studio outputs the confidence scores for all the model's classes.
At this point, the model could be deployed using SDKs or APIs. The next section demonstrates the API case.
Outside the platform, using API and code
Before implementing the model in your code, you need to obtain two pieces of information: the Watson API key and the model ID.
- Watson API key
This information is available in your resources list, under the Services category.
After choosing the right service, the API key can then be copied/pasted in your code or downloaded.
- Model ID
The IBM Watson model ID can be found on your project page, under the Assets tab.
You can now use the Watson REST APIs or the corresponding SDKs to develop applications that interact with the service.
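As an illustration, here is a minimal sketch using the ibm-watson Python SDK (pip install ibm-watson); the API key, model ID and image URL are placeholders to be replaced with your own values:

# Minimal SDK sketch (placeholders: your_api_key, your_model_ID)
from ibm_watson import VisualRecognitionV3
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

authenticator = IAMAuthenticator("your_api_key")
visual_recognition = VisualRecognitionV3(version="2018-03-19",
                                         authenticator=authenticator)

# Classify a remote image with the custom model
result = visual_recognition.classify(
    url="https://gallica.bnf.fr/iiif/ark:/12148/btv1b10100491m/f1/full/full/0/native.jpg",
    classifier_ids=["your_model_ID"]).get_result()
print(result)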
- curl commands
These two basic curl commands show a very simple way to interact with the API. Open a Terminal window and type the following command, taking care to replace the your_api_key and your_model_ID fields with the values you just obtained.
To classify an image through its URL:
> curl -u "apikey:your_api_key" "https://gateway.watsonplatform.net/visual-recognition/api/v3/classify?url=your_URL&version=2018-03-19&classifier_ids=your_model_ID"
This example displays the classification result of a Gallica IIIF image:
- inferred class = "photo"
- confidence score = 0.843
To classify a local image:
> curl -X POST -u "apikey:your_api_key" -F "images_file=@file_path" -F "classifier_ids=your_model_ID" "https://gateway.watsonplatform.net/visual-recognition/api/v3/classify?version=2018-03-19"
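The same request can also be issued from Python with the requests library; a minimal sketch (placeholders as above):

# Classify a local image through the REST API (sketch; replace the placeholders)
import requests

WATSON_URL = "https://gateway.watsonplatform.net/visual-recognition/api/v3/classify"

with open("file_path", "rb") as image:
    response = requests.post(
        WATSON_URL,
        auth=("apikey", "your_api_key"),
        params={"version": "2018-03-19", "classifier_ids": "your_model_ID"},
        files={"images_file": image})
print(response.json())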
- Python script
The very same curl commands may be integrated in an application script similar to this one, which extracts some document metadata thanks to the digital library APIs (Gallica), then extracts the images from the digital libraries' repositories (Gallica and the Wellcome Collection) using the IIIF Image protocol, and finally calls a Watson model to classify the illustrations.
This associated Jupyter notebook demonstrates the whole process. (To learn how to launch a Jupyter notebook, look at this documentation.)
The notebook may also be run in your browser with Binder:
After a few minutes, Binder is displayed in your browser. You should see this page:
Click on the Binder folder and then on the classify-img-with-iiif-and-watson.ipynb notebook. The notebook opens in a new browser window and you may want to run it, step by step (Shift-Return keys):
(Mind to insert your Watson API key and model ID at section 3!)
Prerequisites: basic scripting and command line skills (Python scripts are used)
An AI framework must be used: TensorFlow (Google), PyTorch (Facebook), CNTK (Microsoft), Caffe2, Keras, etc.
This implementation leverages the Inception-v3 model and applies a transfer learning method: the last layer of the Inception-v3 model is retrained on the images ground truth dataset.
First, TensorFlow must be installed.
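For the intuition, this transfer learning method can be sketched in a few lines of Keras (an illustrative sketch, not the retrain.py script itself): the pretrained convolutional base is frozen and only a new classification layer is trained on the ground truth dataset.

# Minimal transfer learning sketch with Keras (illustrative assumption)
from tensorflow.keras.applications import InceptionV3
from tensorflow.keras import layers, models

# Pretrained Inception-v3 convolutional base (ImageNet weights), without its top layer
base = InceptionV3(weights="imagenet", include_top=False, pooling="avg",
                   input_shape=(299, 299, 3))
base.trainable = False  # freeze the pretrained layers

# New classification head, e.g. for the 4 classes of the sample dataset
model = models.Sequential([base, layers.Dense(4, activation="softmax")])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(...) is then called on the training set produced by the split step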
Three Python scripts (within the TensorFlow framework) are used to train (and evaluate) a local model:
- split.py: the GT dataset is split into a training set (e.g. 2/3) and an evaluation set (1/3). The GT local dataset directory and the training/evaluation ratio must be defined in the script (a minimal sketch of this step is shown after this list).
- retrain.py: the training set is used to train the last layer of the Inception-v3 model. The training dataset path and the generated model path must be defined. The Inception model is downloaded by the retrain.py script itself.
- label_image.py: the evaluation set is labeled by the model. The model path and the input images path must be defined.
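As announced above, here is a minimal sketch of the training/evaluation split performed by split.py (directory names and ratio are illustrative assumptions):

# Split a GT dataset (one subdirectory per class) into training/evaluation sets
import os, random, shutil

GT_DIR = "dataset"                 # GT dataset directory (assumption)
TRAIN_DIR, EVAL_DIR = "training_set", "evaluation_set"
TRAIN_RATIO = 2 / 3                # e.g. 2/3 training, 1/3 evaluation

for label in os.listdir(GT_DIR):
    images = os.listdir(os.path.join(GT_DIR, label))
    random.shuffle(images)
    cut = int(len(images) * TRAIN_RATIO)
    for target, subset in ((TRAIN_DIR, images[:cut]), (EVAL_DIR, images[cut:])):
        os.makedirs(os.path.join(target, label), exist_ok=True)
        for name in subset:
            shutil.copy(os.path.join(GT_DIR, label, name),
                        os.path.join(target, label, name))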
To classify a set of images and output the results in a CSV file:
> python3 label_image.py > out.csv
Running the script outputs a line per classified image:
bd carte dessin filtrecouv filtretxt gravure photo foundClass realClass success imgTest
0.01 0.00 0.96 0.00 0.00 0.03 0.00 drawing OUT_img 0 btv1b10100491m-1-1.jpg
0.09 0.10 0.34 0.03 0.01 0.40 0.03 engraving OUT_img 0 btv1b10100495d-1-1.jpg
...
Each line gives the best class (the one with the highest probability) as well as the probabilities of all the other classes.
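From this output, global metrics can easily be derived; here is a minimal sketch computing the overall accuracy (assuming the whitespace-separated columns shown above, with a header line):

# Compute the overall accuracy from the label_image.py output
with open("out.csv") as f:
    rows = [line.split() for line in f][1:]   # skip the header line
accuracy = sum(int(row[-2]) for row in rows) / len(rows)  # 'success' column (0/1)
print(f"accuracy: {accuracy:.2%} on {len(rows)} images")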
Resources
- IBM Watson documentation
- Google AutoML documentation
- Google AutoML for beginners
- Convolutional neural networks
- Library of Congress Newspaper Navigator
- GallicaPix proof of concept github