Read this README in Portuguese.
A multilayer perceptron (MLP) is an artificial neural network with at least three layers of nodes: an input layer, a hidden layer, and an output layer. Except for the input nodes, each node is a neuron that uses a nonlinear activation function. An MLP is trained with a supervised learning technique called backpropagation. Its multiple layers and nonlinear activations distinguish it from a linear perceptron and allow it to separate data that is not linearly separable.
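For intuition, here is a minimal sketch of a single forward pass through one layer; the sigmoid activation and the names below are illustrative and are not taken from mlp.cpp:

#include <cmath>
#include <vector>

// Sigmoid: the nonlinear activation applied by every non-input neuron.
double sigmoid(double x) { return 1.0 / (1.0 + std::exp(-x)); }

// Forward step of one layer: out[j] = sigmoid(bias[j] + sum_i weights[j][i] * in[i]).
std::vector<double> forwardLayer(const std::vector<double> &in,
                                 const std::vector<std::vector<double>> &weights,
                                 const std::vector<double> &bias) {
    std::vector<double> out(weights.size());
    for (std::size_t j = 0; j < weights.size(); ++j) {
        double sum = bias[j];
        for (std::size_t i = 0; i < in.size(); ++i)
            sum += weights[j][i] * in[i];
        out[j] = sigmoid(sum);
    }
    return out;
}

// An MLP chains this step: input -> hidden layer -> output layer, e.g.
//   auto hidden = forwardLayer(input, hiddenWeights, hiddenBias);
//   auto output = forwardLayer(hidden, outputWeights, outputBias);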
This repository contains a parallel implementation of an MLP that recognizes characters regardless of the font they are written in.
The original dataset consists of images of 153 character fonts obtained from the UCI Machine Learning Repository. Some fonts were scanned with a variety of devices (hand scanners, desktop scanners, or cameras); others were computer generated.
To use the code, first clone this repository:
git clone https://github.com/viniciusvviterbo/Multilayer-Perceptron
cd ./Multilayer-Perceptron
In this project, the first line of a pattern file describes the main information about the dataset, followed by an empty line (added only for ease of reading; it is entirely optional) and then the data itself. Example:
[NUMBER OF CASES] [NUMBER OF INPUTS] [NUMBER OF OUTPUTS]

[INPUT 1] [INPUT 2] ... [INPUT N] [OUTPUT 1] [OUTPUT 2] ... [OUTPUT M]
[INPUT 1] [INPUT 2] ... [INPUT N] [OUTPUT 1] [OUTPUT 2] ... [OUTPUT M]
[INPUT 1] [INPUT 2] ... [INPUT N] [OUTPUT 1] [OUTPUT 2] ... [OUTPUT M]
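As a purely illustrative miniature (not one of the repository's datasets), a file with 2 cases, 3 inputs, and 2 outputs per case would look like this:

2 3 2

0.0 0.5 1.0 1 0
1.0 0.5 0.0 0 1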
For testing the code, we included a reduced dataset (sampleNormalizedFonts.in), which can also be used to better understand the required format.
A normalized dataset is preferred because it makes the results given at the end of training (nearly) absolute: 0 or 1. To normalize a dataset, execute:
g++ ./normalizeDataset.cpp -o ./normalizeDataset
./normalizeDataset < PATTERN_FILE > NORMALIZED_PATTERN_FILE
Example:
g++ ./normalizeDataset.cpp -o ./normalizeDataset
./normalizeDataset < ./datasets/patternFonts.in > ./datasets/normalizedPatternFonts.in
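As a rough illustration of the idea only (this is not the actual contents of normalizeDataset.cpp), a min-max normalization rescales each input column into the [0, 1] range:

#include <algorithm>
#include <vector>

// Min-max normalization of one input column: the smallest value maps to 0,
// the largest to 1, and everything else proportionally in between.
void normalizeColumn(std::vector<double> &column) {
    const auto bounds = std::minmax_element(column.begin(), column.end());
    const double minValue = *bounds.first;
    const double range = *bounds.second - minValue;
    if (range == 0.0) return; // constant column: nothing to rescale
    for (double &value : column)
        value = (value - minValue) / range;
}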
Compile the source code with OpenMP enabled:
g++ mlp.cpp -o mlp -O3 -fopenmp -std=c++14
The code splits the provided dataset in half: the first half is used only for training, and the second half is used for testing, so the network treats it as unseen data and tries to produce the correct result, as sketched below.
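Conceptually, the split amounts to something like this sketch (the Pattern struct and names are illustrative, not the actual types in mlp.cpp):

#include <cstddef>
#include <utility>
#include <vector>

// One case of the dataset: the input values followed by the expected outputs.
struct Pattern {
    std::vector<double> inputs;
    std::vector<double> outputs;
};

// First half of the cases goes to training, second half to testing.
std::pair<std::vector<Pattern>, std::vector<Pattern>>
splitDataset(const std::vector<Pattern> &cases) {
    const std::size_t half = cases.size() / 2;
    std::vector<Pattern> training(cases.begin(), cases.begin() + half);
    std::vector<Pattern> testing(cases.begin() + half, cases.end());
    return std::make_pair(training, testing);
}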
To execute, the command takes the following parameters:
./mlp HIDDEN_LAYER_LENGTH TRAINING_RATE THRESHOLD NUMBER_OF_THREADS < PATTERN_FILE
- HIDDEN_LAYER_LENGTH refers to the number of neurons in the network's hidden layer;
- TRAINING_RATE refers to the network's learning rate, a floating-point number used during the weight-correction phase of backpropagation (see the sketch after the example below);
- THRESHOLD refers to the maximum error admitted by the network for a result to be considered acceptably correct;
- NUMBER_OF_THREADS refers to the number of threads the network is allowed to use;
- PATTERN_FILE refers to the normalized pattern file.
Example:
./mlp 1024 0.1 1e-3 4 < ./datasets/normalizedPatternFonts.in
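To make TRAINING_RATE and THRESHOLD more concrete, here is a hedged sketch of how those two values typically enter a backpropagation training loop; mlp.cpp may organize this differently:

#include <cstddef>
#include <vector>

// Gradient-descent weight update for one neuron during backpropagation:
// trainingRate (TRAINING_RATE) scales how far each weight moves per correction.
void updateWeights(std::vector<double> &weights, double &bias,
                   const std::vector<double> &inputs,
                   double delta, double trainingRate) {
    for (std::size_t i = 0; i < weights.size(); ++i)
        weights[i] += trainingRate * delta * inputs[i];
    bias += trainingRate * delta;
}

// Training usually repeats whole epochs until the mean squared error over the
// training set drops below THRESHOLD:
//   while (meanSquaredError > threshold) { /* present all training cases again */ }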
For a handier way to run the project, this repository also includes a shell script that makes it easy to test and to compare results from multiple executions in order to obtain an average runtime.
./script.sh
The script compiles the code as a sequential implementation and runs it 5 times, then compiles it again as a parallel implementation and runs it 5 more times. For that, it uses the (already normalized and formatted) reduced dataset sampleNormalizedFonts.in.
Dua, D. and Graff, C. (2019). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science.
Análise do Desempenho de uma Implementação Paralela da Rede Neural Perceptron Multicamadas Utilizando Variável Compartilhada - by GÓES, Luís F. W. et al., PUC Minas
Introdução a Redes Neurais Multicamadas - by Prof. Fagner Christian Paes
O que é a Multilayer Perceptron - from ML4U
Fabrício Goés YouTube Channel - by Dr. Luis Goés
Eitas Tutoriais - by Espaço de Inovação Tecnológica Aplicada e Social - PUC Minas
Koliko - by Alex Frukta & Vladimir Tomin