tolaouk/Font-Classification-with-Random-Forest

This report classifies fonts using features taken from the grayscale value of each pixel in a 20×20-pixel image, so each case has 400 features. The data were downloaded from the “University of California Irvine Repository of Machine Learning Datasets”. The data set consists of 153 .csv files. Each file contains many cases, together with descriptive information about each case, such as whether the font is bold or italic. The most important columns are the 400 features derived from the grayscale values of the pixels. Four of the 153 files are selected for this analysis.

The class sizes are compared. If the classes are imbalanced, the smallest class is oversampled by cloning, perturbation, or SMOTE (Synthetic Minority Over-sampling Technique) to obtain a more balanced data set, because otherwise the classifier would be biased toward the larger classes. After the data are cleaned and treated, principal component analysis (PCA) is applied to reduce the number of features. The data are then split randomly into a training set (80%) and a test set (20%).

The analysis is done in R. The report explains the algorithmic principles of a basic random forest, the most important and most commonly used inputs, parameters, and options, and the outputs of the random forest. The R predict function is also described. Once everything is ready, the Random Forest Automatic Classifier is applied to build the model. The main questions are whether the grayscale value of each pixel can be used to predict the correct font and, if so, how accurate the predictions are. The training and test sets are then used to find the best number of trees for the random forest model. The importance of the first 10 principal components is also discussed and compared with the percentage of variance each one explains. The final section discusses several potential methods to improve the classification.
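The class-balancing and splitting steps can be illustrated with a minimal sketch. It is written in Python rather than the report's R and is not the report's code. It assumes each case is a list of 400 grayscale values (a flattened 20×20 image) paired with a font label. SMOTE is implemented in its plain form: each synthetic case is a random point on the segment between a minority case and one of its k nearest minority-class neighbours. The helper names (`class_counts`, `smote`, `oversample_smallest`, `train_test_split`) and the choice k = 5 are hypothetical, chosen for illustration.

```python
import math
import random
from collections import Counter


def class_counts(labels):
    """Number of cases per class, used to check for imbalance."""
    return Counter(labels)


def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def smote(minority, n_new, k=5, rng=random):
    """Create n_new synthetic cases from the minority-class cases.

    Each synthetic case lies on the segment between a randomly chosen
    minority case and one of its k nearest minority-class neighbours.
    Requires at least two minority cases.
    """
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (m for m in minority if m is not base),
            key=lambda m: euclidean(base, m),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def oversample_smallest(X, y, k=5, rng=random):
    """Add SMOTE cases to the smallest class until it matches the largest."""
    counts = class_counts(y)
    smallest = min(counts, key=counts.get)
    minority = [x for x, label in zip(X, y) if label == smallest]
    new = smote(minority, max(counts.values()) - counts[smallest], k, rng)
    return X + new, y + [smallest] * len(new)


def train_test_split(X, y, test_frac=0.2, rng=random):
    """Random split into training (1 - test_frac) and test (test_frac) sets."""
    idx = list(range(len(X)))
    rng.shuffle(idx)
    n_test = round(len(idx) * test_frac)
    test, train = idx[:n_test], idx[n_test:]
    return ([X[i] for i in train], [y[i] for i in train],
            [X[i] for i in test], [y[i] for i in test])


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy stand-in: 3 "pixels" per case instead of 400, 20 vs 10 cases
    X = [[rng.random() for _ in range(3)] for _ in range(30)]
    y = ["font_A"] * 20 + ["font_B"] * 10
    X_bal, y_bal = oversample_smallest(X, y, k=3, rng=rng)
    print(class_counts(y_bal))  # both classes now have 20 cases
    X_tr, y_tr, X_te, y_te = train_test_split(X_bal, y_bal, rng=rng)
    print(len(X_tr), len(X_te))  # 32 training cases, 8 test cases
```

In the usual sense of those terms, cloning would duplicate randomly chosen minority cases and perturbation would add small random noise to such duplicates. SMOTE interpolates between neighbours instead, so it does not produce exact copies. Oversampling before the split, in the order described above, can place synthetic cases built from the same original case in both sets. Applying SMOTE to the training set only is a common alternative.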
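The basic random forest principles mentioned above can be shown with a deliberately simplified toy sketch. These are bootstrap samples, a random subset of features considered at each split, Gini-based splits, and a majority vote over the trees. The sketch is written in Python and is not the report's R model. Each tree here is a single split (a depth-1 "stump") rather than the fully grown trees a real random forest uses. The number of features tried per split is floor(sqrt(p)), which matches the default that R's randomForest package uses for classification. All function names are hypothetical, and the data are synthetic stand-ins for the PCA scores.

```python
import math
import random
from collections import Counter


def gini(counts, size):
    """Gini impurity of a node with the given class counts."""
    return 1.0 - sum((c / size) ** 2 for c in counts.values())


def fit_stump(X, y, features):
    """Best single split among `features` by weighted Gini impurity.

    Returns (feature, threshold, label if value <= threshold, label otherwise).
    """
    n = len(y)
    total = Counter(y)
    majority = total.most_common(1)[0][0]
    # Start from "no split": every case goes left and gets the majority label
    best = (gini(total, n), features[0], math.inf, majority, majority)
    for f in features:
        order = sorted(range(n), key=lambda i: X[i][f])
        left = Counter()
        for pos in range(n - 1):
            left[y[order[pos]]] += 1
            value, next_value = X[order[pos]][f], X[order[pos + 1]][f]
            if value == next_value:
                continue  # no threshold can separate equal values
            right = total - left
            n_left = pos + 1
            score = (n_left * gini(left, n_left)
                     + (n - n_left) * gini(right, n - n_left)) / n
            if score < best[0]:
                best = (score, f, value,
                        left.most_common(1)[0][0], right.most_common(1)[0][0])
    return best[1:]


def predict_stump(stump, row):
    feature, threshold, left_label, right_label = stump
    return left_label if row[feature] <= threshold else right_label


def fit_forest(X, y, n_trees, rng=random):
    """Toy random forest: bootstrap sample + random feature subset per tree."""
    p = len(X[0])
    mtry = max(1, math.isqrt(p))  # floor(sqrt(p)) features considered per split
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # bootstrap sample
        features = rng.sample(range(p), mtry)
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], features))
    return forest


def predict_forest(forest, row):
    """Majority vote over all trees."""
    return Counter(predict_stump(s, row) for s in forest).most_common(1)[0][0]


def accuracy(forest, X, y):
    return sum(predict_forest(forest, r) == label for r, label in zip(X, y)) / len(y)


if __name__ == "__main__":
    rng = random.Random(42)
    # Toy stand-in for PCA scores: the class depends on column 0; 8 noise columns
    X = [[rng.random() for _ in range(9)] for _ in range(300)]
    y = ["A" if row[0] > 0.5 else "B" for row in X]
    X_tr, y_tr, X_te, y_te = X[:240], y[:240], X[240:], y[240:]
    # Compare test accuracy across forest sizes, as in choosing the number of trees
    for n_trees in (1, 10, 50, 100):
        forest = fit_forest(X_tr, y_tr, n_trees, rng)
        print(n_trees, round(accuracy(forest, X_te, y_te), 3))
```

In R's randomForest package, the corresponding arguments are `ntree` (number of trees) and `mtry` (features tried at each split). `predict(model, newdata)` returns the predicted classes for a test set.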
