Data collection is an expensive process that often only large, profitable companies can afford. This leaves smaller, less profitable organizations with little choice but to reuse existing data, which in some cases was collected for a different purpose.
In addition, not all demographic groups are equally represented in the data. There is typically more data about individuals from majority groups, and little or no data about people from minority groups. Models trained on such data tend to have a higher error rate for individuals from minority groups.
In this work, we explore the steps involved in manipulating data and choosing the right algorithm to create unbiased machine learning applications. The work is intended for, and is highly applicable to, developing countries, where resources for data collection are scarce and the demographics of the target users are often poorly represented in the training data.
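The claim that under-represented groups see higher error rates can be checked by breaking a model's error rate down by demographic group. The following is a minimal sketch of such a check, not the method used in this work: the function name `per_group_error_rate`, the group labels, and the toy labels and predictions are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def per_group_error_rate(y_true, y_pred, groups):
    """Return {group: fraction of misclassified examples in that group}.

    Hypothetical helper for illustration; all three inputs are
    equal-length sequences aligned by example.
    """
    if not (len(y_true) == len(y_pred) == len(groups)):
        raise ValueError("y_true, y_pred and groups must have the same length")

    totals = defaultdict(int)  # number of examples seen per group
    errors = defaultdict(int)  # number of misclassified examples per group
    for truth, pred, group in zip(y_true, y_pred, groups):
        totals[group] += 1
        if truth != pred:
            errors[group] += 1
    return {group: errors[group] / totals[group] for group in totals}


# Made-up toy data: 8 examples from a majority group, 2 from a minority group.
y_true = [1, 0, 1, 1, 0, 1, 0, 1, 0, 1]
y_pred = [1, 0, 1, 1, 0, 1, 0, 0, 0, 0]
groups = ["majority"] * 8 + ["minority"] * 2

rates = per_group_error_rate(y_true, y_pred, groups)
for group, rate in rates.items():
    print(f"{group}: error rate {rate:.3f}")
```

On this toy data the overall error rate is 2 out of 10, which hides the fact that the two groups fare very differently. Reporting the rate per group makes such a gap visible.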