The full research is included in the Report.
In the search for other habitable planets, we have recently started to collect exoplanetary data. To make educated decisions about which planets deserve more research and attention, scientists need high-quality data. Most of the data currently available, however, contains mistakes and missing values. To alleviate this problem, we need to spend more effort researching efficient ways to improve the quality of our data. This paper compares different machine learning (ML) approaches on their efficacy in predicting missing exoplanetary data. This is done by predicting said data and evaluating how well the different models perform based on accuracy. The results are then analyzed, and some insights into the strengths and weaknesses of the algorithms are given. Finally, a conclusion is given about which algorithms perform well and what improvements could be made to the predictive algorithm.
As we can see, the data that we have available to us could use improvement. This could be done through specialized algorithms that detect erroneous information and either remove it or find a way to restore it. Regardless, even without high-quality pre-processing, we can make programs that are capable of making accurate predictions on the exoplanetary datasets.
Of the algorithms analyzed in the study, one consistently performs well: the Decision Tree model. Whether its success is due to it being less influenced by erroneous information and by data with high bias and variance is difficult to say. It does, however, perform quite well on the PHL-EC dataset. Some algorithms perform better on particular features than on others. For example, Lasso Regression seems to have higher accuracy on radial features, while the Multi-layer Perceptron Regressor has consistently high results on features valued between 0 and 1. This is not conclusive evidence, though, and more research should be conducted.
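The kind of comparison described above can be sketched as follows. This is only an illustration under stated assumptions, not the study's actual pipeline: it uses scikit-learn, a synthetic stand-in for the PHL-EC features, default-like hyperparameters, and the R² score as the measure of predictive quality. The helper name `compare_models` is hypothetical.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.tree import DecisionTreeRegressor


def compare_models(X, y, seed=0):
    """Fit each candidate regressor on rows whose target is known and
    return its R^2 score on a held-out split of those rows."""
    known = ~np.isnan(y)  # rows with a missing target cannot be used for scoring
    X_train, X_test, y_train, y_test = train_test_split(
        X[known], y[known], test_size=0.25, random_state=seed
    )
    # Hyperparameters are illustrative assumptions, not tuned values.
    models = {
        "decision_tree": DecisionTreeRegressor(random_state=seed),
        "lasso": Lasso(alpha=0.01),
        "mlp": MLPRegressor(hidden_layer_sizes=(32,), max_iter=500, random_state=seed),
    }
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = r2_score(y_test, model.predict(X_test))
    return scores


# Synthetic stand-in data: three features in [0, 1] and a target with gaps.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(400, 3))
y = X[:, 0] * X[:, 1] + 0.5 * X[:, 2]
y[rng.choice(400, size=40, replace=False)] = np.nan  # simulate missing values

scores = compare_models(X, y)
for name, score in sorted(scores.items()):
    print(f"{name}: R^2 = {score:.3f}")
```

Once a model has been chosen, it can be refitted on all rows with a known target and used to fill in the rows where the target is missing.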
Finally, we have seen that the algorithms listed above are capable of making highly accurate predictions of the habitability of planets without using complex mathematical formulas. According to some physicists, for most planets an Earth Similarity Index (ESI) value of less than 0.75 means that the planet is not habitable. Our current ML tools are capable enough to give us an educated guess as to which planets are worth spending more time analyzing as potentially habitable candidates.
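As a small illustration of how the 0.75 cutoff could be applied to model output, the sketch below shortlists planets by predicted ESI. The planet names and ESI values are made-up placeholders, not results from the study, and the helper name `shortlist` is hypothetical.

```python
ESI_THRESHOLD = 0.75  # cutoff quoted in the text; below it a planet is treated as not habitable


def shortlist(predicted_esi, threshold=ESI_THRESHOLD):
    """Return the names of planets whose predicted ESI is at or above
    the threshold, ordered from highest to lowest ESI."""
    kept = [(name, esi) for name, esi in predicted_esi.items() if esi >= threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return [name for name, _ in kept]


# Placeholder predictions for illustration only.
predicted = {"planet_a": 0.82, "planet_b": 0.41, "planet_c": 0.75, "planet_d": 0.93}
candidates = shortlist(predicted)
print(candidates)
```

Planets that pass this filter are only candidates for closer analysis; the threshold itself is a rule of thumb rather than a definitive test.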