Comparison of machine learning algorithms in predicting missing data in exoplanet survey

The full research is presented in the accompanying report (Report.pdf).

Abstract

In the search for other habitable planets, we have recently started to collect exoplanetary data. To make educated decisions about which planets deserve more research and attention, scientists need high-quality data. Most of the data currently available to us, however, contains mistakes and missing values. To alleviate this problem, more effort must go into researching efficient ways to improve data quality. This paper compares different machine learning (ML) approaches by their efficacy in predicting missing exoplanetary data. To do so, each model predicts the missing values and is evaluated on the accuracy of its predictions. The results are then analysed, and insights into the strengths and weaknesses of the algorithms are given. Finally, we conclude which algorithms perform well and what improvements could be made to the predictive algorithms.
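One common way to score predictions of missing values is to hide some values that are actually known, predict them, and compare against the truth. The sketch below illustrates that masking procedure in plain Python. It assumes that "accuracy" means the fraction of predictions within a relative tolerance of the true value; the report may define accuracy differently, and `mask_known_values` and `tolerance_accuracy` are hypothetical helpers, not code from this repository.

```python
import random
from typing import List, Optional, Sequence, Tuple


def mask_known_values(
    values: Sequence[Optional[float]], fraction: float, seed: int = 0
) -> Tuple[List[Optional[float]], List[int]]:
    """Hide a fraction of the known (non-None) entries so they can serve as ground truth."""
    rng = random.Random(seed)
    known = [i for i, v in enumerate(values) if v is not None]
    hidden = set(rng.sample(known, int(len(known) * fraction)))
    masked = [None if i in hidden else v for i, v in enumerate(values)]
    return masked, sorted(hidden)


def tolerance_accuracy(
    y_true: Sequence[float], y_pred: Sequence[float], rel_tol: float = 0.1
) -> float:
    """Fraction of predictions within rel_tol (relative) of the true value.

    Assumed metric: when a true value is 0, only an exact prediction counts as a hit.
    """
    hits = sum(1 for t, p in zip(y_true, y_pred) if abs(p - t) <= rel_tol * abs(t))
    return hits / len(y_true)


if __name__ == "__main__":
    radii = [1.0, None, 2.5, 0.8, 11.2, None, 3.1, 1.9]  # illustrative values only
    masked, hidden = mask_known_values(radii, fraction=0.5)
    observed = [v for v in masked if v is not None]
    baseline = sum(observed) / len(observed)  # naive mean-imputation baseline
    predictions = [baseline] * len(hidden)
    print(tolerance_accuracy([radii[i] for i in hidden], predictions))
```

Any of the regressors discussed in the conclusion could replace the mean baseline; the masking step stays the same.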

Conclusion

As we can see, the data available to us could be improved. This could be done through specialized algorithms that detect erroneous information and either remove it or find a way to restore it. Even without high-quality pre-processing, however, we can build programs that make accurate predictions on the exoplanetary datasets.
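A rule-based filter is one simple form such a detector could take. The sketch below is an illustration under stated assumptions, not the logic of ErroneousPlanetsRemover.py: the column names and plausibility bounds in `RULES` are hypothetical, and "restoring" a value is modelled as blanking it out (`mode="mask"`) so that a predictive model can fill it in later.

```python
from typing import Dict, List, Optional

Planet = Dict[str, Optional[float]]

# Hypothetical plausibility bounds per column: lower bound exclusive, upper inclusive.
# Illustrative only; real limits would have to come from the dataset's documentation.
RULES = {
    "radius_earth": (0.0, 30.0),
    "mass_earth": (0.0, 10000.0),
    "esi": (0.0, 1.0),
}


def implausible_columns(planet: Planet) -> List[str]:
    """Return the columns whose present (non-None) value falls outside its bounds."""
    bad = []
    for column, (low, high) in RULES.items():
        value = planet.get(column)
        if value is not None and not (low < value <= high):
            bad.append(column)
    return bad


def clean(planets: List[Planet], mode: str = "mask") -> List[Planet]:
    """Drop rows with erroneous values ("remove") or blank them for later imputation ("mask")."""
    if mode not in ("remove", "mask"):
        raise ValueError(f"unknown mode: {mode!r}")
    cleaned = []
    for planet in planets:
        bad = implausible_columns(planet)
        if bad and mode == "remove":
            continue  # discard the whole row
        fixed = dict(planet)  # copy so the caller's data is not mutated
        for column in bad:
            fixed[column] = None  # treat as missing; a predictive model can restore it
        cleaned.append(fixed)
    return cleaned
```

The "mask" mode keeps the rest of the row, which matters when data is scarce; "remove" is simpler but discards every other value of that planet.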

Of the algorithms analyzed in the study, one consistently performs well: the Decision Tree model. Whether its success is due to its being less influenced by erroneous information and by data with high bias and variance is difficult to say. It does, however, perform quite well on the PHL-EC dataset. Some algorithms perform better on particular features than on others. For example, Lasso Regression seems to have higher accuracy on radial features, while the Multi-layer Perceptron Regressor consistently achieves high results on features valued between 0 and 1. This is not conclusive evidence, though, and more research should be conducted.
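A per-feature comparison of the three models named above can be sketched with scikit-learn. This is a minimal illustration, not the study's setup: the data is synthetic rather than PHL-EC, the hyperparameters (`alpha=0.01`, one hidden layer of 32 units) are arbitrary assumptions, and R² is swapped in as the score because the report's exact accuracy metric is not stated here.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeRegressor


def compare_models(X: np.ndarray, y: np.ndarray, seed: int = 0) -> dict:
    """Fit each regressor on a train split and return its R^2 on the held-out split."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed
    )
    models = {
        "DecisionTree": DecisionTreeRegressor(random_state=seed),
        # Lasso and the MLP are sensitive to feature scale, so standardize first.
        "Lasso": make_pipeline(StandardScaler(), Lasso(alpha=0.01)),
        "MLP": make_pipeline(
            StandardScaler(),
            MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000, random_state=seed),
        ),
    }
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = r2_score(y_test, model.predict(X_test))
    return scores


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.uniform(0.0, 1.0, size=(300, 3))  # synthetic stand-in features
    y = 2.0 * X[:, 0] + np.sin(3.0 * X[:, 1]) + 0.05 * rng.normal(size=300)
    for name, score in compare_models(X, y).items():
        print(f"{name}: R^2 = {score:.3f}")
```

Running the same function once per target feature (each time treating that feature as `y` and the others as `X`) is one way to check whether a model's advantage is feature-specific, as observed for Lasso and the MLP.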

Finally, we have seen that the algorithms listed above are capable of making highly accurate predictions of planetary habitability without using complex mathematical formulas. According to some physicists, an Earth Similarity Index (ESI) below 0.75 generally indicates that a planet is not habitable. Our current ML tools are therefore capable of giving an educated guess as to which planets are worth analyzing further as potential habitable candidates.
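For comparison, the ESI that such models predict is usually defined as a weighted geometric product over planetary properties, each term measuring similarity to Earth. The sketch below assumes the commonly published four-parameter form (radius, bulk density and escape velocity in Earth units, surface temperature in kelvin, with weights 0.57, 1.07, 0.70 and 5.58); the values in PHL-EC may be computed with a different parameter set. `esi` and `worth_follow_up` are hypothetical helpers, and the 0.75 cutoff is the threshold quoted above.

```python
from math import prod
from typing import Dict

# Assumed Earth reference values and weights for the four-parameter ESI.
EARTH_REF = {"radius": 1.0, "density": 1.0, "escape_velocity": 1.0, "surface_temp_k": 288.0}
WEIGHTS = {"radius": 0.57, "density": 1.07, "escape_velocity": 0.70, "surface_temp_k": 5.58}


def esi(planet: Dict[str, float]) -> float:
    """ESI = prod_i (1 - |(x_i - x_i0) / (x_i + x_i0)|) ** (w_i / n)."""
    n = len(WEIGHTS)
    return prod(
        (1.0 - abs((planet[k] - EARTH_REF[k]) / (planet[k] + EARTH_REF[k]))) ** (WEIGHTS[k] / n)
        for k in WEIGHTS
    )


def worth_follow_up(esi_value: float, threshold: float = 0.75) -> bool:
    """Flag a planet as a candidate only if its ESI reaches the habitability threshold."""
    return esi_value >= threshold


if __name__ == "__main__":
    # Illustrative, made-up planet; not a real catalogue entry.
    candidate = {"radius": 1.1, "density": 0.9, "escape_velocity": 1.05, "surface_temp_k": 300.0}
    value = esi(candidate)
    print(f"ESI = {value:.3f}, follow up: {worth_follow_up(value)}")
```

The large temperature weight means small temperature errors move the ESI far more than similar errors in radius, which is one reason predicting the index directly from data can be attractive.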

DichoMire/Exoplanets

Folders and files

Latest commit

History

Repository files navigation

Comparison of machine learning algorithms in predicting missing data in exoplanet survey

Abstract

Conclusion

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages