This repository contains the R code that generates the results in the paper 'Improving The Diagnosis of Thyroid Cancer by Machine Learning and Clinical Data'. Please see our publication in Scientific Reports for details. The dataset collected and used in this study is available in our Zenodo repository.
Data cleaning, preprocessing, and format transformation.
Descriptive statistics of the dataset, and the performance of the logistic regression, random forest, GBM, LDA, and SVM models, measured by 10-fold cross-validation.
Uncertainty in the performance of the logistic regression, random forest, GBM, LDA, and SVM models, estimated by bootstrap analysis.
4. permutation_importance_logistic.R, permutation_importance_randomforest.R, permutation_importance_gbm.R, permutation_importance_lda.R, permutation_importance_svm.R
Normalized permutation importance of the predictors, calculated under logistic regression, random forest, GBM, LDA, and SVM.
Comparison of five prediction performance measures between expert assessment and the random forest model.
All the R code that creates the figures in the paper.
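The permutation-importance scripts compute a normalized importance score for each predictor. The R scripts in this repository are the reference implementation. The Python sketch below only illustrates the general technique under stated assumptions, none of which come from the paper: accuracy as the performance metric, 10 shuffles per predictor, negative drops clipped to zero, and scores divided by the largest drop so that the top predictor scores 1.0. The names `permutation_importance`, `accuracy`, and `toy_model` are hypothetical.

```python
import random
from typing import Callable, List, Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(
    predict: Callable[[List[List[float]]], List[int]],
    X: List[List[float]],
    y: List[int],
    n_repeats: int = 10,
    seed: int = 0,
) -> List[float]:
    """Normalized permutation importance for each column of X (illustrative).

    The importance of a predictor is the mean drop in accuracy after its
    column is shuffled, averaged over n_repeats shuffles. Scaling by the
    largest drop is an assumed normalization, not necessarily the paper's.
    """
    rng = random.Random(seed)
    baseline = accuracy(y, predict(X))
    drops = []
    for j in range(len(X[0])):
        total = 0.0
        for _ in range(n_repeats):
            # Shuffle column j only, which breaks its link to the outcome.
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            total += baseline - accuracy(y, predict(X_perm))
        drops.append(total / n_repeats)
    # Clip negative drops (shuffling helped by chance) to zero, then scale.
    drops = [max(d, 0.0) for d in drops]
    top = max(drops)
    return [d / top if top > 0 else 0.0 for d in drops]


# Toy usage: the hypothetical model uses only the first predictor.
def toy_model(rows: List[List[float]]) -> List[int]:
    return [int(r[0] > 0.5) for r in rows]


data_rng = random.Random(1)
X = [[data_rng.random(), data_rng.random()] for _ in range(200)]
y = [int(row[0] > 0.5) for row in X]
scores = permutation_importance(toy_model, X, y)
print(scores)  # → [1.0, 0.0]
```

Because the toy model ignores the second predictor, shuffling that predictor leaves accuracy unchanged, so its normalized score is 0. Normalizing makes scores comparable across models whose raw accuracy drops differ in scale.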
If you have any suggestions or comments on this study, please contact Nan Miles Xi (mxi1@luc.edu).
If you use the dataset or R code in your work, please cite:
Xi, M.N., Wang, L., and Yang, C. (2022). Improving The Diagnosis of Thyroid Cancer by Machine Learning and Clinical Data. Scientific Reports 12, 1143.