Introduction

Alzheimer’s disease (AD) is a widespread, irreversible, progressive neurodegenerative disease with a complex genetic architecture. The key goal of this project is to identify disease risk genes and classify them as associated or unassociated with Alzheimer's disease.

Various machine learning algorithms have been used to predict candidate genes. Previous prediction methods can be roughly divided into five types:

Methods studying protein-protein interaction networks
Gene functional annotations
Sequence-based feature patterns
Machine learning and network topological features
Information about tissue-specific networks

These methods predict associated genes or biomarkers. However, there are few reports on brain gene expression data. Accordingly, the research paper by Huang et al., "Revealing Alzheimer’s disease genes spectrum in the whole-genome by machine learning", was used as a reference for this project.

The aim is to divide the genes into four classes: C1-AD (probable pathogenic genes), C2-AD (high confidence genes), C3-AD (related genes), and C4-AD (possibly associated genes).

Libraries

NumPy
SciPy
scikit-learn (sklearn)
pandas
Pylab
Matplotlib
Itertools

Environment: Python 3.6, Windows 10

Dataset

The dataset used in the above-mentioned research paper was taken from the AlzGene archive. The training features include the number of positive and negative Alzheimer's cases in control studies and family-based studies for 335 genes.

The lack of sufficient data samples makes it difficult to train the model. Accordingly, regularization has been used to prevent overfitting.
The data was split so that 33% was held out for testing and the remaining 67% was used for training.
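
A minimal sketch of such a split, assuming scikit-learn's train_test_split. The feature matrix below is random placeholder data (four hypothetical count columns for 335 genes), not the AlzGene data, and the random seed is an assumption rather than the project's setting.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Placeholder data: 335 genes, 4 hypothetical count features, 4 class labels.
n_genes = 335
X = rng.integers(0, 50, size=(n_genes, 4)).astype(float)
y = rng.integers(0, 4, size=n_genes)

# Hold out 33% of the genes for testing; the rest is used for training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=0
)

print(X_train.shape, X_test.shape)
```

With 335 genes and test_size=0.33, this leaves 224 genes for training and 111 for testing.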

Results

The following algorithms were trained on the given dataset:

Support Vector Machine using Radial Kernel
Support Vector Machine using Linear Kernel
Support Vector Machine using Polynomial Kernel
Support Vector Machine using Sigmoid Kernel
Decision Trees
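
The five models listed above could be set up roughly as follows with scikit-learn. This is a sketch on random placeholder data, not the project's code: the hyperparameters (C=1.0, degree=3, max_depth=5) are assumptions chosen for illustration. In SVC, C controls regularization strength (smaller C means stronger regularization), and max_depth limits the complexity of the tree.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(335, 4))      # placeholder features, not the real data
y = rng.integers(0, 4, size=335)   # placeholder labels for four classes
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=0
)

# Hyperparameter values are illustrative assumptions.
models = {
    "SVM (radial kernel)": SVC(kernel="rbf", C=1.0),
    "SVM (linear kernel)": SVC(kernel="linear", C=1.0),
    "SVM (polynomial kernel)": SVC(kernel="poly", degree=3, C=1.0),
    "SVM (sigmoid kernel)": SVC(kernel="sigmoid", C=1.0),
    "Decision tree": DecisionTreeClassifier(max_depth=5, random_state=0),
}

# Fit each model on the training split and record its test accuracy.
accuracies = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    accuracies[name] = model.score(X_test, y_test)
    print(f"{name}: {accuracies[name]:.3f}")
```

Because the labels here are random, the printed accuracies carry no meaning; the block only shows how the comparison can be organized.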

The algorithms were evaluated on the four predicted classes using accuracy, precision, F1 score, and support, reported as micro, macro, and weighted averages.
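
To illustrate how the three averaging schemes differ, here is a small sketch using scikit-learn's metric functions on a made-up set of eight labels (the labels are hypothetical, not project results). Micro averaging pools all predictions, macro averaging gives each class equal weight, and weighted averaging weights each class by its support.

```python
from sklearn.metrics import f1_score, precision_score

# Hypothetical true and predicted labels for four classes (0..3).
y_true = [0, 0, 0, 0, 1, 1, 2, 3]
y_pred = [0, 0, 0, 1, 1, 1, 2, 2]

precision = {}
f1 = {}
for avg in ("micro", "macro", "weighted"):
    # zero_division=0 scores a class with no predicted samples as 0.
    precision[avg] = precision_score(y_true, y_pred, average=avg, zero_division=0)
    f1[avg] = f1_score(y_true, y_pred, average=avg, zero_division=0)
    print(f"{avg:>8}: precision={precision[avg]:.3f}, F1={f1[avg]:.3f}")
```

Class 3 is never predicted in this toy example, so it pulls the macro average down more than the micro or weighted averages. This is why the three numbers can diverge on small, imbalanced datasets.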

Of these, decision trees gave the best accuracy (88.29%).
However, the highest area under the Receiver Operating Characteristic (ROC) curve, 0.78, was obtained using the Support Vector Machine with the radial kernel.
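
For a four-class problem, a ROC area has to be computed with some multi-class scheme. The sketch below assumes a one-vs-rest, macro-averaged area using scikit-learn's roc_auc_score; whether the project used this scheme is not stated here. The data is synthetic (generated with make_classification), so the printed value is unrelated to the 0.78 reported above.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for the gene features: 335 samples, 4 features, 4 classes.
X, y = make_classification(
    n_samples=335, n_features=4, n_informative=4, n_redundant=0,
    n_classes=4, n_clusters_per_class=1, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, stratify=y, random_state=0
)

# probability=True enables predict_proba, which roc_auc_score needs here.
svm_rbf = SVC(kernel="rbf", C=1.0, probability=True, random_state=0)
svm_rbf.fit(X_train, y_train)
proba = svm_rbf.predict_proba(X_test)

# One-vs-rest ROC AUC, macro-averaged over the four classes (an assumed scheme).
auc_ovr = roc_auc_score(y_test, proba, multi_class="ovr", average="macro")
print(f"One-vs-rest ROC AUC: {auc_ovr:.2f}")
```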

Note: The results for the Support Vector Machine using the R library were taken from the paper and were not reproduced by us.
