AAModelSelection

The amino acid substitution model describing the substation rates among amino acids is a key component in studying the evolutionary relationships among species from protein sequences. The amino acid models consist of a large number of parameters; therefore, they are estimated from large datasets with hundreds of alignments and used for analyzing individual alignments. A number of general models (e.g., LG, WAG, JTT, Q.pfam) or clade–specific models (e.g., Q.plant, Q.bird, Q.yeast, Q.mammal and Q.insect) have been introduced and widely used in phylogenetic analyses. The first task in analyzing protein sequences is selecting the best appropriate available models for the alignment under study. This can be done by selecting the model with maximum likelihood value that requires much running time for large alignments. Recently, machine learning methods such as ModelRevelator and ModelTeller have been proposed and work well for nucleotide data. that worked well on simulation DNA alignments. In this paper, we propose an efficient method, called ModelExpress to extract features from protein alignments to quickly train a convolution neuron network on a personal computer for selecting amino acid models. Experiments on both real and simulated data showed that ModelExpress performed well on simulation data. It was better than ModelFinder on empirical data from clade genomes.

Some steps to use:

Step 1: download the source code and datasets by command:
git clone https://github.com/tinhnh2/AAModelSelection.git
Step 2: run script to extract information using pairwise script or triplet script.
python triplet_extraction.py folder label
folder: the path to alignments
label: the true label of alignments
Step 3: run script to predict alignments: python predict_model.py alignment.csv
alignment.csv: the output of extraction script

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
testing_dataset		testing_dataset
training_dataset		training_dataset
AA_Model_Selector.py		AA_Model_Selector.py
README.md		README.md
pairwise_extraction.py		pairwise_extraction.py
triplet_extraction.py		triplet_extraction.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

AAModelSelection

About

Releases

Packages

Languages

tinhnh2/AAModelSelection

Folders and files

Latest commit

History

Repository files navigation

AAModelSelection

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages