Skip to content

A basic implementation of the UPGMA (Unweighted Pair Group Method with Arithmetic Mean) clustering algorithm in Python.

Notifications You must be signed in to change notification settings

summer0601/UPGMApy

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

3 Commits
 
 
 
 
 
 
 
 

Repository files navigation

UPGMApy

UPGMApy is a basic implementation of the UPGMA (Unweighted Pair Group Method with Arithmetic Mean) algorithm, one of many used in bioinformatics (phylogenetics) for constructing evolutionary trees. (The typical data set is a matrix of molecular comparisons between species.)

The algorithm does this by repeatedly joining the columns and rows of the most similar (lowest-value) entries in the table until a final tree is constructed.

Usage

To run the algorithm, simply supply UPGMA with a lower triangular matrix and a corresponding list of labels. The output will be a single label corresponding to the final tree.

# Test table data and corresponding labels
M_labels = alpha_labels("A", "G") # Generates labels from "A" to "G"
M_data = [
    [],                         #A
    [19],                       #B
    [27, 31],                   #C
    [8, 18, 26],                #D
    [33, 36, 41, 31],           #E
    [18, 1, 32, 17, 35],        #F
    [13, 13, 29, 14, 28, 12]    #G
    ]
# Run UPGMA
UPGMA(M_data, M_labels)

For this test input, the output should be the following:

'((((A,D),((B,F),G)),C),E)'

The above example comes from an article by Dave Thomas

Todos

  • Add option for computing tree heights

License

MIT

About

A basic implementation of the UPGMA (Unweighted Pair Group Method with Arithmetic Mean) clustering algorithm in Python.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Python 100.0%