Home
TM-align is a sequence-order dependent pairwise protein structure alignment program. This C++ version implements a number of additional options not available in the original Fortran version, including the fTM-align algorithm (the -fast option), pairwise alignments between a large number of files (the -dir, -dir1, -dir2, and -suffix options), alignment for each chain in multi-chain PDB files (the -ter and -split options), alternative input and output formats (the -infmt1, -infmt2, and -outfmt options), alignment for RNA/DNA, and emulation of the TM-score program. A full list of options is available by running
TMalign -h
This package is developed and tested on 64-bit Linux only, although it should in theory work on any POSIX-compliant operating system such as macOS. This program requires g++ and make. To build the program, just type
make
This package also comes with utility programs for converting and manipulating PDB, PDBx/mmCIF, and xyz format structure files specific for this TM-align package. They are:
pdb2xyz
converts a PDB file or a folder of PDB files into one xyz format file, typically with multiple entries.
xyz_sfetch
retrieves entries from a big xyz format file with multiple entries. Fetching entries with xyz_sfetch requires the source xyz format file to be indexed by xyz_sfetch first, which is a fast process (~2 s for the LOMETS PDB library with ~70000 entries). The fetching time scales linearly with the length of the entry list but is almost independent of the size of the xyz file. Therefore, this program is designed for fetching a small number of entries from a big xyz file: fetching the whole PDB70 from PDB70 itself takes ~0.5 min; fetching 1000 entries from PDB70 takes only 1~2 s.
pdb2fasta
extracts FASTA format amino acid (or nucleotide) sequences from a PDB or xyz format file. This program can be used for checking xyz file integrity: if one of the chains in an xyz file is corrupted, the last sequence output by pdb2fasta will be the corrupted chain instead of the last chain in the whole xyz file. Therefore, simply comparing the number of entries reported by xyz_sfetch with the number reported by pdb2fasta for the same xyz file can quickly identify a corrupted chain.
se
extracts the sequence alignment from a pair of superposed structures.
NWalign
performs Needleman-Wunsch global sequence alignment between a pair of sequences.
pdb2ss
assigns secondary structure for a protein or RNA molecule.
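As background for NWalign, the following is a minimal Python sketch of Needleman-Wunsch global alignment. It is not the code used by NWalign: the function name needleman_wunsch and the match, mismatch, and gap scores are assumptions chosen for illustration, and a linear gap penalty is used for brevity.

```python
def needleman_wunsch(seq1, seq2, match=1, mismatch=-1, gap=-2):
    """Global alignment with a linear gap penalty (illustrative scores only)."""
    n, m = len(seq1), len(seq2)
    # score[i][j] is the best score for aligning seq1[:i] with seq2[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if seq1[i - 1] == seq2[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right corner to recover one optimal alignment.
    aln1, aln2 = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if seq1[i - 1] == seq2[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                aln1.append(seq1[i - 1])
                aln2.append(seq2[j - 1])
                i -= 1
                j -= 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            aln1.append(seq1[i - 1])  # gap in seq2
            aln2.append("-")
            i -= 1
        else:
            aln1.append("-")          # gap in seq1
            aln2.append(seq2[j - 1])
            j -= 1
    return "".join(reversed(aln1)), "".join(reversed(aln2)), score[n][m]
```

The function returns both gapped sequences and the optimal score. Because the alignment is global, every residue of both inputs appears in the output.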
Apart from the standard Protein Data Bank (PDB) format coordinate file, TM-align also accepts SPICKER format Cα coordinate files and its own xyz format files. Additionally, TMalign, se, pdb2xyz and pdb2fasta (but not xyz_sfetch) can directly read PDB, SPICKER or xyz files compressed in gz or bz2 format, assuming that zcat or bzcat, respectively, is installed on the user's system.
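For scripts that need to read the same compressed inputs outside of this package, a minimal Python sketch is shown below. It uses Python's standard gzip and bz2 modules rather than zcat or bzcat, and the helper name open_structure_file and the choice to select the decompressor by file extension are assumptions for illustration.

```python
import bz2
import gzip


def open_structure_file(path):
    """Open a plain, .gz, or .bz2 file as text, chosen by file extension."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    if path.endswith(".bz2"):
        return bz2.open(path, "rt")
    return open(path, "r")
```

The returned object can be iterated line by line in the same way for all three cases.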
The xyz format file for TM-align is similar to the standard xyz format, except that only Cα atoms are included and that the first column is the residue type instead of the atom type. An example xyz file with two entries is shown below:
13
5w4kA
S   72.311  -51.185   -2.918
P   73.936  -53.176   -5.841
G   70.928  -52.645   -6.676
N   71.123  -49.079   -7.785
A   77.496  -48.353   -8.621
S   78.090  -52.061   -7.844
S   76.393  -58.239   -7.639
N   74.811  -59.905   -4.562
S   72.182  -57.623   -3.594
A   68.953  -56.708   -4.982
S   65.951  -58.235   -6.834
A   64.856  -59.057  -10.404
N   65.245  -56.660  -13.331
11
6b1tW
G   90.717   92.122  308.228
L   93.058   92.926  305.367
R   91.495   96.358  304.879
F   92.711   97.742  308.205
P   95.456  100.350  307.654
S   98.908   99.779  309.067
K  100.010  101.232  312.405
M  102.553  104.041  312.017
F  103.757  106.727  314.397
G  101.546  109.763  313.957
G   99.622  108.487  310.948
Here, 13 in the first line is the chain length (the number of residues), and the second line is the name of the entry (5w4kA). The next 13 lines are the Cα atom coordinates of all 13 residues: character position 1 is the amino acid type; positions 3-10, 12-19, and 21-28 are the X, Y, and Z coordinates, respectively. The next entry (6b1tW, with 11 residues) follows without an empty line.
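The layout described above can be read with a few lines of code. The following Python sketch builds a byte-offset index over a multi-entry xyz file and then fetches single entries by seeking, which illustrates why, as noted for xyz_sfetch, fetch time depends on the number of requested entries rather than on file size. The function names index_xyz and fetch_xyz are hypothetical, the in-memory index is not the index format written by xyz_sfetch, and fields are split on whitespace (an assumption that holds when the separator columns 2, 11, and 20 are blank).

```python
def index_xyz(handle):
    """Map entry name -> byte offset of its chain-length line.

    `handle` must be a seekable file object opened in binary mode.
    """
    index = {}
    while True:
        offset = handle.tell()
        header = handle.readline().decode()
        if not header.strip():
            break  # end of file (or a trailing blank line)
        length = int(header)
        name = handle.readline().decode().strip()
        index[name] = offset
        for _ in range(length):  # skip this entry's coordinate lines
            handle.readline()
    return index


def fetch_xyz(handle, index, name):
    """Seek to one entry and return (sequence, list of (x, y, z) tuples)."""
    handle.seek(index[name])
    length = int(handle.readline().decode())
    handle.readline()  # entry name line, already known from the index
    sequence, coords = [], []
    for _ in range(length):
        fields = handle.readline().decode().split()
        if len(fields) != 4:
            raise ValueError("corrupted coordinate line in entry " + name)
        sequence.append(fields[0])
        coords.append(tuple(float(value) for value in fields[1:]))
    return "".join(sequence), coords
```

With a file opened in binary mode, index_xyz is called once, after which fetch_xyz reads only the lines belonging to the requested entry. A coordinate line that does not have exactly four fields raises an error, which gives a basic integrity check.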
This is a re-implementation of the TM-align algorithm in C/C++. The code was written by Jianyi Yang and later updated by Jianjie Wu, Sha Gong, and Chengxin Zhang at the Yang Zhang lab, Department of Computational Medicine and Bioinformatics, University of Michigan, 100 Washtenaw Avenue, Ann Arbor, MI 48109-2218. Please report bugs and questions to yangzhanglab@umich.edu.
DISCLAIMER: Permission to use, copy, modify, and distribute this program for any purpose, with or without fee, is hereby granted, provided that the notices on the head, the reference information, and this copyright notice appear in all copies or substantial portions of the Software. It is provided "as is" without express or implied warranty.
If you find TM-align useful in your research, please cite the first paper listed below. If you use the -byresi option for TM-score superposition without re-alignment, please cite the second paper instead. If you use the -fast option for fast TM-align calculation, please cite the third paper. If you use the program for RNA/DNA structure alignment, please cite the fourth paper.
- Yang Zhang and Jeffrey Skolnick. "TM-align: a protein structure alignment algorithm based on the TM-score." Nucleic acids research 33.7 (2005): 2302-2309.
- Yang Zhang and Jeffrey Skolnick. "Scoring function for automated assessment of protein structure template quality." Proteins: Structure, Function, and Bioinformatics 57.4 (2004): 702-710.
- Runze Dong, Shuo Pan, Zhenling Peng, Yang Zhang, and Jianyi Yang. "mTM-align: a server for fast protein structure database search and multiple protein structure alignment." Nucleic acids research (2018).
- Sha Gong, Chengxin Zhang, and Yang Zhang. "RNA-align: quick and accurate alignment of RNA 3D structures based on size-independent TM-scoreRNA." Bioinformatics (2019).