Chengxin Zhang edited this page Feb 12, 2019 · 17 revisions

TM-align User Guide

Summary

TM-align is a sequence-order dependent pairwise protein structure alignment program. This C++ version implements a number of options not available in the original Fortran version, including the fTM-align algorithm (the -fast option), pairwise alignment among a large number of files (the -dir, -dir1, -dir2, and -suffix options), alignment of each chain in multi-chain PDB files (the -ter and -split options), alternative input and output formats (the -infmt1, -infmt2, and -outfmt options), alignment of RNA/DNA structures, and emulation of the TM-score program. A full list of options is available by running

TMalign -h

Installation

This package is developed and tested on 64-bit Linux only, although in theory it should work on any POSIX-compliant operating system, such as macOS. Building the program requires g++ and make. To build it, just type

make

Utilities

This package also comes with utility programs for converting and manipulating PDB, PDBx/mmCIF, and xyz format structure files specific for this TM-align package. They are:

pdb2xyz

pdb2xyz converts a PDB file or a folder of PDB files into one xyz format file, typically with multiple entries.
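
To illustrate the kind of conversion involved, the following Python sketch extracts Cα atoms from PDB ATOM records and writes one entry in the xyz layout described under File Format. It is not the pdb2xyz source code: the function names `pdb_to_xyz_entry` and `make_atom_line`, the protein-only residue table, and the rule of keeping only the first alternate location are assumptions made for this example.

```python
# Illustrative sketch only; NOT the actual pdb2xyz implementation.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def pdb_to_xyz_entry(pdb_lines, name):
    """Build one xyz-format entry (a string) from the CA atoms in pdb_lines."""
    rows = []
    seen = set()
    for line in pdb_lines:
        if len(line) < 54 or not line.startswith("ATOM  "):
            continue
        if line[12:16] != " CA ":          # PDB columns 13-16: atom name
            continue
        if line[16] not in (" ", "A"):     # assumption: first altLoc only
            continue
        key = (line[21], line[22:27])      # chain ID, residue number + iCode
        if key in seen:
            continue
        seen.add(key)
        residue = THREE_TO_ONE.get(line[17:20], "X")  # unknown types -> X
        x = float(line[30:38])             # PDB columns 31-38
        y = float(line[38:46])             # PDB columns 39-46
        z = float(line[46:54])             # PDB columns 47-54
        rows.append("%s %8.3f %8.3f %8.3f" % (residue, x, y, z))
    return "\n".join([str(len(rows)), name] + rows) + "\n"


def make_atom_line(serial, atom, resname, chain, resseq, x, y, z):
    """Hypothetical helper: format a minimal ATOM record for the demo."""
    return "ATOM  %5d %-4s %3s %1s%4d    %8.3f%8.3f%8.3f" % (
        serial, atom, resname, chain, resseq, x, y, z)


demo = [
    make_atom_line(1, " N  ", "SER", "A", 1, 71.000, -50.000, -2.000),
    make_atom_line(2, " CA ", "SER", "A", 1, 72.311, -51.185, -2.918),
    make_atom_line(3, " CA ", "PRO", "A", 2, 73.936, -53.176, -5.841),
]
print(pdb_to_xyz_entry(demo, "5w4kA"), end="")
```

The backbone N atom is skipped and the two Cα atoms become the two coordinate lines of the entry. Nucleotide chains would need a different atom selection and residue table, which this sketch does not attempt.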

xyz_sfetch

xyz_sfetch retrieves entries from a large xyz format file with multiple entries. The source xyz file must first be indexed by xyz_sfetch, which is a fast process (~2 s for the LOMETS PDB library with ~70000 entries). The fetch time scales linearly with the length of the entry list and is almost independent of the size of the xyz file. The program is therefore designed for fetching a small number of entries from a large xyz file: fetching the whole PDB70 from PDB70 itself takes ~0.5 min, whereas fetching 1000 entries from PDB70 takes only 1-2 s.
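
The scaling behaviour described above is what one would expect from an offset index: one pass records where each entry starts, and each fetch then seeks directly to that position. The Python sketch below shows the idea for the entry layout described under File Format. It is a hedged illustration, not xyz_sfetch itself; the function names `build_index` and `fetch` and the in-memory dictionary are assumptions, and the actual index format used by xyz_sfetch is not described in this guide.

```python
import io


def build_index(handle):
    """Scan a binary xyz stream once; map entry name -> byte offset."""
    index = {}
    handle.seek(0)
    while True:
        offset = handle.tell()
        header = handle.readline()
        if not header.strip():             # end of file (or trailing blank)
            break
        length = int(header.decode().strip())
        name = handle.readline().decode().strip()
        index[name] = offset
        for _ in range(length):            # skip the coordinate lines
            handle.readline()
    return index


def fetch(handle, index, names):
    """Return the raw bytes of the requested entries, in the given order."""
    chunks = []
    for name in names:
        handle.seek(index[name])           # jump straight to the entry
        header = handle.readline()
        length = int(header.decode().strip())
        body = [handle.readline() for _ in range(length + 1)]  # name + coords
        chunks.append(header + b"".join(body))
    return b"".join(chunks)


# Made-up two-entry file, used only to exercise the functions.
data = (b"2\nentryA\n"
        b"S    1.000    2.000    3.000\n"
        b"P    4.000    5.000    6.000\n"
        b"1\nentryB\n"
        b"G    7.000    8.000    9.000\n")
stream = io.BytesIO(data)
index = build_index(stream)
print(fetch(stream, index, ["entryB"]).decode(), end="")
```

Building the index touches every line once, while each fetch reads only the lines of the requested entry, so the cost of fetching depends on the entry list rather than on the file size.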

pdb2fasta

pdb2fasta extracts the FASTA format amino acid (or nucleotide) sequence from a PDB or xyz format file. This program can also be used to check xyz file integrity: if one of the chains in an xyz file is corrupted, the last sequence output by pdb2fasta will be the corrupted chain rather than the last chain in the xyz file. Therefore, comparing the number of entries reported by xyz_sfetch with the number reported by pdb2fasta for the same xyz file quickly identifies a corrupted chain.

se

se extracts the sequence alignment from a pair of superposed structures.

NWalign

NWalign performs Needleman-Wunsch global sequence alignment between a pair of sequences.
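
For readers unfamiliar with the method, the following Python sketch shows Needleman-Wunsch global alignment by dynamic programming. It is a minimal illustration and not the NWalign code: the function name `needleman_wunsch` and the simple match/mismatch/linear-gap scores are placeholder assumptions, since the scoring scheme used by NWalign is not described in this guide.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align strings a and b; return (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)

    def sub(i, j):
        # Substitution score for a[i-1] versus b[j-1] (placeholder scheme).
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(i, j),  # (mis)match
                              score[i - 1][j] + gap,            # gap in b
                              score[i][j - 1] + gap)            # gap in a

    # Trace back from the bottom-right corner to recover one optimal path.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


total, top, bottom = needleman_wunsch("ACGT", "AGT")
print(total)
print(top)
print(bottom)
```

With these placeholder scores, aligning ACGT with AGT gives a score of 2 (three matches and one gap), with the gap placed opposite the C.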

pdb2ss

pdb2ss assigns secondary structure for a protein or RNA molecule.

File Format

Apart from standard Protein Data Bank (PDB) format coordinate files, TM-align also accepts SPICKER format Cα coordinate files and its own xyz format files. Additionally, TMalign, se, pdb2xyz, and pdb2fasta (but not xyz_sfetch) can directly read PDB, SPICKER, or xyz files compressed in gz or bz2 format, provided that zcat or bzcat, respectively, is installed on the user's system.

The xyz format file for TM-align is similar to the standard xyz format, except that only Cα atoms are included and the first column is the residue type instead of the atom type. An example xyz file with two entries is shown below:

13
5w4kA
S   72.311  -51.185   -2.918
P   73.936  -53.176   -5.841
G   70.928  -52.645   -6.676
N   71.123  -49.079   -7.785
A   77.496  -48.353   -8.621
S   78.090  -52.061   -7.844
S   76.393  -58.239   -7.639
N   74.811  -59.905   -4.562
S   72.182  -57.623   -3.594
A   68.953  -56.708   -4.982
S   65.951  -58.235   -6.834
A   64.856  -59.057  -10.404
N   65.245  -56.660  -13.331
11
6b1tW
G   90.717   92.122  308.228
L   93.058   92.926  305.367
R   91.495   96.358  304.879
F   92.711   97.742  308.205
P   95.456  100.350  307.654
S   98.908   99.779  309.067
K  100.010  101.232  312.405
M  102.553  104.041  312.017
F  103.757  106.727  314.397
G  101.546  109.763  313.957
G   99.622  108.487  310.948

Here, 13 on the first line is the chain length (the number of residues), and the second line is the entry name (5w4kA). The next 13 lines give the Cα atom coordinates of the 13 residues: position 1 is the amino acid type, and positions 3-10, 12-19, and 21-28 hold the X, Y, and Z coordinates, respectively. The next entry (6b1tW, with 11 residues) follows without an empty line.
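
The column layout above can be read with fixed-width slicing. The Python sketch below parses a multi-entry xyz text into names, sequences, and coordinates. It is an illustrative reader only; the function name `parse_xyz` and the returned tuple layout are assumptions and not part of the TM-align package.

```python
def parse_xyz(text):
    """Parse TM-align xyz text into a list of (name, sequence, coords)."""
    lines = text.splitlines()
    entries = []
    pos = 0
    while pos < len(lines):
        if not lines[pos].strip():         # tolerate trailing blank lines
            pos += 1
            continue
        length = int(lines[pos])           # line 1: number of residues
        name = lines[pos + 1].strip()      # line 2: entry name
        body = lines[pos + 2:pos + 2 + length]
        if len(body) != length:
            raise ValueError("entry %r is truncated" % name)
        sequence = "".join(line[0] for line in body)      # position 1
        coords = [(float(line[2:10]),      # positions 3-10:  X
                   float(line[11:19]),     # positions 12-19: Y
                   float(line[20:28]))     # positions 21-28: Z
                  for line in body]
        entries.append((name, sequence, coords))
        pos += 2 + length
    return entries


# Shortened version of the example above: only the first residues of each
# entry are kept, so the length lines are 2 and 1 instead of 13 and 11.
example = """\
2
5w4kA
S   72.311  -51.185   -2.918
P   73.936  -53.176   -5.841
1
6b1tW
G   90.717   92.122  308.228
"""

for name, sequence, coords in parse_xyz(example):
    print(name, sequence, coords[0])
```

Because each entry declares its own length, the reader can move from one entry to the next without any separator line, which matches the absence of empty lines between entries.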

License

This is a re-implementation of the TM-align algorithm in C/C++. The code was written by Jianyi Yang and later updated by Jianjie Wu, Sha Gong, and Chengxin Zhang at the Yang Zhang lab, Department of Computational Medicine and Bioinformatics, University of Michigan, 100 Washtenaw Avenue, Ann Arbor, MI 48109-2218. Please report bugs and questions to yangzhanglab@umich.edu.

DISCLAIMER: Permission to use, copy, modify, and distribute this program for any purpose, with or without fee, is hereby granted, provided that the notices on the head, the reference information, and this copyright notice appear in all copies or substantial portions of the Software. It is provided "as is" without express or implied warranty.

References

If you find TM-align useful in your research, please cite the first paper listed below. If you use the -byresi option for TM-score superposition without re-alignment, please cite the second paper instead. If you use the -fast option for fast TM-align calculation, please cite the third paper. If you use the program for RNA/DNA structure alignment, please cite the fourth paper.

  1. Yang Zhang and Jeffrey Skolnick. "TM-align: a protein structure alignment algorithm based on the TM-score." Nucleic Acids Research 33.7 (2005): 2302-2309.
  2. Yang Zhang and Jeffrey Skolnick. "Scoring function for automated assessment of protein structure template quality." Proteins: Structure, Function, and Bioinformatics 57.4 (2004): 702-710.
  3. Runze Dong, Shuo Pan, Zhenling Peng, Yang Zhang, and Jianyi Yang. "mTM-align: a server for fast protein structure database search and multiple protein structure alignment." Nucleic Acids Research (2018).
  4. Sha Gong, Chengxin Zhang, and Yang Zhang. "RNA-align: quick and accurate alignment of RNA 3D structures based on size-independent TM-scoreRNA." Bioinformatics (2019).