This toolbox requires the Biopython and msms packages. msms is needed only for computing structure-related properties, where it is used by the Bio.PDB.ResidueDepth module. You can install both packages with conda. Alternatively, install Biopython with pip and install msms manually.
create an environment
conda create --name toolbox
activate it
conda activate toolbox
add the bioconda channel
conda config --add channels bioconda
install necessary packages
conda install biopython msms
create a Python virtual environment
python -m venv toolbox
activate it
source toolbox/bin/activate
install Biopython
pip install biopython
Download the msms tool for your OS from http://mgltools.scripps.edu/downloads#msms and, after installation, make sure that msms is in your PATH variable.
git clone "https://github.com/CalounovaT/Bioinfo_toolbox/"
run the main program
./main.py
Obtain the description of an entry given its index in a file (0 in the example below)
./main.py fasta --description 0 proteins.fasta
Obtain the sequence of an entry given its index in a file (0 in the example below)
./main.py fasta --sequence 0 proteins.fasta
Return the length of a sequence given its index in a file (0 in the example below)
./main.py fasta --length 0 proteins.fasta
Return a subsequence of a sequence given its index in a file and the subsequence start and end positions (entry 0, positions 1 and 6 in the example below)
./main.py fasta --subsequence 0 1 6 proteins.fasta
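The index-based lookups above can be pictured with a small FASTA reader. This is a minimal sketch, not the toolbox's own code: the function name `read_fasta` and the sample records are made up for illustration, and a Biopython-based tool would more likely rely on `Bio.SeqIO.parse`.

```python
def read_fasta(lines):
    """Parse FASTA text lines into a list of (description, sequence) tuples."""
    records = []
    description, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if description is not None:
                records.append((description, "".join(chunks)))
            description, chunks = line[1:], []
        elif description is not None:
            chunks.append(line)  # sequence may span several lines
    if description is not None:
        records.append((description, "".join(chunks)))
    return records


# Hypothetical input; entry 0 is the first record in the file.
example = [">seq1 first protein", "MKTAYI", "AKQR", ">seq2 second protein", "GG"]
records = read_fasta(example)
description, sequence = records[0]
print(description)
print(sequence)
print(len(sequence))
```

How the toolbox interprets the start and end positions of `--subsequence` (0-based or 1-based, inclusive or exclusive) is not stated here, so the sketch leaves slicing out.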
Measure Hamming distance of two sequences
./main.py hamming 'ABC' 'ACC'
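For reference, the Hamming distance counts the positions at which two sequences of equal length differ. A minimal sketch follows; the function name is illustrative and the toolbox's handling of unequal lengths may differ.

```python
def hamming_distance(a: str, b: str) -> int:
    """Count positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance requires sequences of equal length")
    return sum(x != y for x, y in zip(a, b))


print(hamming_distance("ABC", "ACC"))  # 1
```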
Measure the edit distance of two sequences
./main.py edit 'ABC' 'AAC'
Output the alignments of the two sequences
./main.py edit --alignment 'ABC' 'AAC'
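The edit distance and an alignment can both be read from the same dynamic programming table. The sketch below assumes unit costs for substitution, insertion and deletion (Levenshtein distance) and traces back only one optimal alignment; the function names are illustrative, and the toolbox may report several alignments or format them differently.

```python
def edit_distance_table(a: str, b: str):
    """Fill the DP table where dp[i][j] is the distance between a[:i] and b[:j]."""
    rows, cols = len(a) + 1, len(b) + 1
    dp = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        dp[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        dp[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # match or substitution
    return dp


def edit_distance(a: str, b: str) -> int:
    return edit_distance_table(a, b)[-1][-1]


def one_alignment(a: str, b: str):
    """Trace back through the table and return one optimal alignment."""
    dp = edit_distance_table(a, b)
    i, j = len(a), len(b)
    top, bottom = [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + (a[i - 1] != b[j - 1]):
            top.append(a[i - 1]); bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            top.append(a[i - 1]); bottom.append("-")
            i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1])
            j -= 1
    return "".join(reversed(top)), "".join(reversed(bottom))


print(edit_distance("ABC", "AAC"))
for row in one_alignment("ABC", "AAC"):
    print(row)
```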
Obtain information about the stored structure (number of models, structures, residues, atoms).
./main.py pdb --information 1B0B.pdb
Compute the width of the structure (the maximum distance between any two atoms).
./main.py pdb --width 1B0B.pdb
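The width defined above is the largest pairwise distance over all atom coordinates. A brute-force sketch, assuming the coordinates are already available as (x, y, z) tuples, is shown below; with Biopython they could be collected from `atom.coord` for each atom returned by `structure.get_atoms()`. The function name and the sample points are illustrative, and the toolbox may use a faster method.

```python
from itertools import combinations
from math import dist


def structure_width(coords):
    """Return the maximum Euclidean distance between any two points."""
    coords = list(coords)
    if len(coords) < 2:
        return 0.0
    # O(n^2) comparison of every pair of atoms; fine for small structures.
    return max(dist(p, q) for p, q in combinations(coords, 2))


# Made-up coordinates, in the same units as the input (angstroms in PDB files).
points = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (1.0, 1.0, 1.0)]
print(structure_width(points))
```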
Obtain the list of atoms within a given distance of a given ligand (HETATM). Pass the ligand name and the distance as arguments.
./main.py pdb --atoms SAC 10 1B0B.pdb
Obtain the list of residues within a given distance of a given ligand (HETATM).
./main.py pdb --residues SAC 10 1B0B.pdb
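The two ligand queries can be sketched as a distance filter over atoms. This is only an illustration under assumptions: atoms are represented as plain dictionaries with hypothetical keys `resname`, `resid`, `name` and `coord`, the cutoff is treated as inclusive, and the data are made up. A Biopython-based tool could instead use `Bio.PDB.NeighborSearch`.

```python
from math import dist


def atoms_near_ligand(atoms, ligand_name, cutoff):
    """Return non-ligand atoms within `cutoff` of any atom of the ligand."""
    ligand_coords = [a["coord"] for a in atoms if a["resname"] == ligand_name]
    return [
        a for a in atoms
        if a["resname"] != ligand_name
        and any(dist(a["coord"], c) <= cutoff for c in ligand_coords)
    ]


def residues_near_ligand(atoms, ligand_name, cutoff):
    """Return sorted (resname, resid) pairs that own at least one nearby atom."""
    nearby = atoms_near_ligand(atoms, ligand_name, cutoff)
    return sorted({(a["resname"], a["resid"]) for a in nearby})


# Made-up atoms; SAC plays the role of the ligand as in the commands above.
atoms = [
    {"resname": "SAC", "resid": 1, "name": "C1", "coord": (0.0, 0.0, 0.0)},
    {"resname": "GLY", "resid": 2, "name": "CA", "coord": (3.0, 0.0, 0.0)},
    {"resname": "GLY", "resid": 2, "name": "N", "coord": (4.0, 0.0, 0.0)},
    {"resname": "ALA", "resid": 3, "name": "CA", "coord": (20.0, 0.0, 0.0)},
]
print([a["name"] for a in atoms_near_ligand(atoms, "SAC", 10)])
print(residues_near_ligand(atoms, "SAC", 10))
```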
Retrieve sequence by its position
./main.py msa --sequencepos 0 align.msa
Retrieve sequence by its id
./main.py msa --sequenceid "UniRef90_UPI000" align.msa
Retrieve given column from the MSA
./main.py msa --column 0 align.msa
Retrieve sum of pairs score of a column
./main.py msa --spcolumn 0 align.msa
Retrieve the sum-of-pairs score of the whole MSA
./main.py msa --spmsa align.msa
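The sum-of-pairs score of a column adds up a pairwise score over every pair of symbols in that column, and the score of the MSA is the sum over its columns. The sketch below uses an assumed toy scoring scheme (match +1, mismatch -1, residue against gap -1, gap against gap 0); the scoring matrix the toolbox actually applies is not stated here, and the function names are illustrative.

```python
from itertools import combinations


def pair_score(x, y, match=1, mismatch=-1, gap=-1):
    """Score one pair of symbols under the assumed toy scheme."""
    if x == "-" and y == "-":
        return 0
    if x == "-" or y == "-":
        return gap
    return match if x == y else mismatch


def sp_column(column):
    """Sum the pair scores over all unordered pairs in one column."""
    return sum(pair_score(x, y) for x, y in combinations(column, 2))


def sp_msa(rows):
    """Sum the column scores over all columns of equal-length aligned rows."""
    return sum(sp_column(col) for col in zip(*rows))


rows = ["AC", "AC", "A-"]  # made-up alignment with three sequences
print(sp_column("".join(r[0] for r in rows)))
print(sp_msa(rows))
```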
Compute the conservation of all positions in the MSA
./main.py conservation --conservation align.msa
Identify the top N scoring positions in the MSA
./main.py conservation --toppositions 5 align.msa
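One simple way to think about per-position conservation is the fraction of sequences that share the most common residue in a column. The sketch below uses that measure as an assumption; the conservation score the toolbox computes may be defined differently (for example with entropy or a substitution matrix), and the function names are illustrative.

```python
from collections import Counter


def column_conservation(column):
    """Fraction of rows carrying the most common non-gap residue."""
    residues = [c for c in column if c != "-"]
    if not residues:
        return 0.0
    most_common_count = Counter(residues).most_common(1)[0][1]
    return most_common_count / len(column)


def conservation_scores(rows):
    """Score every column of equal-length aligned rows."""
    return [column_conservation(col) for col in zip(*rows)]


def top_positions(rows, n):
    """Return the 0-based indices of the n best-scoring columns."""
    scores = conservation_scores(rows)
    # Sort by descending score; ties keep the lower column index first.
    ranked = sorted(range(len(scores)), key=lambda i: (-scores[i], i))
    return ranked[:n]


rows = ["ACD", "ACE", "A-F"]  # made-up alignment
print(conservation_scores(rows))
print(top_positions(rows, 2))
```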
Compute the diameter of the protein.
./main.py properties --diameter 1B0B.pdb
Compute the ratio of surface to buried amino acids.
./main.py properties --ratio 1B0B.pdb
Output data for a histogram of the amino acid composition of exposed (surface) and buried residues.
./main.py properties --surfacecounts 1B0B.pdb
./main.py properties --buriedcounts 1B0B.pdb
Quantify the proportion of polar amino acids on the surface and in the core of the protein.
./main.py properties --surfacepolar 1B0B.pdb
./main.py properties --buriedpolar 1B0B.pdb