PDB-CAT: Classification and Analysis Tool

PDB-CAT is a Jupyter Notebook that automatically categorizes PDB structures by the type of interaction between the atoms of the protein and the ligand, and checks for mutations in the sequence.

PDB-CAT classifies a group of protein structures into three categories according to their ligands: apo, covalently bonded, and non-covalently bonded. In addition, the program can check for mutations by comparing each protein sequence to a reference sequence. PDB-CAT is designed to be user-friendly: its output identifies every entity present in each entry to facilitate decision-making.
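As a rough illustration of the three categories, the sketch below assigns a label from two pieces of information per entry: the ligand codes that remain once solvents and ions are ignored, and the subset of those codes that is covalently linked to the protein. The function name `classify_entry` and both of its inputs are hypothetical and are not taken from `PDBCAT_module.py`.

```python
def classify_entry(ligand_codes, covalent_codes):
    """Return 'apo', 'covalent' or 'non-covalent' for one entry.

    ligand_codes:   ligand codes left after discarding solvents, ions, etc.
    covalent_codes: subset of codes with a covalent link to the protein.
    Both arguments are hypothetical inputs chosen for this illustration.
    """
    ligands = set(ligand_codes)
    if not ligands:
        # No relevant ligand at all: the structure is treated as apo.
        return "apo"
    if ligands & set(covalent_codes):
        # At least one relevant ligand is covalently bound.
        return "covalent"
    return "non-covalent"


print(classify_entry([], []))            # apo
print(classify_entry(["LIG"], ["LIG"]))  # covalent
print(classify_entry(["LIG"], []))       # non-covalent
```

In this sketch a single covalently bound ligand is enough to label the whole entry as covalent; the notebook may resolve mixed cases differently.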

Documentation

Installation

Python 3.10 or higher is required. Install the required packages with:

  pip install -r requirements.txt

Quickstart

Option 1. Clone

  git clone https://github.com/URV-cheminformatics/PDB-CAT.git

Option 2. Download

1. On the repository's GitHub page, click the green button labeled '<> Code' in the top right corner.
2. Select 'Download ZIP'.

You now have a copy of the repository's files saved as a zip file on your local computer. You can edit and customize the files for your own purposes.

For more information:

Downloading files from GitHub webpage

Dataset

There are two ways to create the dataset:

Option 1. Search for your protein target in the Protein Data Bank and download the PDBx/mmCIF files in batches.

Downloading files from PDB webpage

Option 2. Download the structures of known IDs with the following executable:

batch-download script

-f  input file with the IDs separated by commas

-c  download the files in cif.gz format

-o  output path

  ./batch_download.sh -f input.txt -c -o /output  # download the structures by ID

(Optional) Decompress the downloaded .gz files:

  gunzip *.gz

Note: The dataset must be in the /cif directory before executing the program.
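If `gunzip` is not available (for example on Windows), the same decompression step can be done from Python with the standard library. This is a minimal sketch; the helper name `decompress_all` and the directory names in the usage note are placeholders, not part of PDB-CAT.

```python
import gzip
import shutil
from pathlib import Path


def decompress_all(src_dir, dst_dir):
    """Decompress every *.gz file in src_dir into dst_dir; return the new paths."""
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    written = []
    for gz_path in sorted(src.glob("*.gz")):
        # Path.stem drops only the last suffix: "1abc.cif.gz" -> "1abc.cif"
        out_path = dst / gz_path.stem
        with gzip.open(gz_path, "rb") as fin, open(out_path, "wb") as fout:
            shutil.copyfileobj(fin, fout)
        written.append(out_path)
    return written
```

For instance, `decompress_all("output", "cif")` would read the archives from a folder named `output` and write the decompressed files to `cif`; adjust both names to your own layout.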

Variables

To run this project, set the following variables in your main code:

  # Name of the folder with the cif files to process
  folder_name = "Main-protease-cif"
  # Threshold on the number of amino acids, used to discriminate between peptides and the subunits of the protein
  res_threshold = 20
  # Analyze mutations: True or False
  mutation = True
  # PDB code of the protein to analyze. If mutation is False, this variable is not used.
  pdb = "rcsb_pdb_SARSCoV2"
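To make the role of `res_threshold` concrete, here is a small sketch of how a residue-count threshold can separate peptides from protein subunits. The function `split_chains` is hypothetical, and the sketch assumes that chains with fewer residues than the threshold count as peptides; the README does not say whether the notebook's comparison is strict or inclusive.

```python
def split_chains(chain_lengths, res_threshold=20):
    """Split chains into protein subunits and peptides by residue count.

    chain_lengths maps a chain ID to its number of amino acids.
    Assumption: chains shorter than res_threshold are treated as peptides.
    """
    subunits, peptides = [], []
    for chain_id, n_res in chain_lengths.items():
        (peptides if n_res < res_threshold else subunits).append(chain_id)
    return subunits, peptides


# Toy input: chain "A" has 306 residues, chain "C" has 6.
print(split_chains({"A": 306, "C": 6}))  # (['A'], ['C'])
```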

Choose to use mutation filter

Mutation = True

Example SARS-CoV-2 variants

Mutation = False

Example PDBBind
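When the mutation filter is enabled, each protein sequence is compared with a reference sequence. A minimal sketch of such a comparison is shown below. It assumes the two sequences are already aligned and have the same length, which is a simplification; the notebook may use a proper alignment instead. The function name `find_mutations` and the toy sequences are made up for illustration.

```python
def find_mutations(reference, sequence):
    """List substitutions such as 'T3S' between two equal-length sequences."""
    if len(reference) != len(sequence):
        raise ValueError("sequences must be aligned to the same length")
    return [
        f"{ref}{pos}{obs}"  # reference residue, 1-based position, observed residue
        for pos, (ref, obs) in enumerate(zip(reference, sequence), start=1)
        if ref != obs
    ]


print(find_mutations("MKTAY", "MKSAY"))  # ['T3S']
```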

Blacklist

The blacklist compiles more than 280 codes for solvents, ions, co-factors, and other substances capable of bonding with the protein structure. This information is stored in a text file that users can edit, allowing for the inclusion of new codes or adjustments related to the significance of co-factors and solvents in the analysis.
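The sketch below shows one way such a blacklist could be loaded and applied. It assumes the text file holds codes separated by whitespace or commas; the actual layout of `blacklist.txt` may differ, and the helper names `load_blacklist` and `drop_blacklisted` are hypothetical.

```python
def load_blacklist(path):
    """Read codes from a text file (assumed whitespace- or comma-separated)."""
    codes = set()
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            codes.update(tok.upper() for tok in line.replace(",", " ").split())
    return codes


def drop_blacklisted(ligand_codes, blacklist):
    """Keep only the ligand codes that are not in the blacklist."""
    return [code for code in ligand_codes if code.upper() not in blacklist]
```

With a blacklist containing `HOH`, a call such as `drop_blacklisted(["HOH", "LIG"], blacklist)` would keep only `LIG`, so the water molecule no longer influences the classification of the entry.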

Usage/Examples

Main Protease SARS-CoV-2 Example

To demonstrate the use of the PDB-CAT program, we analyzed 1,436 PDB structures containing the SARS-CoV-2 main protease (M-pro). The PDBx/mmCIF files underwent thorough analysis and mutation categorization. The FASTA file included the sequence of the first crystallized M-pro structure (PDB ID: 6LU7) as well as the M-pro sequence from the Omicron variant (B.1.1.529).

Users can access the CSV output in the "example" folder.

Extra

Best Poster Award at the Strasbourg Summer School in Cheminformatics 2024
