PyPDBcomplex is a comprehensive Python toolkit for analyzing protein structures, with a focus on antibody-antigen complexes. It provides easy-to-use functions for extracting detailed structural features, analyzing molecular interactions, and comparing multiple structures.
- PDB Parsing: Robust parsing with support for alternate locations (altloc), HETATM records, and multi-chain complexes
- Selection System: Intuitive syntax for selecting chains, residues, and atoms
- Contact Analysis: Detect H-bonds, salt bridges, hydrophobic contacts, pi-stacking, disulfides
- Interface Analysis: Identify interface residues and calculate buried surface area
- SASA Calculation: Solvent accessible surface area with bound/unbound comparison
- Geometry Analysis: Backbone angles (φ, ψ, ω), sidechain rotamers (χ), Ramachandran plots
- VdW Energy: Lennard-Jones energy decomposition
- Distance Analysis: Pairwise distances and distance matrices
- Residue Features: Comprehensive feature extraction combining all analyses
- Single Chain Analysis: Detailed characterization of individual protein chains
- Multi-Complex Comparison: Side-by-side comparison of multiple structures
- Hotspot Identification: Automatic identification of critical binding residues
- Interactive Visualizations: HTML dashboards with plotly
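The sketch below shows roughly what altloc-aware parsing involves. It reads ATOM/HETATM records by their fixed PDB column positions. When an atom has alternate locations, it keeps the conformer with the highest occupancy. It uses only the standard library. The `Atom` class, `parse_atom_line`, `parse_atoms`, and the highest-occupancy rule are illustrative assumptions, not the implementation behind PyPDBcomplex's `parse_pdb`.

```python
from dataclasses import dataclass


@dataclass
class Atom:
    # Hypothetical record type for this sketch, not a PyPDBcomplex class
    record: str      # "ATOM" or "HETATM"
    name: str        # atom name, e.g. "CA"
    altloc: str      # alternate location indicator ("" if none)
    resname: str     # residue name, e.g. "LYS"
    chain: str       # chain identifier
    resseq: int      # residue sequence number
    icode: str       # insertion code ("" if none)
    x: float
    y: float
    z: float
    occupancy: float


def parse_atom_line(line: str) -> Atom:
    """Parse one ATOM/HETATM record using the fixed-column PDB layout."""
    occ_field = line[54:60].strip()
    return Atom(
        record=line[0:6].strip(),
        name=line[12:16].strip(),
        altloc=line[16].strip(),
        resname=line[17:20].strip(),
        chain=line[21].strip(),
        resseq=int(line[22:26]),
        icode=line[26].strip(),
        x=float(line[30:38]),
        y=float(line[38:46]),
        z=float(line[46:54]),
        occupancy=float(occ_field) if occ_field else 1.0,  # default if column missing
    )


def parse_atoms(lines):
    """Keep ATOM/HETATM records; for atoms with alternate locations keep the
    highest-occupancy conformer (the first one seen wins on ties)."""
    best = {}
    for line in lines:
        if not line.startswith(("ATOM", "HETATM")):
            continue  # skip HEADER, REMARK, TER, etc.
        atom = parse_atom_line(line)
        key = (atom.chain, atom.resseq, atom.icode, atom.name)
        if key not in best or atom.occupancy > best[key].occupancy:
            best[key] = atom
    return list(best.values())
```

For a file on disk, `with open("protein.pdb") as fh: atoms = parse_atoms(fh)` works because file objects iterate line by line.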
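Backbone (φ, ψ, ω) and side-chain (χ) torsions are all dihedral angles defined by four consecutive atoms. The following is a minimal NumPy sketch of that computation, assuming the IUPAC sign convention (cis = 0°, trans = ±180°). The `dihedral` function is illustrative and is not PyPDBcomplex's own geometry code.

```python
import numpy as np


def dihedral(p0, p1, p2, p3) -> float:
    """Dihedral angle (degrees, in [-180, 180]) defined by four atom positions.

    Atom quadruplets for the standard protein torsions:
      phi:   C(i-1), N(i),   CA(i),   C(i)
      psi:   N(i),   CA(i),  C(i),    N(i+1)
      omega: CA(i),  C(i),   N(i+1),  CA(i+1)
      chi1:  N(i),   CA(i),  CB(i),   gamma atom (e.g. CG, OG, SG)
    """
    p0, p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p0, p1, p2, p3))
    b0 = p0 - p1                    # first bond, pointing back from p1
    b1 = p2 - p1                    # central bond
    b2 = p3 - p2                    # last bond
    b1 /= np.linalg.norm(b1)
    # Project the outer bonds onto the plane perpendicular to the central bond
    v = b0 - np.dot(b0, b1) * b1
    w = b2 - np.dot(b2, b1) * b1
    x = np.dot(v, w)
    y = np.dot(np.cross(b1, v), w)
    return float(np.degrees(np.arctan2(y, x)))
```

This uses `arctan2` rather than an arccos of the normalized dot product, so the sign of the angle is preserved. A Ramachandran plot needs that sign because it distinguishes, for example, ψ = +120° from ψ = -120°.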
# Clone or download the repository
git clone https://github.com/fbabd/PyPDBcomplex.git
cd PyPDBcomplex
# Install in development mode
pip install -e .
# Or install with all optional dependencies
pip install -e ".[all]"
Core dependencies (automatically installed):
- numpy >= 1.20.0
- pandas >= 1.3.0
- matplotlib >= 3.4.0
Optional dependencies:
- freesasa >= 2.1.0 (for fast SASA calculations)
- plotly >= 5.0.0 (for interactive visualizations)
- seaborn >= 0.11.0 (for statistical plots)
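Because freesasa, plotly, and seaborn are optional, it can help to check which of them are importable before running SASA or plotting code. The sketch below uses only the standard library's `importlib.util.find_spec`. The helper name `missing_optional_dependencies` is illustrative and is not part of PyPDBcomplex. The sketch also assumes that each package imports under the same name it is installed as, which holds for these three.

```python
import importlib.util

# Optional packages listed above and what they enable
OPTIONAL_DEPENDENCIES = {
    "freesasa": "fast SASA calculations",
    "plotly": "interactive visualizations",
    "seaborn": "statistical plots",
}


def missing_optional_dependencies(names=OPTIONAL_DEPENDENCIES):
    """Return the packages in `names` that cannot be imported in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    for name in missing_optional_dependencies():
        print(f"{name} not installed (needed for {OPTIONAL_DEPENDENCIES[name]})")
```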
from PyPDBcomplex.pdbparser import parse_pdb
from PyPDBcomplex.interface import compute_interface
from PyPDBcomplex.contacts import analyze_contacts
# Parse PDB file
cx = parse_pdb("protein.pdb")
# Analyze interface between chains
interface = compute_interface(
cx,
selection_A=["H", "L"], # Antibody
selection_B=["A"], # Antigen
cutoff=5.0
)
print(f"Interface residues: {interface.total_contacts}")
print(f"Buried surface area: {interface.bsa_total:.2f} Å²")
# Detect molecular interactions
contacts = analyze_contacts(cx, ["H", "L"], ["A"])
print(f"H-bonds: {contacts.get_contact_counts()['hydrogen_bond']}")
from PyPDBcomplex.residue_features import (
extract_residue_features,
features_to_dataframe,
identify_hotspots
)
# Extract all features
features = extract_residue_features(
cx,
selection_A=["H", "L"],
selection_B=["A"],
compute_sasa=True,
compute_geometry=True,
compute_vdw=True
)
# Convert to DataFrame
df = features_to_dataframe(features)
# Find hotspots
hotspots = identify_hotspots(features, min_interactions=3)
print(f"Identified {len(hotspots)} hotspot residues")
from PyPDBcomplex.multicomplex.multi_analysis import (
MultiComplexAnalyzer,
AnalysisConfig
)
# Load multiple structures
analyzer = MultiComplexAnalyzer([
("Wild-Type", "wt.pdb"),
("Mutant", "mutant.pdb")
])
# Configure and run analysis
config = AnalysisConfig(
calculate_sasa=True,
calculate_contacts=True,
calculate_interface=True,
calculate_vdw=True
)
results = analyzer.run_analysis(config)
# Compare results
for name, data in results.items():
    print(f"{name}: {data.summary}")
The examples/ directory contains comprehensive Jupyter notebooks demonstrating all features:
- nb1-pdb_file_processing.ipynb - PDB parsing and navigation
- nb2-contact_analysis.ipynb - Molecular interactions
- nb3-interface_analysis.ipynb - Interface characterization
- nb4-geometry_analysis.ipynb - Structural geometry
- nb5-distance_analysis.ipynb - Distance calculations
- nb6-sasa_analysis.ipynb - Surface accessibility
- nb7-vdw_analysis.ipynb - Van der Waals energies
- nb8-residue_features.ipynb - Comprehensive feature extraction
- nb9-single_chain_features.ipynb - Single protein analysis
- nb10-multi_complex_comparison.ipynb - Multi-structure comparison
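Pairwise distances and distance matrices can be computed in plain NumPy by broadcasting two coordinate arrays against each other. This is a generic sketch, not the API of PyPDBcomplex's distances module. `distance_matrix` is a hypothetical helper name.

```python
import numpy as np


def distance_matrix(coords_a, coords_b) -> np.ndarray:
    """Pairwise Euclidean distances between (N, 3) and (M, 3) coordinate arrays.

    Returns an (N, M) array whose entry [i, j] is |a_i - b_j|.
    """
    a = np.asarray(coords_a, dtype=float)
    b = np.asarray(coords_b, dtype=float)
    diff = a[:, None, :] - b[None, :, :]   # broadcast to (N, M, 3)
    return np.sqrt(np.sum(diff ** 2, axis=-1))
```

For large complexes, the (N, M, 3) temporary can use a lot of memory. `scipy.spatial.distance.cdist(a, b)` returns the same (N, M) matrix without building that intermediate array.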
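Van der Waals energies are commonly modeled with a 12-6 Lennard-Jones potential, E = 4ε[(σ/r)^12 - (σ/r)^6]. The sketch below evaluates this for every atom pair between two groups. It then sums the pair energies per residue, which is one way to decompose the total. The Lorentz-Berthelot mixing rules, the 8 Å cutoff, and the function names are assumptions. PyPDBcomplex's parameters and decomposition scheme may differ.

```python
import numpy as np


def lj_pair_energies(coords_a, coords_b, sigma_a, sigma_b, eps_a, eps_b, cutoff=8.0):
    """12-6 Lennard-Jones energy for every atom pair between groups A and B.

    Per-atom sigma/epsilon are combined with Lorentz-Berthelot rules
    (arithmetic-mean sigma, geometric-mean epsilon). Pairs farther apart than
    `cutoff` contribute zero. Assumes no two atoms coincide (r > 0).
    Units follow the inputs, e.g. Angstrom and kcal/mol.
    """
    a = np.asarray(coords_a, dtype=float)
    b = np.asarray(coords_b, dtype=float)
    r = np.sqrt(((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1))  # (N, M)
    sigma = 0.5 * (np.asarray(sigma_a, float)[:, None] + np.asarray(sigma_b, float)[None, :])
    eps = np.sqrt(np.asarray(eps_a, float)[:, None] * np.asarray(eps_b, float)[None, :])
    sr6 = (sigma / r) ** 6
    energy = 4.0 * eps * (sr6 ** 2 - sr6)
    return np.where(r <= cutoff, energy, 0.0)


def per_residue_energy(pair_energies, residue_ids_a):
    """Decompose the total energy into one contribution per residue of group A."""
    per_atom = np.asarray(pair_energies).sum(axis=1)  # each A atom against all of B
    totals = {}
    for res_id, e in zip(residue_ids_a, per_atom):
        totals[res_id] = totals.get(res_id, 0.0) + float(e)
    return totals
```

Summing this way assigns each pair's full energy to the residue on side A. Splitting each pair energy evenly between both partners is an equally reasonable choice if both sides need per-residue values.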
Modules:
- pdbparser: Parse PDB files → Complex objects
- selection: Select specific atoms, residues, chains
- contacts: Detect molecular interactions
- interface: Analyze protein-protein interfaces
- sasa: Calculate solvent accessibility
- geometry: Compute structural angles
- distances: Calculate distances and distance matrices
- vdw: Van der Waals energy calculations
- residue_features: Unified feature extraction for complexes
- single_chain_features: Feature extraction for single chains
- multicomplex: Compare multiple structures
- visualization.contacts_viz: Contact network plots
- visualization.interface_viz: Interface heatmaps
- visualization.geometry_viz: Ramachandran plots
- visualization.residue_feat_viz: Interactive HTML dashboards
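The `cutoff=5.0` argument to `compute_interface` in the example above points to a distance criterion for interface residues. The minimal sketch below assumes that a residue is at the interface when any of its atoms lies within the cutoff of any atom on the partner side. It is illustrative only and does not necessarily reproduce the library's exact rule. `interface_residues` is a hypothetical name.

```python
import numpy as np


def interface_residues(coords_a, res_ids_a, coords_b, res_ids_b, cutoff=5.0):
    """Residues on each side with at least one atom within `cutoff` (Angstrom) of the partner.

    coords_a: (N, 3) atom coordinates of side A; res_ids_a: N residue labels, one per atom.
    coords_b: (M, 3) atom coordinates of side B; res_ids_b: M residue labels.
    """
    a = np.asarray(coords_a, dtype=float)
    b = np.asarray(coords_b, dtype=float)
    dist = np.sqrt(((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1))  # (N, M)
    rows, cols = np.nonzero(dist <= cutoff)  # indices of atom pairs in contact
    side_a = {res_ids_a[i] for i in rows}
    side_b = {res_ids_b[j] for j in cols}
    return side_a, side_b
```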
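Buried surface area is usually derived from a bound/unbound SASA comparison: BSA = SASA(A) + SASA(B) - SASA(AB). The per-residue version is each residue's SASA loss on binding. The helpers below are hypothetical and take precomputed SASA values in Å², for example from freesasa. Some studies report half of this total as the interface area per partner, so check which convention a given tool uses before comparing numbers.

```python
def buried_surface_area(sasa_a_free: float, sasa_b_free: float, sasa_complex: float) -> float:
    """Total SASA (Å²) lost by both partners on binding."""
    return sasa_a_free + sasa_b_free - sasa_complex


def per_residue_burial(sasa_unbound: dict, sasa_bound: dict) -> dict:
    """SASA change per residue on binding; positive values mean the residue is buried.

    Both dicts map the same residue IDs to SASA values in Å².
    """
    return {res: sasa_unbound[res] - sasa_bound[res] for res in sasa_unbound}
```

For example, `buried_surface_area(10500.0, 8200.0, 16900.0)` gives 1800.0 Å².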
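The `identify_hotspots(features, min_interactions=3)` call in the example above suggests a threshold on interaction counts. The sketch below shows one plausible form of such a rule. It is an assumption, not the library's actual criterion. It counts the contacts each residue takes part in and keeps those at or above the threshold. The `(residue_a, residue_b, interaction_type)` tuple format and the `hotspot_residues` name are hypothetical.

```python
from collections import Counter


def hotspot_residues(contacts, min_interactions=3):
    """Residues taking part in at least `min_interactions` contacts, most-connected first.

    `contacts` is an iterable of (residue_a, residue_b, interaction_type) tuples,
    where residues are hashable IDs such as ("H", 52, "TYR").
    """
    counts = Counter()
    for res_a, res_b, _kind in contacts:
        # Each contact counts once for each residue it involves
        counts[res_a] += 1
        counts[res_b] += 1
    hits = [(res, n) for res, n in counts.items() if n >= min_interactions]
    # sorted() is stable, so residues with equal counts keep first-seen order
    return sorted(hits, key=lambda item: item[1], reverse=True)
```

A pure count treats every interaction type equally. Weighting salt bridges or hydrogen bonds more heavily than hydrophobic contacts is a natural refinement of this rule.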
- Antibody Engineering: Analyze and optimize antibody-antigen binding
- Mutation Effect Prediction: Compare wild-type vs mutants
- Drug Discovery: Identify binding hotspots and druggable sites
- Protein Design: Evaluate designed protein interfaces
- Structure Quality Assessment: Validate protein structures
- Machine Learning: Generate feature matrices for ML models
Contributions are welcome! Please feel free to submit a Pull Request.
If you use PyPDBcomplex in your research, please cite:
This project is licensed under the MIT License - see the LICENSE file for details.
- Built for the structural biology and computational chemistry communities
- Documentation: See examples/ for comprehensive tutorials
- Issues: Report bugs at https://github.com/fbabd/PyPDBcomplex/issues
- Questions: Open a discussion on GitHub
Made with ❤️ for protein structure analysis