This repository contains code and examples for the article: “Rapid Generation of Rare Event Pathways Using Direction-Guided Adaptive Sampling: From Ligand Unbinding to Protein (Un)Folding.” 📄 DOI: https://doi.org/10.1021/acs.jctc.5c01244
PathGennie is a general-purpose steering framework for guiding molecular simulations along data-driven (or physical) collective variables (CVs) to rapidly sample rare-event transitions such as:
- Ligand unbinding
- Protein folding and unfolding
It leverages high-performance libraries like OpenMM and MDAnalysis, and includes tooling for CV construction, adaptive sampling, and optimized trajectory generation.
pathgennie/
│
├── README.md            # Documentation and usage guide
├── LICENSE              # MIT or compatible license
├── environment.yml      # Conda environment for reproducibility
│
├── Scripts/             # Scripts for various path generation tasks
│   ├── unbind           # Ligand unbinding module
│   ├── unfold           # Protein unfolding module
│   └── fold             # Protein folding / reverse folding module
│
├── examples/            # Example systems
│   ├── 3PTB/            # Example: Bovine trypsin-benzamidine complex
│   │   ├── native.pdb
│   │   ├── start.gro
│   │   ├── topol.top
│   │   └── system.py
│   │
│   └── 2JOF/            # Example: Trp-cage protein system
│       ├── native.pdb
│       ├── start.gro
│       ├── topol.top
│       └── system.py
Clone the repository and set up the Conda environment:
# Clone the repository
git clone https://github.com/dmighty007/PathGennie.git
# Navigate to the project folder
cd PathGennie
# Create and activate the environment
conda env create -f environment.yml
conda activate pathgennie
This installs all required dependencies, including:
- openmm
- mdanalysis
- numpy, numba, tqdm, and more
✅ Note: Ensure Miniconda or Anaconda is installed before proceeding.
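After activating the environment, a quick check can confirm that the listed packages are importable. The snippet below is a minimal sketch and not part of the repository; the import names (`openmm`, `MDAnalysis`, `numpy`, `numba`, `tqdm`) are assumed from the dependency list above, and `missing_modules` is a hypothetical helper.

```python
from importlib.util import find_spec

# Import names assumed for the packages listed above (hypothetical check,
# not shipped with PathGennie).
REQUIRED_MODULES = ["openmm", "MDAnalysis", "numpy", "numba", "tqdm"]


def missing_modules(names=REQUIRED_MODULES):
    """Return the subset of `names` that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages were found.")
```

If anything is reported missing, recreating the environment from environment.yml is usually the simplest fix.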
This framework enables ligand unbinding simulations by steering MD along principal components derived from distance features between a ligand and its binding site. These components form a low-dimensional CV space for guided sampling.
- Create a working directory:
  mkdir Test && cd Test
- Add your input files:
  pbcmol.gro   # Structure with solvent and PBC
  topol.top    # GROMACS-compatible topology
- Perform energy minimization and equilibration using GROMACS or OpenMM. The equilibrated outputs are used as the initial configurations.
Generate a PCA model based on ligand–protein contacts:
python pcagen.py pbcmol.gro \
--ligand_sel "resname LIG" \
--output pca.pkl
This script:
- Computes distance features between the ligand and surrounding protein atoms
- Performs PCA on the feature matrix
- Stores the principal components in pca.pkl
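The three steps above can be illustrated with plain NumPy. This is a sketch of the general idea only, not the code in pcagen.py: the coordinates are synthetic, periodic boundary conditions are ignored, and the function names and the dictionary layout of the pickled model are assumptions made for the example.

```python
import pickle

import numpy as np


def distance_features(ligand_xyz, pocket_xyz):
    """Flattened ligand-pocket distance matrix for every frame.

    ligand_xyz: (n_frames, n_lig, 3); pocket_xyz: (n_frames, n_pocket, 3).
    Periodic images are ignored in this sketch.
    """
    diff = ligand_xyz[:, :, None, :] - pocket_xyz[:, None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    return dist.reshape(dist.shape[0], -1)


def fit_pca(features, n_components=2):
    """PCA via SVD of the mean-centered feature matrix."""
    mean = features.mean(axis=0)
    _, s, vt = np.linalg.svd(features - mean, full_matrices=False)
    variance = s**2 / (features.shape[0] - 1)
    return {
        "mean": mean,
        "components": vt[:n_components],
        "explained_variance": variance[:n_components],
    }


def project(model, features):
    """Map features onto the principal components (the low-dimensional CVs)."""
    return (features - model["mean"]) @ model["components"].T


# Synthetic stand-in for real frames: 5 ligand atoms, 12 pocket atoms.
rng = np.random.default_rng(0)
ligand = rng.normal(size=(200, 5, 3))
pocket = rng.normal(loc=3.0, size=(200, 12, 3))

X = distance_features(ligand, pocket)
model = fit_pca(X, n_components=2)
cv = project(model, X)

# Serialize the model in memory; a script would write this to a .pkl file.
blob = pickle.dumps(model)
restored = pickle.loads(blob)
```

Projecting a new configuration with `project` gives its position in the CV space along which sampling is steered.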
Create a system.py file to build the OpenMM simulation system:
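from openmm import LangevinMiddleIntegrator
from openmm.app import PME, GromacsGroFile, GromacsTopFile, HBonds, Simulation
from openmm.unit import kelvin, nanometer, picosecond, picoseconds

class Simulation_obj:
    def __init__(self):
        # Read coordinates and box vectors from the equilibrated structure.
        gro = GromacsGroFile('pbcmol.gro')
        # includeDir must point to the GROMACS force-field directory on your machine.
        top = GromacsTopFile('topol.top',
                             periodicBoxVectors=gro.getPeriodicBoxVectors(),
                             includeDir='/usr/local/gromacs/share/gromacs/top')
        system = top.createSystem(nonbondedMethod=PME,
                                  nonbondedCutoff=1*nanometer,
                                  constraints=HBonds)
        # A 4 fs step with HBonds constraints usually requires hydrogen mass
        # repartitioning; try 0.002*picoseconds if the run is unstable.
        integrator = LangevinMiddleIntegrator(300*kelvin, 1/picosecond, 0.004*picoseconds)
        self.simulation = Simulation(top.topology, system, integrator)
📘 Refer to the OpenMM documentation for advanced setup options.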
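Before launching a run, it can help to confirm that system.py has the expected shape without importing OpenMM. The check below is a hypothetical helper, not part of PathGennie; it assumes the drivers look for a class named `Simulation_obj` that assigns `self.simulation`, as in the example above.

```python
import ast
import tempfile
from pathlib import Path


def defines_simulation_obj(path="system.py"):
    """True if `path` defines class Simulation_obj that assigns self.simulation."""
    source_file = Path(path)
    if not source_file.is_file():
        return False
    tree = ast.parse(source_file.read_text())
    for node in tree.body:
        if isinstance(node, ast.ClassDef) and node.name == "Simulation_obj":
            for sub in ast.walk(node):
                if (isinstance(sub, ast.Attribute)
                        and sub.attr == "simulation"
                        and isinstance(sub.ctx, ast.Store)
                        and isinstance(sub.value, ast.Name)
                        and sub.value.id == "self"):
                    return True
    return False


# Small demonstration on a throwaway file.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "system.py"
    demo.write_text(
        "class Simulation_obj:\n"
        "    def __init__(self):\n"
        "        self.simulation = None\n"
    )
    found = defines_simulation_obj(demo)
    absent = defines_simulation_obj(Path(tmp) / "absent.py")
```

Because the file is only parsed, never executed, the check runs even on a machine where OpenMM is not installed.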
Run the unbinding driver with:
../../Scripts/unbind \
--structure_file pbcmol.gro \
--verbose \
--relax1 10 \
--relax2 15 \
--max_probes 50 \
--temperature 300 \
--model_file pca.pkl
Output:
trajectory.xtc: Reactive trajectory file showing unbinding progression
| Parameter | Description | Default |
|---|---|---|
| --ligand_name | Ligand residue name | LIG |
| --selection_radius | Distance (Å) to include nearby protein atoms | 20.0 |
| --relax1 | MD steps for trial probe | 10 |
| --relax2 | MD steps for relaxation after acceptance | 15 |
| --max_probes | Number of parallel probes per cycle | 50 |
| --temperature | Temperature in Kelvin | 290 |
| --no_save | If set, disables output trajectory | False |
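To give a feel for how --relax1, --relax2, and --max_probes interact, here is a toy probe-and-relax loop on a one-dimensional double-well potential. It is an illustrative sketch under stated assumptions, not the PathGennie implementation: the CV is simply the coordinate x, overdamped Langevin steps stand in for MD, and the acceptance rule (keep the probe that advances the CV the most) is an assumption made for the example.

```python
import numpy as np


def force(x):
    # Negative gradient of the toy double well U(x) = 5 * (x**2 - 1)**2 (kT units).
    return -20.0 * x * (x**2 - 1.0)


def propagate(x, n_steps, rng, dt=1e-3, kT=1.0):
    """Overdamped Langevin steps with unit friction; stands in for short MD segments."""
    for _ in range(n_steps):
        x = x + force(x) * dt + np.sqrt(2.0 * kT * dt) * rng.normal()
    return x


def steer(x0=-1.0, target=0.8, relax1=10, relax2=15,
          max_probes=50, max_cycles=200, seed=0):
    """Drive x from the left well toward `target` by probe selection."""
    rng = np.random.default_rng(seed)
    x, path = x0, [x0]
    for _ in range(max_cycles):
        # Launch short trial probes from the current state.
        probes = [propagate(x, relax1, rng) for _ in range(max_probes)]
        best = max(probes)
        # Accept the probe only if it advances the CV, then relax it.
        if best > x:
            x = propagate(best, relax2, rng)
            path.append(x)
        if x >= target:
            break
    return path


path = steer()
print(f"cycles accepted: {len(path) - 1}, final CV: {path[-1]:.2f}")
```

In this toy model, longer probes or more probes per cycle push the CV forward faster, while a longer relaxation lets the system settle (and partly slide back) between selections.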
💡 For full options, run:
../../Scripts/unbind -h
The same logic applies to protein folding/unfolding simulations.
../../Scripts/unfold \
--ref_config folded.gro \
--start_config folded.gro \
--verbose \
--relax1 10 \
--relax2 15 \
--max_probes 50 \
--temperature 290
../../Scripts/fold \
--ref_config folded.gro \
--start_config equili.gro \
--verbose \
--relax1 10 \
--relax2 15 \
--max_probes 50 \
--temperature 290
📌 Make sure system.py is present in the working directory for these commands.
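When many runs need to be scripted, the same command lines can be assembled from Python. The helper below is a hypothetical convenience wrapper, not part of the repository; it uses only the flags shown in the commands above, and the file names are the ones from those examples.

```python
import shlex


def build_command(script, ref_config, start_config, relax1=10, relax2=15,
                  max_probes=50, temperature=290, verbose=True):
    """Assemble the argument list for the fold/unfold drivers."""
    cmd = [script, "--ref_config", ref_config, "--start_config", start_config]
    if verbose:
        cmd.append("--verbose")
    cmd += [
        "--relax1", str(relax1),
        "--relax2", str(relax2),
        "--max_probes", str(max_probes),
        "--temperature", str(temperature),
    ]
    return cmd


unfold_cmd = build_command("../../Scripts/unfold", "folded.gro", "folded.gro")
fold_cmd = build_command("../../Scripts/fold", "folded.gro", "equili.gro")

# Print the shell-equivalent commands for inspection.
print(shlex.join(unfold_cmd))
print(shlex.join(fold_cmd))
# To launch a run: subprocess.run(unfold_cmd, check=True)
```

Passing the list form to `subprocess.run` avoids shell-quoting problems if a path contains spaces.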
The examples/ directory includes two protein systems with:
- Native PDB structure
- Starting gro configuration
- GROMACS topology (topol.top)
- Compatible system.py
You can directly run folding/unfolding or unbinding simulations on these examples.
This project is licensed under the MIT License.
Feel free to open an issue or submit a pull request!