- 📋 Table of Contents
- 📚 About
- 🚀 Installation
- 🚀 Quick Start
- ⚙️ Operation Modes & Workflow
- 📖 Usage Guide
- 📚 Documentation
- 🤝 Contributing
- 📚 Publications & Citation
- 🙏 Acknowledgements
ColBuilder is a specialized tool for generating atomistic models of collagen microfibrils from single collagen molecules. Developed by the Gräter group at the Max Planck Institute for Polymer Research, it provides researchers with a flexible framework to create biologically relevant collagen structures for molecular dynamics simulations and structural studies.
- Custom microfibril generation: Create collagen microfibrils from individual molecules or amino acid sequences with precise control over structural parameters
- Highly configurable: Adjust collagen sequence, fibril geometry, crosslink types and density to match your custom conditions
- Simulation-ready output: Generate atomistic and coarse-grained topology files compatible with major molecular dynamics packages
- Reproducible research: Standardized approach to collagen modeling to ensure consistency across studies
- Python 3.9 or later
- Git
- Conda package manager (we recommend miniforge)
- Create and activate a conda environment:
  conda create -n colbuilder python=3.9
  conda activate colbuilder
- Clone the repository:
  git clone git@github.com:graeter-group/colbuilder.git
  cd colbuilder
- Install ColBuilder:
  pip install .
ColBuilder requires several external tools to function properly:
conda install conda-forge::pymol-open-source
Note: If PyMOL fails due to a missing libnetcdf.so, install:
conda install -c conda-forge libnetcdf==4.7.3
conda install bioconda::muscle
- Download the latest version of UCSF Chimera (64-bit recommended)
- Make the binary executable and run the installer:
  cd ~/Downloads  # or wherever you downloaded the file
  chmod +x chimera*.bin
  ./chimera*.bin
- Follow the installation prompts, preferably creating a symlink in a directory in your $PATH
Note: ColBuilder specifically requires UCSF Chimera, not the newer ChimeraX.
- Download Modeller version 10.5
- Follow the installation instructions provided
- Add the following environment variables to your .bashrc or .bash_profile (adjust paths according to your installation location):
  export PYTHONPATH="/home/user/bin/modeller10.5/lib/x86_64-intel8/python3.3:$PYTHONPATH"
  export PYTHONPATH="/home/user/bin/modeller10.5/modlib:$PYTHONPATH"
  export LD_LIBRARY_PATH="/home/user/bin/modeller10.5/lib/x86_64-intel8:$LD_LIBRARY_PATH"
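Once the external tools are installed, a quick check that they are reachable can save a failed run later. The following is a minimal sketch, not part of ColBuilder itself: the executable names (`pymol`, `muscle`, `chimera`) and the Python module name `modeller` are assumptions and may differ on your system.

```python
import importlib.util
import shutil

# Assumed executable names; adjust them to match your installation.
REQUIRED_EXECUTABLES = ("pymol", "muscle", "chimera")


def check_environment(executables=REQUIRED_EXECUTABLES, module="modeller"):
    """Return a dict mapping each dependency name to True if it was found."""
    # Command-line tools are looked up on the current PATH.
    found = {name: shutil.which(name) is not None for name in executables}
    # Modeller is used as a Python package, so test importability instead.
    found[module] = importlib.util.find_spec(module) is not None
    return found


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name:10s} {'found' if ok else 'MISSING'}")
```

A `MISSING` entry for `modeller` usually points at the `PYTHONPATH` or `LD_LIBRARY_PATH` settings above not being loaded in the current shell.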
To verify your installation and run a basic example:
- Verify installation:
  colbuilder --help
- Create a basic configuration file (save as config.yaml):
  # Basic human collagen microfibril configuration
  species: "homo_sapiens"
  sequence_generator: true
  geometry_generator: true
  crosslink: true
  fibril_length: 60.0
  contact_distance: 20
  n_term_type: "HLKNL"
  c_term_type: "HLKNL"
  n_term_combination: "9.C - 947.A"
  c_term_combination: "1047.C - 104.C"
- Run ColBuilder:
  colbuilder --config_file config.yaml
ColBuilder operates through modular modes, each responsible for a different part of the collagen model-building pipeline. These modes can be combined in various ways or run separately using different configuration files.
ColBuilder produces or requires two kinds of PDB files:
- Collagen triple helix molecule PDB: a single ~334 nm-long collagen molecule (usually with specified crosslink residues). Output of Mode 1, input to Modes 2 and 4.
- Collagen fibril PDB: a full microfibril model composed of multiple triple helices arranged based on crystal geometry, length, and crosslinking. Output of Modes 2, 4, or 5, input to Modes 3 and 5.
Understanding this distinction is crucial for organizing your workflow correctly.
| # | Mode | Purpose | Input(s) | Output | Can Run With Other Modes? |
|---|---|---|---|---|---|
| 1 | sequence_generator | Generate a collagen triple helix molecule via homology modeling | species or custom FASTA | Triple helix PDB | Yes: with 2, 3, 5 |
| 2 | geometry_generator | Assemble a collagen fibril from a single triple helix | PDB from Mode 1 or custom PDB | Fibril PDB | Yes: with 1, 3, 5 |
| 3 | topology_generator | Generate topology files for GROMACS simulations | Fibril PDB (from Mode 2, 4, or 5) | .top, .itp, .gro | Yes: with 2, 4, 5 |
| 4 | mix_bool | Generate a fibril by mixing two crosslink types | Two triple helix PDBs from Mode 1 | Mixed fibril PDB | No, requires separate script |
| 5 | replace_bool | Replace crosslinks in an existing fibril | Fibril PDB from Mode 2 or 4 | Modified fibril PDB | Yes: with 2, 3 |
Compatible modes are enabled together by setting their flags in a single config file:
# Example combination
sequence_generator: true
geometry_generator: true
topology_generator: true # (optional)
replace_bool: true # (optional)
These mode combinations can be run in a single configuration file:
- ✅ 1 + 2
- ✅ 1 + 2 + 3 - example
- ✅ 2 + 3 (starting from a custom triple helix PDB)
- ✅ 1 + 2 + 5 + 3
- ✅ 1 + 2 + 5
- ✅ 2 + 5 - example
- ✅ 2 + 5 + 3
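The list of combinations can be encoded as a small pre-flight check on a configuration before launching a run. This is an illustrative sketch only: it mirrors the combinations listed above and the mode flags from the table, takes an already-parsed config as a plain dict, and makes no claim about what ColBuilder itself accepts or rejects (single modes, for instance, are not covered by the list). The function names are hypothetical.

```python
# Mode numbers mapped to their configuration flags, as in the mode table.
MODE_KEYS = {
    1: "sequence_generator",
    2: "geometry_generator",
    3: "topology_generator",
    4: "mix_bool",
    5: "replace_bool",
}

# Combinations listed in this README as runnable from one configuration file.
SINGLE_CONFIG_COMBINATIONS = {
    frozenset({1, 2}),
    frozenset({1, 2, 3}),
    frozenset({2, 3}),
    frozenset({1, 2, 3, 5}),
    frozenset({1, 2, 5}),
    frozenset({2, 5}),
    frozenset({2, 3, 5}),
}


def enabled_modes(config):
    """Return the set of mode numbers whose flag is truthy in the config dict."""
    return frozenset(n for n, key in MODE_KEYS.items() if config.get(key))


def is_listed_combination(config):
    """True if the enabled modes match one of the listed combinations."""
    return enabled_modes(config) in SINGLE_CONFIG_COMBINATIONS


if __name__ == "__main__":
    cfg = {"sequence_generator": True, "geometry_generator": True, "replace_bool": True}
    print(sorted(enabled_modes(cfg)), is_listed_combination(cfg))
```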
Mixing crosslinks (Mode 4) currently requires a separate workflow using two config files for triple helix generation and one for fibril construction:
# Example bash script for mixing crosslinks
colbuilder --config_file triple_helix_A.yaml
colbuilder --config_file triple_helix_B.yaml
colbuilder --config_file mix_geometry.yaml # sets mix_bool: true and includes both PDBs
You can also chain this with replace_bool (Mode 5) or topology_generator (Mode 3) in the third config.
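The three-step mixing workflow can also be driven from Python, stopping at the first step that fails. This is a sketch under the assumption that `colbuilder` is on your `PATH` and that the three config file names from the bash example exist; only the `--config_file` option shown above is used, and the helper names are hypothetical.

```python
import subprocess


def build_commands(config_files):
    """Build one `colbuilder --config_file <path>` command per config file."""
    return [["colbuilder", "--config_file", str(path)] for path in config_files]


def run_pipeline(config_files, runner=subprocess.run):
    """Run the commands in order; check=True aborts on the first failure."""
    for command in build_commands(config_files):
        runner(command, check=True)


if __name__ == "__main__":
    run_pipeline([
        "triple_helix_A.yaml",  # triple helix with crosslink type A
        "triple_helix_B.yaml",  # triple helix with crosslink type B
        "mix_geometry.yaml",    # sets mix_bool: true and includes both PDBs
    ])
```

Because the fibril step needs both triple helix PDBs, running the steps sequentially with `check=True` avoids building a mixed fibril from a stale or missing input.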
The general syntax for running ColBuilder is:
colbuilder --config_file config.yaml [OPTIONS]
ColBuilder uses YAML configuration files to define parameters. Here's a complete template with all available options:
# Operation Mode
mode: null # Specific operation mode if needed
config_file: null # Path to another config file (for nested configs)
sequence_generator: true # Generate sequence from species
geometry_generator: true # Generate fibril geometry
topology_generator: false # Generate topology files
debug: false # Enable debug mode
# Input Configuration
species: "homo_sapiens" # Species for collagen sequence
# Available species options:
# Mammals (Primates): homo_sapiens, pan_troglodytes, pongo_abelii, callithrix_jacchus, otolemur_garnettii
# Mammals (Rodents): mus_musculus, rattus_norvegicus
# Mammals (Other): bos_taurus, canis_lupus, ailuropoda_melanoleuca, mustela_putorius, myotis_lucifugus, loxodonta_africana
# Fish: danio_rerio, oreochromis_niloticus, oryzias_latipes, tetraodon_nigroviridis, xiphophorus_maculatus
# Reptiles: pelodiscus_sinensis
# Sequence Settings
fasta_file: null # Custom FASTA file path (if null, auto-generated based on species)
crosslink: true # Enable crosslinking in the model
# Check available crosslinks and respective combinations at [src/colbuilder/data/sequence/crosslinks.csv](https://github.com/graeter-group/colbuilder/blob/main/src/colbuilder/data/sequence/crosslinks.csv)
n_term_type: "HLKNL" # N-terminal crosslink type (Options: "DPD", "DPL", "HLKNL", "LKNL", "PYD", "PYL", "deHHLNL", "deHLNL", "NONE")
c_term_type: "HLKNL" # C-terminal crosslink type (Options: "DPD", "DPL", "HLKNL", "LKNL", "PYD", "PYL", "deHHLNL", "deHLNL", "NONE")
n_term_combination: "9.C - 947.A" # N-terminal residue combination
c_term_combination: "1047.C - 104.C" # C-terminal residue combination
# Geometry Parameters
pdb_file: null # Input PDB file (set to null if sequence_generator is true)
contact_distance: 20 # Distance threshold for contacts (Å)
fibril_length: 70.0 # Length of the generated fibril (nm)
crystalcontacts_file: null # File with crystal contacts
connect_file: null # File with connection information
crystalcontacts_optimize: false # Optimize crystal contacts during generation
# Mixing Options (for mixed crosslinked microfibril)
mix_bool: false # Enable mixing of different crosslink types
ratio_mix: "A:70 B:30" # Format: "Type:percentage Type:percentage"
files_mix: # Required if mix_bool is true
- "collagen-molecule-crosslinkA.pdb" # PDB file of collagen molecule with type A crosslinks (created by enabling only sequence_generator and crosslink; see the examples)
- "collagen-molecule-crosslinkB.pdb" # PDB file of collagen molecule with type B crosslinks
# Replacement Options (for fewer crosslinks)
replace_bool: false # Enable crosslink replacement
ratio_replace: 30 # Percentage of crosslinks to replace
replace_file: null # File with crosslinks to be replaced (set to null if geometry_generator is true)
# Topology Options
force_field: "amber99" # Force field for topology generation (Options: "amber99", "martini3")
For a complete list of configuration options, see the detailed documentation.
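Some template values follow small formats that are easy to mistype, such as `ratio_mix` ("Type:percentage Type:percentage") and the crosslink type names. The sketch below validates them before a run. It is not ColBuilder's own parser: the requirements that percentages are integers and sum to 100 are assumptions made for illustration, and the function names are hypothetical.

```python
# Crosslink type options as listed in the template comments above.
CROSSLINK_TYPES = {
    "DPD", "DPL", "HLKNL", "LKNL", "PYD", "PYL", "deHHLNL", "deHLNL", "NONE",
}


def parse_ratio_mix(value):
    """Parse a string like "A:70 B:30" into {"A": 70, "B": 30}."""
    ratios = {}
    for token in value.split():
        label, _, percent = token.partition(":")
        if not label or not percent:
            raise ValueError(f"expected 'Type:percentage', got {token!r}")
        ratios[label] = int(percent)  # assumption: integer percentages
    if sum(ratios.values()) != 100:   # assumption: ratios must total 100
        raise ValueError(f"percentages sum to {sum(ratios.values())}, not 100")
    return ratios


def check_crosslink_type(name):
    """Raise ValueError if the name is not among the listed crosslink types."""
    if name not in CROSSLINK_TYPES:
        raise ValueError(f"unknown crosslink type {name!r}")
    return name


if __name__ == "__main__":
    print(parse_ratio_mix("A:70 B:30"))
    print(check_crosslink_type("HLKNL"))
```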
# config_human_basic.yaml
species: "homo_sapiens"
sequence_generator: true
geometry_generator: true
crosslink: false
fibril_length: 40.0
contact_distance: 25
colbuilder --config_file config_human_basic.yaml
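To produce several variants of a basic configuration like this one, for example at different fibril lengths, the files can be generated rather than edited by hand. The sketch below uses only the standard library and relies on the fact that JSON scalars (`true`, `"text"`, `40.0`) are also valid YAML scalars. The output file naming scheme and the helper names are hypothetical, and it handles flat key/value configs only.

```python
import json
from pathlib import Path

# Values copied from config_human_basic.yaml above.
BASE_CONFIG = {
    "species": "homo_sapiens",
    "sequence_generator": True,
    "geometry_generator": True,
    "crosslink": False,
    "fibril_length": 40.0,
    "contact_distance": 25,
}


def to_yaml(config):
    """Serialize a flat dict of scalars as YAML, one 'key: value' per line."""
    return "".join(f"{key}: {json.dumps(value)}\n" for key, value in config.items())


def write_variants(lengths_nm, out_dir="."):
    """Write one config per fibril length (nm) and return the created paths."""
    paths = []
    for length in lengths_nm:
        config = {**BASE_CONFIG, "fibril_length": float(length)}
        path = Path(out_dir) / f"config_human_{int(length)}nm.yaml"  # hypothetical naming
        path.write_text(to_yaml(config))
        paths.append(path)
    return paths


if __name__ == "__main__":
    for path in write_variants([40, 60, 80]):
        print(f"colbuilder --config_file {path}")
```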
# config_bovine_crosslinked.yaml
species: "bos_taurus"
sequence_generator: true
geometry_generator: true
crosslink: true
n_term_type: "HLKNL"
c_term_type: "HLKNL"
n_term_combination: "9.C - 946.A"
c_term_combination: "1046.C - 103.C"
fibril_length: 80.0
contact_distance: 15
colbuilder --config_file config_bovine_crosslinked.yaml
Creating a Mixed Crosslinked (80% Divalent + 20% Trivalent) Human Collagen Microfibril from Collagen Molecules
# config_mixed_crosslinks.yaml
species: "homo_sapiens"
sequence_generator: false
geometry_generator: false
mix_bool: true
ratio_mix: "D:80 T:20"
files_mix:
- "human-D.pdb"
- "human-T.pdb"
colbuilder --config_file config_mixed_crosslinks.yaml
# config_topology.yaml
species: "homo_sapiens"
sequence_generator: false
geometry_generator: true
topology_generator: true
pdb_file: "path/to/template_collagen_molecule.pdb"
force_field: "martini3"
colbuilder --config_file config_topology.yaml
For detailed API documentation, advanced usage examples, and theoretical background:
We welcome contributions to ColBuilder! Please see our contributing guidelines for details on how to submit issues, pull requests, and code reviews.
If you use ColBuilder in your research, please cite our paper:
https://www.biorxiv.org/content/10.1101/2024.12.10.627782v1
Citation metadata is provided in the CITATION.cff file.
ColBuilder is developed and maintained by the Gräter group at the Max Planck Institute for Polymer Research. We thank all contributors who have supported this work.
For questions, feedback, or support, please open an issue on our GitHub repository.