Welcome to the Protein<>DNA Interface Generation repository! 🚀
- Introduction
- Repository Structure
- Workflow Stages
- Installation and Dependencies
- Usage
- Docker Usage
- Testing
- Contributing
- License
This project offers a pipeline to:
- Process multi-chain PDB files placed in the input/ folder.
- Split them into chain-specific files in split_chain/.
- Use Naccess to generate .asa, .rsa, and .int outputs for both the entire complex and individual chains in rsa/.
- Produce final residue propensity maps and other interface analysis results in CSV format under interface/.
It leverages:
- Python: Data parsing and scripting tasks.
- Fortran: Performance-heavy computations.
- Shell: Workflow automation.
- Docker: Reproducible and consistent environment.
- Snakemake: Automated workflow orchestration.
Protein_DNA_Interface_Generation/
├── input/ # Raw PDB or other input files
├── split_chain/ # Contains split chain PDB files
├── rsa/ # Naccess outputs (.asa, .rsa, .int) for complex & chains
├── interface/ # Final residue propensity maps (CSV) & summary outputs
├── scripts/ # Python, Shell, Fortran scripts
├── docker/ # Docker configuration and resources
├── Snakefile # Main Snakemake workflow definition
└── README.md # Project documentation (this file)
- input/: Place your raw PDB files here to be processed.
- split_chain/: Contains the individual chain-specific PDB files generated by the workflow.
- rsa/: Holds the output from Naccess (e.g., .asa, .rsa, .int) run on both the entire complex and individual chains.
- interface/: Stores final CSV results with residue-based interface metrics and any summary files.
- scripts/: Key scripts for chain splitting, interface analysis, and more.
- docker/: Docker setup to help package and run the entire pipeline in a container.
The pipeline runs through the following stages:
- Input Parsing
  - Reads .pdb files from input/.
- Chain Splitting
  - Splits each file by chain and writes the per-chain files to split_chain/.
- Naccess Runs
  - Computes accessible surface areas for the complex and for each chain; the results go to rsa/.
- Interface Computation
  - Uses the Naccess outputs to identify interface residues and compute the relevant metrics.
- Results Aggregation
  - Writes final CSV files summarizing residue-based interface statistics to interface/.
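The chain-splitting stage can be illustrated with a short Python sketch. It is not the repository's script: it assumes standard fixed-column PDB records, where the chain identifier sits in column 22, and a hypothetical output name of the form <stem>_<chain>.pdb. It copies only ATOM and HETATM lines and does not separate models, so multi-model files (e.g., NMR ensembles) would need extra handling.

```python
from collections import defaultdict
from pathlib import Path


def split_pdb_by_chain(pdb_path, out_dir):
    """Write one PDB file per chain ID found in ATOM/HETATM records.

    Hypothetical helper (not the repository's script). Output files are named
    <input stem>_<chain>.pdb, an assumed convention. Returns {chain: path}.
    """
    records = defaultdict(list)
    with open(pdb_path) as handle:
        for line in handle:
            if line.startswith(("ATOM  ", "HETATM")):
                # The chain identifier sits in column 22 (index 21) of the record.
                chain_id = line[21] if len(line) > 21 else " "
                records[chain_id].append(line.rstrip("\n"))

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = {}
    for chain_id, lines in records.items():
        label = chain_id.strip() or "_"  # blank chain IDs become "_"
        out_path = out_dir / f"{Path(pdb_path).stem}_{label}.pdb"
        out_path.write_text("\n".join(lines) + "\nEND\n")
        written[label] = out_path
    return written


if __name__ == "__main__":
    # Split every PDB file in input/ into split_chain/.
    for pdb_file in sorted(Path("input").glob("*.pdb")):
        split_pdb_by_chain(pdb_file, "split_chain")
```

Grouping lines in a dictionary keyed by chain ID keeps a single pass over the file, and chains appear in the output in the order they first occur in the input.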
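The Interface Computation and Results Aggregation stages can be illustrated with a similar hypothetical sketch. It assumes the common buried-surface criterion: a residue counts as interface if its absolute accessible surface area (ASA) in the isolated chain's .rsa file exceeds its ASA in the complex's .rsa file by more than a threshold (1.0 Å² here, an assumed value). Propensity uses one simple definition, the residue type's share of interface residues divided by its share of all residues in the chain. The function names, threshold, and CSV columns are illustrative; the repository's scripts may use different criteria and formats.

```python
import csv
import sys
from collections import Counter


def parse_rsa(text):
    """Map (chain, residue number, residue name) -> absolute all-atom ASA.

    Assumes Naccess-style RES lines that split cleanly on whitespace, e.g.
    "RES LYS A   2   150.00  80.0 ...", with a chain ID present.
    """
    asa = {}
    for line in text.splitlines():
        if line.startswith("RES"):
            fields = line.split()
            resname, chain, resnum = fields[1], fields[2], fields[3]
            asa[(chain, resnum, resname)] = float(fields[4])
    return asa


def interface_residues(chain_rsa_text, complex_rsa_text, threshold=1.0):
    """Return {residue: delta ASA} for residues buried by more than `threshold`.

    delta ASA = ASA in the isolated chain - ASA in the complex (Å^2).
    The 1.0 Å^2 default is an illustrative assumption.
    """
    isolated = parse_rsa(chain_rsa_text)
    in_complex = parse_rsa(complex_rsa_text)
    buried = {}
    for key, asa_alone in isolated.items():
        delta = asa_alone - in_complex.get(key, asa_alone)
        if delta > threshold:
            buried[key] = delta
    return buried


def propensities(all_residues, interface_keys):
    """Propensity per residue type: interface fraction / overall fraction."""
    overall = Counter(name for _, _, name in all_residues)
    at_interface = Counter(name for _, _, name in interface_keys)
    n_all, n_int = sum(overall.values()), sum(at_interface.values())
    if n_int == 0:
        return {}
    return {
        name: (at_interface[name] / n_int) / (overall[name] / n_all)
        for name in overall
    }


def write_interface_csv(buried, handle):
    """Write one row per interface residue (hypothetical column names)."""
    writer = csv.writer(handle)
    writer.writerow(["chain", "resnum", "resname", "delta_asa"])
    for (chain, resnum, resname), delta in sorted(buried.items()):
        writer.writerow([chain, resnum, resname, f"{delta:.2f}"])


if __name__ == "__main__":
    # Toy .rsa excerpts for one protein chain, alone and in the complex.
    chain_text = "RES ALA A   1   100.00  90.0\nRES LYS A   2   150.00  80.0\n"
    complex_text = "RES ALA A   1   100.00  90.0\nRES LYS A   2    60.00  32.0\n"
    buried = interface_residues(chain_text, complex_text)
    write_interface_csv(buried, sys.stdout)
    print(propensities(parse_rsa(chain_text), buried))
```

The parser skips REM header lines because it only reads lines starting with RES. In real .rsa files, four-digit residue numbers may run into the chain ID column, so a fixed-column parser would be safer for production data.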
- Clone the Repository
  git clone https://github.com/mhtjsh/Protein_DNA_Interface_Generation.git
  cd Protein_DNA_Interface_Generation
- Install Dependencies
  - Python (3.7+ recommended)
  - Snakemake (install via pip or conda): pip install snakemake
  - Naccess (used for the accessible-surface-area calculations)
  - Fortran compiler (e.g., gfortran)
  - Shell (usually installed by default)
  - Docker (optional, but recommended for reproducible runs)
- Check Installation
  snakemake --version
  A valid Snakemake version (e.g., 7.x) should be displayed.
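Since the pipeline also calls external tools, a small PATH check can complement snakemake --version. This is a minimal sketch using Python's shutil.which; the executable names (naccess, gfortran, docker, dot) are assumptions about a typical installation, not names confirmed by the repository.

```python
import shutil
import sys

# Assumed executable names; adjust them to match your installation.
REQUIRED = ["snakemake", "naccess", "gfortran"]
OPTIONAL = ["docker", "dot"]  # dot (Graphviz) is only needed for the DAG export


def find_tools(names):
    """Return {name: absolute path or None} for each executable on PATH."""
    return {name: shutil.which(name) for name in names}


def report():
    """Print where each tool was found and return the missing required items."""
    for name, path in find_tools(REQUIRED + OPTIONAL).items():
        print(f"{name:<10} {path or 'not found'}")
    missing = [name for name, path in find_tools(REQUIRED).items() if path is None]
    if sys.version_info < (3, 7):
        missing.append("Python 3.7+")
    return missing


if __name__ == "__main__":
    missing = report()
    print("Missing required tools:", ", ".join(missing) if missing else "none")
```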
- Prepare Input
  - Place your raw .pdb files in input/.
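- Run the Workflow
  snakemake --cores 1 --latency-wait 10
  This command runs all the steps: splitting the PDB files, running Naccess, and generating the interface results. --cores 1 runs one job at a time, and --latency-wait 10 makes Snakemake wait up to 10 seconds for output files to appear, which helps on slow or networked file systems.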
- Customization (Optional)
  - Modify or add rules in the Snakefile.
  - Update any scripts in scripts/ to customize the pipeline.
- Dry Run
  snakemake --cores 1 --dry-run
  Shows the planned jobs without executing them.
- Force All Steps
  snakemake --cores 1 --latency-wait 10 --forceall
  Re-runs every rule, even if its outputs already exist.
- Workflow DAG
  snakemake --dag | dot -Tpng > dag.png
  Exports the directed acyclic graph (DAG) of the workflow to dag.png (requires Graphviz for the dot command).
We provide a ready-to-use Docker image to facilitate a reproducible environment. Below are instructions to pull or build the image, and run the pipeline inside a container.
Pull the prebuilt image:
docker pull mhtjsh/protein-dna-interface
Or build it from source:
git clone https://github.com/mhtjsh/Protein_DNA_Interface_Generation.git
cd Protein_DNA_Interface_Generation
docker build -t mhtjsh/protein-dna-interface .
Run the pipeline in a container:
docker run --rm -it mhtjsh/protein-dna-interface
To process your own input PDB files and retrieve outputs on your host system:
docker run --rm -it \
-v /home/mhtjsh/Protein_DNA_Interface_Generation/input:/app/input \
-v /home/mhtjsh/Protein_DNA_Interface_Generation/output:/app/output \
mhtjsh/protein-dna-interface
- Ensure your local input/ directory contains .pdb files before running.
- Results will be placed in output/.
- Adjust the local path (/home/mhtjsh/Protein_DNA_Interface_Generation) to your actual directory if needed.
- Manual Testing
  - Place a test PDB file in input/.
  - Run Snakemake or the Docker container and verify the outputs in split_chain/, rsa/, and interface/.
- Automated Testing
  - Create minimal test data and a test rule in the Snakefile or a CI configuration (e.g., GitHub Actions).
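As a starting point for automated tests, the hypothetical check below confirms that each stage left output behind after a run. Because the exact output file names are not documented here, it only looks for at least one .pdb file in split_chain/, one .rsa file in rsa/, and one .csv file in interface/; these patterns are assumptions.

```python
from pathlib import Path

# (directory, glob pattern) pairs; the extensions are assumptions based on
# the directory descriptions, not on the pipeline's exact file names.
EXPECTED_OUTPUTS = [
    ("split_chain", "*.pdb"),
    ("rsa", "*.rsa"),
    ("interface", "*.csv"),
]


def missing_outputs(root="."):
    """List the expected outputs under `root` that have no matching file."""
    root = Path(root)
    return [
        f"{directory}/{pattern}"
        for directory, pattern in EXPECTED_OUTPUTS
        if not any((root / directory).glob(pattern))
    ]


def check_outputs(root="."):
    """Raise AssertionError if any stage left its output directory empty."""
    missing = missing_outputs(root)
    assert not missing, f"No pipeline outputs found for: {', '.join(missing)}"
```

check_outputs() could be called from a pytest test function or a CI step (for example, a GitHub Actions job) right after snakemake --cores 1 finishes on the minimal test data.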
All contributions are welcome! To contribute:
- Fork this repository.
- Create a new feature branch.
- Submit a pull request with your changes.
This project is distributed under an open-source license (e.g., MIT). See LICENSE for details.