A Comprehensive Protein-DNA Interface Generation Tool with Residue Propensity Map analysis

Welcome to the Protein-DNA Interface Generation repository! 🚀

Introduction

This project offers a pipeline to:

- Process multi-chain PDB files placed in the input/ folder.
- Split them into chain-specific files in split_chain/.
- Use Naccess to generate .asa, .rsa, and .int outputs for both the entire complex and individual chains in rsa/.
- Produce final residue propensity maps and other interface analysis results in CSV format under interface/.

It leverages:

- Python: Data parsing and scripting tasks.
- Fortran: Performance-heavy computations.
- Shell: Workflow automation.
- Docker: Reproducible and consistent environment.
- Snakemake: Automated workflow orchestration.

Repository Structure

Protein_DNA_Interface_Generation/
├── input/                  # Raw PDB or other input files
├── split_chain/            # Contains split chain PDB files
├── rsa/                    # Naccess outputs (.asa, .rsa, .int) for complex & chains
├── interface/              # Final residue propensity maps (CSV) & summary outputs
├── scripts/                # Python, Shell, Fortran scripts
├── docker/                 # Docker configuration and resources
├── Snakefile               # Main Snakemake workflow definition
└── README.md               # Project documentation (this file)

input/: Place your raw PDB files here to be processed.
split_chain/: Contains the individual chain-specific PDB files generated by the workflow.
rsa/: Holds the output from Naccess (e.g., .asa, .rsa, .int) run on both the entire complex and individual chains.
interface/: Stores final CSV results with residue-based interface metrics and any summary files.
scripts/: Key scripts for chain splitting, interface analysis, and more.
docker/: Docker setup to help package and run the entire pipeline in a container.
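
To make the split_chain/ step more concrete, here is a minimal sketch of chain splitting in Python. It is not the repository's actual script: the function name `split_pdb_by_chain` and the `<stem>_<chain>.pdb` output naming are assumptions made for illustration. It relies only on the PDB format convention that the chain ID sits in column 22 of ATOM and HETATM records.

```python
from collections import defaultdict
from pathlib import Path


def split_pdb_by_chain(pdb_path, out_dir):
    """Write one PDB file per chain ID found in ATOM/HETATM records.

    Hypothetical helper; output naming (<stem>_<chain>.pdb) is an assumption.
    """
    pdb_path = Path(pdb_path)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    chains = defaultdict(list)
    with pdb_path.open() as handle:
        for line in handle:
            # The chain ID is column 22 of a coordinate record (index 21).
            if line.startswith(("ATOM", "HETATM")) and len(line) > 21:
                chains[line[21]].append(line.rstrip("\n") + "\n")

    written = []
    for chain_id, records in chains.items():
        label = chain_id.strip() or "_"  # a blank chain ID becomes "_"
        out_path = out_dir / f"{pdb_path.stem}_{label}.pdb"
        with out_path.open("w") as out:
            out.writelines(records)
            out.write("END\n")
        written.append(out_path)
    return written
```

Calling `split_pdb_by_chain("input/example.pdb", "split_chain")` on a hypothetical two-chain file would write one file per chain into split_chain/ and return their paths.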

Workflow Stages

Input Parsing
- Reads .pdb files from input/.
Chain Splitting
- Splits each file by chain, outputting them to split_chain/.
Naccess Runs
- Computes accessible surface areas for both the complex and each chain; results go to rsa/.
Interface Computation
- Uses Naccess outputs to identify interface residues and compute relevant metrics.
Results Aggregation
- Final CSV files summarizing residue-based interface stats are written into interface/.
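
The README does not spell out how interface residues are identified. A common approach, sketched below as an assumption rather than as this pipeline's confirmed method, is to compare each residue's accessible surface area (ASA) in the isolated chain with its ASA in the complex and flag residues that lose area on binding. The function name, the dictionary keys, the sample values, and the 0.1 Å² cutoff are all hypothetical.

```python
def interface_residues(asa_chain, asa_complex, min_delta=0.1):
    """Return {residue: buried area} for residues that lose ASA in the complex.

    asa_chain and asa_complex map a residue key to its absolute ASA in Å^2.
    min_delta is an assumed cutoff, not a value taken from this repository.
    """
    buried = {}
    for residue, asa_free in asa_chain.items():
        # Residues missing from the complex are treated as unchanged.
        delta = asa_free - asa_complex.get(residue, asa_free)
        if delta > min_delta:
            buried[residue] = round(delta, 2)
    return buried


# Made-up ASA values keyed by (chain, residue number, residue name).
free = {("A", 10, "ARG"): 150.0, ("A", 11, "GLY"): 40.0, ("A", 12, "LYS"): 120.0}
bound = {("A", 10, "ARG"): 60.0, ("A", 11, "GLY"): 40.0, ("A", 12, "LYS"): 95.5}
print(interface_residues(free, bound))
```

With these sample values, ARG 10 and LYS 12 are reported as buried while GLY 11 is not, because its ASA is unchanged between the free chain and the complex.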

Installation and Dependencies

Clone the Repository

git clone https://github.com/mhtjsh/Protein_DNA_Interface_Generation.git
cd Protein_DNA_Interface_Generation

Install Dependencies
- Python (3.7+ recommended)
- Snakemake (install via pip or conda):
```
pip install snakemake
```
- Fortran Compiler (e.g., gfortran)
- Shell (usually installed by default)
- Docker (optional, but recommended for reproducible runs)
Check Installation
```
snakemake --version
```
A valid Snakemake version (e.g., 7.x) should be displayed.

Usage

Prepare Input
- Place your raw .pdb files in input/.
Run the Workflow
```
snakemake --cores 1 --latency-wait 10
```
This command runs all the steps: splitting PDB files, running Naccess, and generating interface results.
Customization (Optional)
- Modify or add rules in the Snakefile.
- Update any scripts in scripts/ to customize the pipeline.
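
If you prefer to launch the workflow from Python, the sketch below wraps the command shown above and refuses to start when input/ contains no .pdb files. The helper names `find_pdb_inputs` and `run_workflow` are hypothetical and not part of this repository; the Snakemake options are the ones used in this README.

```python
import subprocess
from pathlib import Path


def find_pdb_inputs(input_dir="input"):
    """Return the sorted .pdb files in input_dir (extension matched case-insensitively)."""
    folder = Path(input_dir)
    if not folder.is_dir():
        return []
    return sorted(
        p for p in folder.iterdir() if p.is_file() and p.suffix.lower() == ".pdb"
    )


def run_workflow(input_dir="input", cores=1, latency_wait=10):
    """Run Snakemake with the README's options after checking that inputs exist."""
    if not find_pdb_inputs(input_dir):
        raise FileNotFoundError(f"no .pdb files found in {input_dir}/")
    cmd = ["snakemake", "--cores", str(cores), "--latency-wait", str(latency_wait)]
    # check=True raises CalledProcessError if Snakemake exits with an error.
    subprocess.run(cmd, check=True)
```

Call `run_workflow()` from the repository root so that Snakemake finds the Snakefile.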

Common Snakemake Options

Dry Run
```
snakemake --cores 1 --dry-run
```
Shows the planned jobs without executing them.
Force All Steps
```
snakemake --cores 1 --forceall --latency-wait 10
```
Re-runs every rule, ignoring existing outputs.
Workflow DAG
```
snakemake --dag | dot -Tpng > dag.png
```
Exports the workflow's directed acyclic graph (DAG) as a PNG image. The dot command is part of Graphviz, which must be installed.

Docker Usage

We provide a ready-to-use Docker image to facilitate a reproducible environment. Below are instructions to pull or build the image, and run the pipeline inside a container.

Pull the Pre-built Image (Recommended)

docker pull mhtjsh/protein-dna-interface

Build the Image Yourself (Optional)

git clone https://github.com/mhtjsh/Protein_DNA_Interface_Generation.git
cd Protein_DNA_Interface_Generation
docker build -t mhtjsh/protein-dna-interface .

Running the Container

Basic Run Using Example Data

docker run --rm -it mhtjsh/protein-dna-interface

Mounting Input and Output Folders

To process your own input PDB files and retrieve outputs on your host system:

docker run --rm -it \
  -v /home/mhtjsh/Protein_DNA_Interface_Generation/input:/app/input \
  -v /home/mhtjsh/Protein_DNA_Interface_Generation/output:/app/output \
  mhtjsh/protein-dna-interface

Ensure your local input/ directory has .pdb files before running.
Results will be placed in output/.
Adjust the local path (/home/mhtjsh/Protein_DNA_Interface_Generation) to your actual directory if needed.
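
To avoid hard-coding the host path, the mount arguments can be built from any project directory. The sketch below only constructs and prints the command; the function name `docker_run_command` is hypothetical, while the image name and the /app/input and /app/output container paths are taken from the example above.

```python
import shlex
from pathlib import Path

IMAGE = "mhtjsh/protein-dna-interface"


def docker_run_command(project_dir, image=IMAGE):
    """Build the docker run argument list with absolute host paths for both mounts."""
    project = Path(project_dir).expanduser().resolve()
    return [
        "docker", "run", "--rm", "-it",
        "-v", f"{project / 'input'}:/app/input",
        "-v", f"{project / 'output'}:/app/output",
        image,
    ]


# Print a shell-ready command for the current directory.
print(shlex.join(docker_run_command(".")))
```

Paste the printed command into a terminal. The -it flags expect an interactive terminal, so drop them if you run the command from a script.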

Testing

Manual Testing
- Place a test PDB file in input/.
- Run Snakemake or the Docker container, verifying outputs in split_chain/, rsa/, and interface/.
Automated Testing
- Create minimal test data and a test rule in the Snakefile or a CI configuration (e.g., GitHub Actions).
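
As a starting point for an automated check, the sketch below reports which of the three output folders are missing or empty after a run. It is a minimal smoke test under the assumption that a successful run leaves at least one file in each folder; the function name `empty_output_dirs` is hypothetical.

```python
from pathlib import Path

EXPECTED_DIRS = ("split_chain", "rsa", "interface")


def empty_output_dirs(root=".", expected=EXPECTED_DIRS):
    """Return the expected output folders that are missing or contain no files."""
    root = Path(root)
    problems = []
    for name in expected:
        folder = root / name
        if not folder.is_dir() or not any(p.is_file() for p in folder.iterdir()):
            problems.append(name)
    return problems
```

In a CI job, run the workflow on the test data and then fail the build if `empty_output_dirs()` returns a non-empty list.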

Contributing

All contributions are welcome! To contribute:

- Fork this repository.
- Create a new feature branch.
- Submit a pull request with your changes.

License

This project is distributed under an open-source license (e.g., MIT). See LICENSE for details.
