
Toolkit for analyzing protein-DNA interfaces using computational methods. Includes Fortran, Python, and shell scripts for processing structural data, calculating interface properties, and visualizing results. Designed for bioinformatics and structural biology research focused on protein-DNA interactions.

mhtjsh/ProteinDNAInterfaceAnalysis

A Comprehensive Protein-DNA Interface Generation Tool with Residue Propensity Map Analysis

Welcome to the Protein-DNA Interface Generation repository! 🚀


Table of Contents

  1. Introduction
  2. Repository Structure
  3. Workflow Stages
  4. Installation and Dependencies
  5. Usage
  6. Docker Usage
  7. Testing
  8. Contributing
  9. License

Introduction

This project offers a pipeline to:

  • Process multi-chain PDB files placed in the input/ folder.
  • Split them into chain-specific files in split_chain/.
  • Use Naccess to generate .asa, .rsa, and .int outputs for both the entire complex and individual chains in rsa/.
  • Produce final residue propensity maps and other interface analysis results in CSV format under interface/.
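The chain-splitting step can be illustrated with a short Python sketch. It is not the repository's own script: it assumes chain IDs are read from column 22 of ATOM/HETATM records, keeps only the first model, and uses a hypothetical <stem>_<chain>.pdb naming scheme for the files in split_chain/.

```python
from collections import defaultdict
from pathlib import Path
from typing import Dict, List


def split_pdb_by_chain(pdb_path: Path, out_dir: Path) -> Dict[str, Path]:
    """Write each chain's ATOM/HETATM records to its own PDB file.

    Assumptions: the chain ID is column 22 of each record, only the first
    model is kept, and files are named <stem>_<chain>.pdb (hypothetical).
    """
    records: Dict[str, List[str]] = defaultdict(list)
    with open(pdb_path) as handle:
        for line in handle:
            if line.startswith("ENDMDL"):
                break  # stop after the first model of a multi-model file
            if line.startswith(("ATOM", "HETATM")):
                chain_id = line[21] if len(line) > 21 else " "
                records[chain_id].append(line.rstrip("\n") + "\n")

    out_dir.mkdir(parents=True, exist_ok=True)
    written: Dict[str, Path] = {}
    for chain_id, lines in records.items():
        label = chain_id.strip() or "blank"  # placeholder label for a blank chain ID
        out_path = out_dir / f"{pdb_path.stem}_{label}.pdb"
        out_path.write_text("".join(lines) + "END\n")
        written[chain_id] = out_path
    return written
```

For example, split_pdb_by_chain(Path("input/complex.pdb"), Path("split_chain")) would write complex_A.pdb, complex_B.pdb, and so on; complex.pdb is only a placeholder name.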

It leverages:

  • Python: Data parsing and scripting tasks.
  • Fortran: Performance-heavy computations.
  • Shell: Workflow automation.
  • Docker: Reproducible and consistent environment.
  • Snakemake: Automated workflow orchestration.

Repository Structure

Protein_DNA_Interface_Generation/
├── input/                  # Raw PDB or other input files
├── split_chain/            # Contains split chain PDB files
├── rsa/                    # Naccess outputs (.asa, .rsa, .int) for complex & chains
├── interface/              # Final residue propensity maps (CSV) & summary outputs
├── scripts/                # Python, Shell, Fortran scripts
├── docker/                 # Docker configuration and resources
├── Snakefile               # Main Snakemake workflow definition
└── README.md               # Project documentation (this file)

  • input/: Place your raw PDB files here to be processed.
  • split_chain/: Contains the individual chain-specific PDB files generated by the workflow.
  • rsa/: Holds the output from Naccess (e.g., .asa, .rsa, .int) run on both the entire complex and individual chains.
  • interface/: Stores final CSV results with residue-based interface metrics and any summary files.
  • scripts/: Key scripts for chain splitting, interface analysis, and more.
  • docker/: Docker setup to help package and run the entire pipeline in a container.
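Downstream steps read per-residue values from the .rsa files in rsa/. The sketch below assumes the usual Naccess fixed-column layout for RES records (residue name in columns 5-7, chain in column 9, residue number and insertion code in columns 10-14), followed by whitespace-separated values whose first two are the all-atom absolute and relative accessibility. The read_rsa helper is hypothetical, not part of this repository.

```python
from pathlib import Path
from typing import Dict, Optional, Tuple

# (chain, residue number with optional insertion code, residue name)
ResidueKey = Tuple[str, str, str]


def _number_or_none(token: str) -> Optional[float]:
    """Parse a numeric field, returning None for a non-numeric placeholder."""
    try:
        return float(token)
    except ValueError:
        return None


def read_rsa(path: Path) -> Dict[ResidueKey, Tuple[float, Optional[float]]]:
    """Map each residue to its (absolute, relative) all-atom accessibility.

    Assumes the fixed-column RES layout described above. If the relative
    value is not a number, None is stored instead.
    """
    values_by_residue: Dict[ResidueKey, Tuple[float, Optional[float]]] = {}
    with open(path) as handle:
        for line in handle:
            if not line.startswith("RES "):
                continue  # skip REM headers and the summary lines at the end
            res_name = line[4:7].strip()
            chain = line[8].strip()
            res_num = line[9:14].strip()
            fields = line[14:].split()
            values_by_residue[(chain, res_num, res_name)] = (
                float(fields[0]),
                _number_or_none(fields[1]),
            )
    return values_by_residue
```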

Workflow Stages

  1. Input Parsing
    • Reads .pdb files from input/.
  2. Chain Splitting
    • Splits each file by chain, outputting them to split_chain/.
  3. Naccess Runs
    • Computes solvent-accessible surface areas for the whole complex and for each isolated chain; results go to rsa/.
  4. Interface Computation
    • Uses the Naccess outputs to identify interface residues and compute residue-level interface metrics, including residue propensities.
  5. Results Aggregation
    • Final CSV files summarizing residue-based interface stats are written into interface/.
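Stages 4 and 5 can be sketched under one common set of conventions, which may differ from what the repository's Fortran and Python code actually does: a residue counts as interface if its accessible area in the isolated chain exceeds its area in the complex by more than a cutoff (1.0 Å² here, an assumed default), and the propensity of residue type X is its frequency among interface residues divided by its frequency among all residues. All function names are hypothetical.

```python
import csv
from collections import Counter
from typing import Dict, List, Tuple

ResidueKey = Tuple[str, str, str]  # (chain, residue number, residue name)


def interface_residues(chain_asa: Dict[ResidueKey, float],
                       complex_asa: Dict[ResidueKey, float],
                       cutoff: float = 1.0) -> List[ResidueKey]:
    """Residues whose accessible area drops by more than `cutoff` (square angstroms) in the complex."""
    return [key for key, isolated in chain_asa.items()
            if key in complex_asa and isolated - complex_asa[key] > cutoff]


def residue_propensities(chain_asa: Dict[ResidueKey, float],
                         interface: List[ResidueKey]) -> Dict[str, float]:
    """Interface frequency of each residue type divided by its overall frequency."""
    all_counts = Counter(name for _, _, name in chain_asa)
    int_counts = Counter(name for _, _, name in interface)
    n_all, n_int = sum(all_counts.values()), sum(int_counts.values())
    if n_int == 0:
        return {name: 0.0 for name in sorted(all_counts)}
    return {name: (int_counts[name] / n_int) / (all_counts[name] / n_all)
            for name in sorted(all_counts)}


def write_propensity_csv(path: str, propensities: Dict[str, float]) -> None:
    """Write one row per residue type with its propensity to three decimals."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["residue", "propensity"])
        for name, value in propensities.items():
            writer.writerow([name, f"{value:.3f}"])
```

A propensity above 1 means the residue type is enriched at the interface relative to its overall frequency; normalizing by surface residues instead of all residues is another common choice.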

Installation and Dependencies

  1. Clone the Repository

    git clone https://github.com/mhtjsh/Protein_DNA_Interface_Generation.git
    cd Protein_DNA_Interface_Generation
  2. Install Dependencies

    • Python (3.7+ recommended)
    • Snakemake (install via pip or conda):
      pip install snakemake
    • Fortran Compiler (e.g., gfortran)
    • Shell (usually installed by default)
    • Naccess (required for the surface-area step; must be installed separately)
    • Docker (optional, but recommended for reproducible runs)
  3. Check Installation

    snakemake --version

    A valid Snakemake version (e.g., 7.x) should be displayed.
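To check the remaining tools in one go, a small Python helper can look each executable up on PATH. The executable names are assumptions (Naccess in particular may be installed under a different name or outside PATH), and the find_tools helper is hypothetical.

```python
import shutil
from typing import Dict, Iterable, Optional

# Assumed executable names for the dependencies listed above.
DEFAULT_TOOLS = ("python3", "snakemake", "gfortran", "naccess", "docker")


def find_tools(names: Iterable[str] = DEFAULT_TOOLS) -> Dict[str, Optional[str]]:
    """Map each executable name to its resolved path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, location in find_tools().items():
        print(f"{name:10s} {location or 'NOT FOUND'}")
```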


Usage

  1. Prepare Input

    • Place your raw .pdb files in input/.
  2. Run the Workflow

    snakemake --cores 1 --latency-wait 10

    This command runs all the steps: splitting the PDB files, running Naccess, and generating the interface results.

  3. Customization (Optional)

    • Modify or add rules in the Snakefile.
    • Update any scripts in scripts/ to customize the pipeline.

Common Snakemake Options

  • Dry Run

    snakemake --cores 1 --dry-run

    Shows the planned jobs without executing them.

  • Force All Steps

    snakemake --cores 1 --latency-wait 10 --forceall

    Re-runs every rule, regenerating all outputs even if they are up to date.

  • Workflow DAG

    snakemake --dag | dot -Tpng > dag.png

    Exports the workflow's directed acyclic graph (DAG) as a PNG image; this requires Graphviz (the dot command).
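If you drive the workflow from a Python script, these options map onto a plain argument list passed to subprocess. The snakemake_command and run_workflow helpers are hypothetical; they only assemble the --cores and --latency-wait flags used above plus Snakemake's --dry-run and --forceall options.

```python
import subprocess
from typing import List


def snakemake_command(cores: int = 1, latency_wait: int = 10,
                      dry_run: bool = False, force_all: bool = False) -> List[str]:
    """Build a Snakemake command line as an argument list."""
    cmd = ["snakemake", "--cores", str(cores), "--latency-wait", str(latency_wait)]
    if dry_run:
        cmd.append("--dry-run")   # list planned jobs without executing them
    if force_all:
        cmd.append("--forceall")  # regenerate every output
    return cmd


def run_workflow(**options) -> int:
    """Run Snakemake in the current directory and return its exit code."""
    return subprocess.run(snakemake_command(**options)).returncode
```

Passing a list rather than a single string avoids shell quoting issues; run_workflow(dry_run=True) is a cheap first check before a full run.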


Docker Usage

We provide a ready-to-use Docker image to facilitate a reproducible environment. Below are instructions for pulling or building the image and running the pipeline inside a container.

Pull the Pre-built Image (Recommended)

docker pull mhtjsh/protein-dna-interface

Build the Image Yourself (Optional)

git clone https://github.com/mhtjsh/Protein_DNA_Interface_Generation.git
cd Protein_DNA_Interface_Generation
docker build -t mhtjsh/protein-dna-interface .

Running the Container

Basic Run Using Example Data

docker run --rm -it mhtjsh/protein-dna-interface

Mounting Input and Output Folders

To process your own input PDB files and retrieve outputs on your host system:

docker run --rm -it \
  -v "$(pwd)/input:/app/input" \
  -v "$(pwd)/output:/app/output" \
  mhtjsh/protein-dna-interface

  • Run this from the repository root, and ensure your local input/ directory contains .pdb files before running.
  • Results will be placed in output/ on the host.
  • Docker bind mounts need absolute host paths; "$(pwd)" expands to the current directory's absolute path. Replace it with an explicit path (e.g., /home/mhtjsh/Protein_DNA_Interface_Generation) if you run the command from elsewhere.

Testing

  1. Manual Testing

    • Place a test PDB file in input/.
    • Run Snakemake or the Docker container, verifying outputs in split_chain/, rsa/, and interface/.
  2. Automated Testing

    • Create minimal test data and a test rule in the Snakefile or a CI configuration (e.g., GitHub Actions).
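As a starting point for automated testing, a smoke test can check that each stage left output behind. This sketch assumes the run was started from the repository root and that split_chain/, rsa/, and interface/ should each contain at least one non-empty .pdb, .rsa, or .csv file respectively; missing_outputs is a hypothetical helper.

```python
from pathlib import Path
from typing import Dict, List

# Assumed: folder -> file suffix each pipeline stage should produce.
EXPECTED_OUTPUTS: Dict[str, str] = {
    "split_chain": ".pdb",
    "rsa": ".rsa",
    "interface": ".csv",
}


def missing_outputs(root: Path, expected: Dict[str, str] = EXPECTED_OUTPUTS) -> List[str]:
    """Return the folders that contain no non-empty file with the expected suffix."""
    missing = []
    for folder, suffix in expected.items():
        directory = root / folder
        files = []
        if directory.is_dir():
            files = [p for p in directory.glob(f"*{suffix}") if p.stat().st_size > 0]
        if not files:
            missing.append(folder)
    return missing
```

In a pytest file, a test could simply assert that missing_outputs(Path(".")) returns an empty list after the workflow has finished.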

Contributing

All contributions are welcome! To contribute:

  1. Fork this repository.
  2. Create a new feature branch.
  3. Submit a pull request with your changes.

License

This project is distributed under an open-source license (e.g., MIT). See LICENSE for details.
