Skip to content

MEMO-group/ProD

Repository files navigation

ProD

Article:

Predictive design of sigma factor-specific promoters

Maarten Van Brempt, Jim Clauwaert, Friederike Mey, Michiel Stock, Jo Maertens, Willem Waegeman, Marjan De Mey

Journal

Cite this tool!

DOI

Introduction

The Promoter Designer (ProD) tool is a shallow neural network created by Van Brempt M. and Clauwaert. J et. al. for the prediction of promoter strength or transcription initiation frequency by variation of the spacer sequence. It is created to construct promoter libraries, further exploiting biological capabilities of the microorganisms that allow for the fine-tuning of genetic circuits. A neural network has been trained on hundreds of thousands of sequences that have been randomized in the 17nt spacer sequence. Therefore, generated promoters, ranging from no expression (strengh: 0) to high expression (strength: 10) all feature the same UP-region, binding boxes (-35, -10) and untranslated region (UTR).

PROMOTER MAP

[UP-region][-35-box][spacer][-10-box][ATATTC][UTR]

[GGTCTATGAGTGGTTGCTGGATAAC][TTTACG][NNNNNNNNNNNNNNNNN][TATAAT][ATATTC][AGGGAGAGCACAACGGTTTCCCTCTACAAATAATTTTGTTTAACTTT]

Usage

The tool can be used through Binder or on a local system.

  1. Binder Binder offers the hosting of GitHub on their servers

Binder

  1. Local installation

To run ProD on a local machine, simply clone the repository in your working directory and install the necessary python libraries.

For linux users:

git clone https://github.com/jdcla/FlowCyt.git
conda env create -f environment.yml

For Mac and Windows users, the packages and their versions will have to be manually installed. These are listed in the environment.yml file. These include

  1. Pytorch (1.4.0)
  2. Numpy (1.19.1)
  3. Pandas (1.1.0)
  4. Scipy (1.5.2)

No CUDA support is necessary (GPU accelarated processing), a PyTorch installation supporting CUDA is therefore not installed through environment.yml. See www.pytorch.com for more information.

Local installations can run the python script ProD.py

usage: ProD.py [-h] [--output_path OUTPUT_PATH] [--lib] [--lib_size LIB_SIZE]
               [--classes [CLASSES [CLASSES ...]]] [--cuda]
               input_data

Promoter Designer (ProD) tool

positional arguments:
  input_data             location of the text file containing spacer sequences

optional arguments:
  -h, --help            show this help message and exit
  --output_path OUTPUT_PATH, -o OUTPUT_PATH
                        location of the text file containing spacer sequences
  --lib                 create library from blueprint
  --lib_size LIB_SIZE, -ls LIB_SIZE
                        size of each class in library (min:1, max:64)
  --strengths [STRENGTHS [STRENGTHS ...]],
                        promoter strengths included in the library (min:0, max:10)
  --cuda                use CUDA. Requires PyTorch installation supporting
                        CUDA!

Constructing a Library

To create a custom promoter library, a single input blueprint is given that functions as the source from which spacer sequences are evaluated. The tool will run through the following steps:

  1. Create all possible sequences from the degenerate input sequence
  2. Determine the promoter strengths, retain all spacer sequences for requested promoter strengths
  3. Sample promoters to construct library.
  4. Construct degenerate sequence (library blueprint) from all sequences. For each blueprint, the fraction of sequences classified to each category of strength is given.

If the amount of sequences possible from the input sequence exceeds 500,000, spacers will be sampled (100,000) instead and no library blueprint is created. To attain feasible processing times, a minimal amount of user guidance in the construction of the library blueprint is required.

NOTE: Promoter strength is divided in 11 ordinal classes ranging from 0 to 10. Overlap between neighbouring class strengths is expected. Therefore, when constructing a library it can be beneficial to group classes together. Specifically, we recommend the following interpretation of four sets of input strengths.

  • zero to low expression: strengths = 0 1 2
  • low to medium expression: strengths = 3 4 5
  • medium to high expression: strengths = 6 7 8
  • high to very high expression: strengths = 9 10

EXAMPLE

python ProD.py NNNCGGGNCCNGGGNNN --lib --strengths 8 9 10

Evaluate Custom Spacers

It is possible to evaluate custom sequences. The input can be given as a list or the path to a fasta file.

EXAMPLE

python ProD.py NNNCGGGNCCNGGGAAA AAACCCGGGTTTTAAAC

python ProD.py ex_seqs.fa

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Packages

No packages published