weiyutang1010/ncrna_design

SamplingDesign: RNA Design via Continuous Optimization with Coupled Variables and Monte-Carlo Sampling

This repository contains the source code for the SamplingDesign project. The Python code for generating the figures in the paper is available in ./figures/

Wei Yu Tang, Ning Dai, Tianshuo Zhou, David H. Mathews, and Liang Huang*

* corresponding author

For questions, please contact the corresponding author at liang.huang.sh@gmail.com.

To Compile

Compiler version: g++ (Spack GCC) 8.3.0

make

Python Dependencies

python (3.8.20), numpy (1.24.4), matplotlib (3.7.5), viennarna (2.6.4)

pip install -r requirements.txt

Example Script

Run SamplingDesign for the shortest five structures (up to 30 nucleotides) in Eterna100 with 200 steps (takes ~30 seconds).

./run.sh example ./data/example.txt

The results will be saved in ./results/example/. The script then parses the result file to generate learning curves in ./graphs/example/ and output the best solution (based on each metric) into ./analysis/example/.

Run all structures in a dataset

Reproduce the results in the paper by running both the uniform and $\epsilon$-targeted initializations. Note: on our 64-core server, the Eterna100 dataset took approximately 10 days to complete.

Command

./run_all.sh ./data/eterna100_v1.txt
python merge.py eterna100_v1 eterna100_uniform eterna100_targeted # summarize results

Replace ./data/eterna100_v1.txt with

  • ./data/eterna100_v2.txt to run the 19 modified structures in Eterna100-v2.
  • ./data/rfam27.txt to run all the Rfam-Taneda-27 structures.
  • ./data/rnasolo.txt to run all the RNAsolo-764 structures.

To run SamplingDesign

Command

echo "[target structure]" | ./main [args] > ./result.txt

Example

echo "(((...)))" | ./main --num_steps 50 --verbose > ./result.txt
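Before piping a target structure into ./main, it can help to check that the dot-bracket string is well formed. The following is a minimal Python sketch, not part of this repository; the function name `parse_dot_bracket` is hypothetical, and it assumes the structure uses only `(`, `)`, and `.` (no pseudoknot brackets).

```python
def parse_dot_bracket(structure):
    """Return the sorted list of (i, j) base pairs (0-based) in a dot-bracket string.

    Raises ValueError on unknown characters or unbalanced parentheses.
    Hypothetical helper for illustration; not part of SamplingDesign.
    """
    stack = []   # positions of currently unmatched '('
    pairs = []
    for j, ch in enumerate(structure):
        if ch == "(":
            stack.append(j)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {j}")
            pairs.append((stack.pop(), j))
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at position {j}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)


print(parse_dot_bracket("(((...)))"))  # [(0, 8), (1, 7), (2, 6)]
```

A structure that raises ValueError here is unbalanced or contains characters outside the three symbols this sketch handles.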

Arguments

Objective functions: "prob" - Boltzmann probability, "ned" - normalized ensemble defect, "dist" - structural distance, "ddg" - free energy gap. (default: "prob")

--obj [prob/ned/dist/ddg]
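To give a feel for the "dist" objective, here is a small Python sketch of one common definition of structural distance: the number of positions whose pairing partner differs between two structures. This definition is an assumption; the README does not state which distance ./main uses, and base-pair distance (the size of the symmetric difference of the two pair sets) is another common choice. The function names are hypothetical.

```python
def partner_table(structure):
    """Map each position to its pairing partner (None if unpaired).

    Assumes a balanced dot-bracket string using only '(', ')' and '.'.
    """
    table = [None] * len(structure)
    stack = []
    for j, ch in enumerate(structure):
        if ch == "(":
            stack.append(j)
        elif ch == ")":
            i = stack.pop()
            table[i], table[j] = j, i
    return table


def structural_distance(a, b):
    """Count positions that are paired differently in structures a and b.

    Assumed definition for illustration; the tool's own metric may differ.
    """
    if len(a) != len(b):
        raise ValueError("structures must have the same length")
    return sum(pa != pb for pa, pb in zip(partner_table(a), partner_table(b)))


print(structural_distance("(((...)))", "((.....))"))  # 2
```

In this example the two structures differ only in the innermost pair, so exactly two positions (both ends of that pair) are counted.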

Initializations: uniform or targeted (default: targeted)

--init [uniform/targeted]

eps: the value of $\epsilon$ used by the $\epsilon$-targeted initialization. Setting $\epsilon$ = 1.0 gives the pure targeted initialization. (default: 0.75)

--eps [a value between 0 and 1]
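The sketch below illustrates one way an $\epsilon$-targeted initialization could be formed: as a mixture of a targeted distribution (weight eps) and a uniform distribution (weight 1 - eps), so that eps = 1.0 recovers the pure targeted case. The mixing formula and the particular targeted distributions (A for unpaired positions, CG/GC for paired positions) are assumptions made for illustration, not taken from this repository's code; `mix_with_uniform` is a hypothetical name.

```python
def mix_with_uniform(targeted, eps):
    """Return eps * targeted + (1 - eps) * uniform over the same outcomes.

    Assumed form of the eps-targeted initialization (illustration only).
    """
    if not 0.0 <= eps <= 1.0:
        raise ValueError("eps must be between 0 and 1")
    uniform = 1.0 / len(targeted)
    return {k: eps * p + (1.0 - eps) * uniform for k, p in targeted.items()}


# Hypothetical targeted distributions: unpaired positions favor A,
# paired positions favor the CG and GC pairs equally.
targeted_unpaired = {"A": 1.0, "C": 0.0, "G": 0.0, "U": 0.0}
targeted_paired = {"CG": 0.5, "GC": 0.5, "AU": 0.0, "UA": 0.0, "GU": 0.0, "UG": 0.0}

init_unpaired = mix_with_uniform(targeted_unpaired, eps=0.75)
init_paired = mix_with_uniform(targeted_paired, eps=0.75)
print(init_unpaired)
print(init_paired)
```

Under these assumptions, eps = 0.75 gives an unpaired position probability 0.8125 for A and 0.0625 for each other nucleotide, while eps = 0 would give the uniform initialization.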

projection: use the direct parameterization (projected gradient descent) instead of the softmax parameterization. (default: false)

--projection
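The two parameterizations can be contrasted with a short Python sketch. With softmax, unconstrained logits are mapped to a distribution; with the direct parameterization, the probabilities themselves are updated and then projected back onto the probability simplex. The projection below is the standard sort-based Euclidean projection; whether ./main uses this particular algorithm is an assumption, and the function names are hypothetical.

```python
import math


def softmax(logits):
    """Map unconstrained logits to a probability distribution."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def project_to_simplex(v):
    """Euclidean projection of v onto {p : p_i >= 0, sum(p) = 1}."""
    u = sorted(v, reverse=True)
    cumulative = 0.0
    theta = 0.0
    for k, u_k in enumerate(u, start=1):
        cumulative += u_k
        candidate = (cumulative - 1.0) / k
        if u_k - candidate > 0:
            theta = candidate  # keep the threshold from the largest valid k
    return [max(x - theta, 0.0) for x in v]


def projected_gradient_step(p, grad, lr=0.01):
    """One projected gradient descent step on a single distribution (minimization)."""
    return project_to_simplex([p_i - lr * g_i for p_i, g_i in zip(p, grad)])


print(softmax([0.0, 0.0, 0.0, 0.0]))
print(project_to_simplex([0.6, 0.6, -0.2]))
```

One practical difference: the projection can return exact zeros, while softmax always assigns every outcome a strictly positive probability.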

no_adam: turn off the Adam optimizer (default: false)

--no_adam

beta_1, beta_2: the first- and second-moment decay rates for the Adam optimizer (default: 0.9, 0.999)

--beta_1 [value] --beta_2 [value]
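For readers unfamiliar with how beta_1 and beta_2 enter the update, here is a minimal Python sketch of the textbook Adam step. It is not taken from this repository's source: the function name `adam_step` and the `tiny` stabilizer are hypothetical, the update is written for minimization, and any learning-rate schedule the tool may apply on top of initial_lr is not modeled.

```python
import math


def adam_step(params, grads, m, v, t, lr=0.01, beta_1=0.9, beta_2=0.999, tiny=1e-8):
    """Apply one textbook Adam update (minimization); return (params, m, v).

    t is the 1-based step count; m and v are the running moment estimates.
    """
    new_params, new_m, new_v = [], [], []
    for p, g, m_i, v_i in zip(params, grads, m, v):
        m_i = beta_1 * m_i + (1.0 - beta_1) * g      # first moment: decayed mean of gradients
        v_i = beta_2 * v_i + (1.0 - beta_2) * g * g  # second moment: decayed mean of squares
        m_hat = m_i / (1.0 - beta_1 ** t)            # bias correction for early steps
        v_hat = v_i / (1.0 - beta_2 ** t)
        new_params.append(p - lr * m_hat / (math.sqrt(v_hat) + tiny))
        new_m.append(m_i)
        new_v.append(v_i)
    return new_params, new_m, new_v


params, m, v = [1.0, -2.0], [0.0, 0.0], [0.0, 0.0]
params, m, v = adam_step(params, grads=[2.0, -3.0], m=m, v=v, t=1)
print(params)
```

On the first step the bias-corrected update has magnitude close to lr regardless of the gradient scale, which is why each parameter here moves by roughly 0.01.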

nesterov: use Nesterov's accelerated gradient descent for projected gradient descent (default: false)

--nesterov

initial_lr: set initial learning rate (default: 0.01)

--initial_lr [value]

num_steps: set max number of steps (default: 2000)

--num_steps [value]

no_early_stop: turn off early stopping (default: false)

--no_early_stop

k_ma: k moving average parameter used for early stopping (default: 50)

--k_ma [value]
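The README does not spell out the early-stopping rule, so the following Python sketch shows only one plausible way a k-step moving average could be used: stop once the average objective over the last k_ma steps is no better than the average over the k_ma steps before that. The rule itself, the function name `should_stop`, and the lower-is-better convention are all assumptions.

```python
def should_stop(history, k_ma=50):
    """Return True when the latest k_ma-step average has stopped improving.

    history holds one objective value per step (lower is better).
    Assumed stopping rule for illustration; the tool's criterion may differ.
    """
    if len(history) < 2 * k_ma:
        return False  # not enough steps to compare two windows
    recent = sum(history[-k_ma:]) / k_ma
    previous = sum(history[-2 * k_ma:-k_ma]) / k_ma
    return recent >= previous


print(should_stop([5.0, 4.0, 3.0, 2.0], k_ma=2))  # False: still improving
print(should_stop([5.0, 4.0, 4.0, 5.0], k_ma=2))  # True: no improvement
```

Averaging over a window makes the decision less sensitive to the step-to-step noise that sampling-based objective estimates tend to have.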

beamsize: set beamsize and sharpturn (default: 250)

--beamsize [value]

is_lazy: use lazyOutside when evaluating NED (default: False)

--is_lazy

sample_size: number of samples used per step (default: 2500)

--sample_size [value]

best_k: print out best k samples (in terms of objective function) at each step (default: 1)

--best_k [value]
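How sample_size and best_k relate can be sketched in Python: draw sample_size sequences from the current distribution, score each with the objective, and keep the best_k. This is a simplified illustration, not the repository's implementation. It samples every position independently (the coupling between paired positions is ignored), and the function names and the toy objective are hypothetical.

```python
import heapq
import random


def sample_sequences(distributions, sample_size, rng):
    """Draw sample_size sequences; position i is drawn from distributions[i]."""
    samples = []
    for _ in range(sample_size):
        seq = "".join(
            rng.choices(list(d.keys()), weights=list(d.values()))[0]
            for d in distributions
        )
        samples.append(seq)
    return samples


def best_k_samples(samples, objective, best_k=1):
    """Return the best_k (score, sequence) pairs, lowest score first."""
    return heapq.nsmallest(best_k, ((objective(s), s) for s in samples))


def toy_objective(seq):
    """Hypothetical stand-in objective: number of non-A nucleotides."""
    return sum(ch != "A" for ch in seq)


rng = random.Random(0)  # fixed seed so the run is repeatable
distributions = [{"A": 0.7, "C": 0.1, "G": 0.1, "U": 0.1}] * 5
samples = sample_sequences(distributions, sample_size=100, rng=rng)
top = best_k_samples(samples, toy_objective, best_k=3)
print(top)
```

In the real tool the objective would be one of the --obj choices evaluated against the target structure rather than this toy score.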

verbose: print out the logits, the distribution, and the gradient at each step (default: False)

--verbose

num_threads: max number of threads used by OpenMP; if 0, use the default number (default: 0)

--num_threads [value]

boxplot: print out the objective of all samples at each step (for generating the boxplot in the learning curve) (default: False)

--boxplot

analysis.py

analysis.py performs two main tasks:

  1. Re-evaluates the best solution at each step using the ViennaRNA package (version 2.6.4), and saves the best solution for each metric to ./analysis/{folder}/.
  2. Generates a learning curve and saves it to ./graphs/{folder}/.

Example Command

python analysis.py --folder "example" --file "1.txt"

Input: parses the results from ./results/{folder}/{file}.

Output:

  • Best solutions are saved to ./analysis/{folder}/{file}.txt

  • The learning curve is saved as ./graphs/{folder}/{file}.pdf

Arguments

folder, file: parse the file at ./results/{folder}/{file}

--folder "example" --file "8.txt"

max_workers: set the max number of workers (threads) (default: None)

--max_workers [value]

merge.py

merge.py combines multiple result folders from ./analysis/ and summarizes them.

python merge.py <data file> <folder 1> [folder 2] ... [folder n]

Note that merge.py takes only names as arguments, not paths: the data file name without its directory or extension, followed by the folder names. E.g.,

python merge.py eterna100 eterna100_uniform eterna100_targeted

after running ./run_all.sh ./data/eterna100.txt
