SamplingDesign: RNA Design via Continuous Optimization with Coupled Variables and Monte-Carlo Sampling
This repository contains the source code for the SamplingDesign project. The Python code for generating the figures in the paper is available in ./figures/
Wei Yu Tang, Ning Dai, Tianshuo Zhou, David H. Mathews, and Liang Huang*
* corresponding author
For questions, please contact the corresponding author at liang.huang.sh@gmail.com.
Compiler version: g++ (Spack GCC) 8.3.0
make
python (3.8.20), numpy (1.24.4), matplotlib (3.7.5), viennarna (2.6.4)
pip install -r requirements.txt
Run SamplingDesign for the shortest five structures (up to 30 nucleotides) in Eterna100 with 200 steps (takes ~30 seconds).
./run.sh example ./data/example.txt
The results will be saved in ./results/example/. The script then parses the result file to generate learning curves in ./graphs/example/ and outputs the best solution (based on each metric) into ./analysis/example/.
Reproduce the results in the paper (run both uniform and targeted initializations):
./run_all.sh ./data/eterna100_v1.txt
python merge.py eterna100_v1 eterna100_uniform eterna100_targeted # summarize results
Replace ./data/eterna100.txt with:
- ./data/eterna100_v2.txt to run all 19 modified structures in Eterna100-v2.
- ./data/rfam27.txt to run all the Rfam-Taneda-27 structures.
- ./data/rnasolo.txt to run all the RNAsolo-764 structures.
echo "[target structure]" | ./main [args] > ./result.txt
echo "(((...)))" | ./main --steps 50 --verbose > ./result.txt
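For scripted runs, the same invocation can be driven from Python. The sketch below is a minimal illustration, not part of this repository: the helper names `is_balanced`, `build_main_command`, and `run_design` are hypothetical. It checks that a dot-bracket target is balanced, then pipes it to `./main` using only the `--obj`, `--init`, and `--verbose` flags documented in this README, and it assumes `./main` has already been built with `make`.

```python
import subprocess


def is_balanced(structure: str) -> bool:
    """Return True if a dot-bracket string has matching parentheses."""
    depth = 0
    for ch in structure:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # closing bracket with no open partner
                return False
        elif ch != ".":
            return False  # this sketch only handles '(', ')' and '.'
    return depth == 0


def build_main_command(obj="prob", init="targeted", verbose=False, binary="./main"):
    """Assemble the argument list for ./main (hypothetical helper)."""
    cmd = [binary, "--obj", obj, "--init", init]
    if verbose:
        cmd.append("--verbose")
    return cmd


def run_design(structure, result_path="./result.txt", **options):
    """Pipe the target structure to ./main and save stdout to result_path."""
    if not is_balanced(structure):
        raise ValueError(f"unbalanced target structure: {structure}")
    with open(result_path, "w") as out:
        subprocess.run(
            build_main_command(**options),
            input=structure + "\n",  # same role as `echo "..." |`
            text=True,
            stdout=out,
            check=True,
        )
```

Calling `run_design("(((...)))", verbose=True)` would then mirror the shell pipeline above, with the validity check catching malformed targets before the binary is launched.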
Objective functions: "prob" - Boltzmann probability, "ned" - normalized ensemble defect, "dist" - structural distance, "ddg" - free energy gap. (default: "prob")
--obj [prob/ned/dist/ddg]
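To make the "dist" objective more concrete, here is a small sketch of one common position-wise definition of structural distance: the number of positions whose pairing partner differs between two dot-bracket structures. This definition is an assumption for illustration; the metric implemented in `./main` may differ in detail, and the function names are hypothetical.

```python
def pair_table(structure):
    """Map each position to its pairing partner, or -1 if unpaired."""
    table = [-1] * len(structure)
    stack = []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            j = stack.pop()  # most recent unmatched '('
            table[i], table[j] = j, i
    return table


def structural_distance(x, y):
    """Count positions whose pairing partner differs between x and y."""
    if len(x) != len(y):
        raise ValueError("structures must have the same length")
    return sum(a != b for a, b in zip(pair_table(x), pair_table(y)))


target = "(((...)))"
print(structural_distance(target, ".((...))."))  # outer pair missing
```

Under this definition, a structure identical to the target has distance 0, and each missing or mismatched base pair contributes its two endpoints.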
Initializations: uniform or targeted (default: targeted)
--init [uniform/targeted]
eps: For
--eps [a value between 0 and 1]
projection: use the direct parameterization (projected gradient descent) instead of the softmax parameterization. (default: false)
--projection
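The two parameterizations can be contrasted with a short sketch. With softmax, unconstrained logits are mapped to a probability vector; with the direct parameterization, a gradient step may leave the probability simplex and has to be projected back. The code below uses the standard sort-based Euclidean projection onto the simplex as an illustrative assumption; it is not taken from this repository, and the function names are hypothetical.

```python
import math


def softmax(logits):
    """Map unconstrained logits to a probability vector."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def project_to_simplex(values):
    """Euclidean projection onto {p : p_i >= 0, sum(p) = 1} (sort-based)."""
    cumulative = 0.0
    tau = 0.0
    for j, u in enumerate(sorted(values, reverse=True), start=1):
        cumulative += u
        candidate = (cumulative - 1.0) / j
        if u - candidate > 0:  # holds for a prefix; keep the last hit
            tau = candidate
    return [max(v - tau, 0.0) for v in values]


def projected_gradient_step(probs, grad, lr=0.01):
    """One descent step in probability space, then project back."""
    moved = [p - lr * g for p, g in zip(probs, grad)]
    return project_to_simplex(moved)
```

The softmax route never leaves the simplex, so plain gradient updates on the logits suffice; the projected route updates the probabilities directly and pays for a projection at each step.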
no_adam: turn off the Adam optimizer (default: false)
--no_adam
beta_1, beta_2: the first and second moment decay rates for the Adam optimizer (default: 0.9, 0.999)
--beta_1 [value] --beta_2 [value]
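For readers unfamiliar with how beta_1 and beta_2 enter the update, the sketch below shows a textbook Adam step in plain Python. It is written for minimization and is not copied from this repository; the function name `adam_step` and the stability constant `tiny` (unrelated to the `--eps` flag) are illustrative assumptions.

```python
import math


def adam_step(theta, grad, m, v, t, lr=0.01, beta_1=0.9, beta_2=0.999, tiny=1e-8):
    """Return updated (theta, m, v) after one Adam step; t starts at 1."""
    # Exponential moving averages of the gradient and its square.
    m = [beta_1 * mi + (1 - beta_1) * g for mi, g in zip(m, grad)]
    v = [beta_2 * vi + (1 - beta_2) * g * g for vi, g in zip(v, grad)]
    # Bias correction compensates for the zero initialization of m and v.
    m_hat = [mi / (1 - beta_1 ** t) for mi in m]
    v_hat = [vi / (1 - beta_2 ** t) for vi in v]
    theta = [
        th - lr * mh / (math.sqrt(vh) + tiny)
        for th, mh, vh in zip(theta, m_hat, v_hat)
    ]
    return theta, m, v


theta, m, v = [1.0], [0.0], [0.0]
theta, m, v = adam_step(theta, [2.0], m, v, t=1)
```

On the first step the bias-corrected ratio is close to the sign of the gradient, so each parameter moves by roughly the learning rate regardless of the gradient's scale.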
nesterov: use Nesterov's accelerated gradient descent for projected gradient descent (default: false)
--nesterov
initial_lr: set initial learning rate (default: 0.01)
--initial_lr [value]
num_steps: set max number of steps (default: 2000)
--num_steps [value]
no_early_stop: turn off early stopping (default: false)
--no_early_stop
k_ma: k moving average parameter used for early stopping (default: 50)
--k_ma [value]
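One plausible way a k-step moving average can drive early stopping is sketched below. The exact stopping rule used by `./main` is not described in this README, so the criterion here (stop once the latest window's average is no better than the previous window's, for an objective being minimized) is purely an assumption, and the function names are hypothetical.

```python
def moving_average(values, k):
    """Average of the last k entries of values."""
    window = values[-k:]
    return sum(window) / len(window)


def should_stop(history, k_ma=50):
    """Hypothetical rule: stop when the moving average stops improving."""
    if len(history) < 2 * k_ma:
        return False  # not enough steps to compare two full windows
    recent = moving_average(history, k_ma)
    previous = moving_average(history[:-k_ma], k_ma)
    return recent >= previous  # no improvement for a minimized objective


# A steadily decreasing objective keeps going; a flat one stops.
decreasing = [1.0 / (step + 1) for step in range(100)]
flat = [0.5] * 100
print(should_stop(decreasing), should_stop(flat))
```

Averaging over a window makes the rule less sensitive to the step-to-step noise that comes with estimating the objective from samples.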
beamsize: set beamsize and sharpturn (default: 250)
--beamsize [value]
is_lazy: use lazyOutside when evaluating NED (default: False)
--is_lazy
sample_size: number of samples used per step (default: 2500)
--sample_size [value]
best_k: print out best k samples (in terms of objective function) at each step (default: 1)
--best_k [value]
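The roles of sample_size and best_k can be illustrated with a simplified sketch: draw sample_size sequences from a per-position distribution over nucleotides, score them, and keep the best_k. For brevity the sketch treats positions as independent and ignores the coupling between paired positions, and the objective is a stand-in; none of these names come from the repository.

```python
import heapq
import random

NUCLEOTIDES = "ACGU"


def sample_sequences(distribution, sample_size, rng):
    """Draw sequences from independent per-position weights over A, C, G, U."""
    return [
        "".join(rng.choices(NUCLEOTIDES, weights=w, k=1)[0] for w in distribution)
        for _ in range(sample_size)
    ]


def best_k_samples(samples, objective, best_k=1):
    """Return the best_k samples with the lowest objective value."""
    return heapq.nsmallest(best_k, samples, key=objective)


rng = random.Random(0)  # fixed seed for a reproducible illustration
uniform = [[0.25, 0.25, 0.25, 0.25]] * 9
samples = sample_sequences(uniform, sample_size=2500, rng=rng)


def toy_objective(seq):
    """Stand-in objective: fewer G/C nucleotides is worse (lower is better)."""
    return -sum(seq.count(n) for n in "GC")


top = best_k_samples(samples, toy_objective, best_k=3)
```

In the actual tool the objective would be one of the `--obj` choices evaluated against the target structure; the toy objective here only demonstrates the selection step.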
verbose: print out the logit, the distribution and the gradient at each step (default: False)
--verbose
num_threads: max number of threads used by OpenMP; if 0, use the default number (default: 0)
--num_threads [value]
boxplot: print out the objective of all samples at each step (for generating the boxplot in the learning curve) (default: False)
--boxplot
analysis.py performs two main tasks:
- Re-evaluates the best solution at each step using the ViennaRNA package (version 2.6.4), and saves the best solution for each metric to ./analysis/{folder}/.
- Generates a learning curve and saves it to ./graphs/{folder}/.
python analysis.py --folder "example" --file "1.txt"
Input: Parses the results from ./results/{folder}/{file}.
Output:
- Best solutions are saved to ./analysis/{folder}/{file}.txt
- The learning curve is saved as ./graphs/{folder}/{file}.pdf
folder, file: parse the file at ./results/{folder}/{file}
--folder "example" --file "8.txt"
max_workers: set the max number of workers (threads) (default: None)
--max_workers [value]
merge.py combines multiple result folders from ./analysis/ and summarizes them.
python merge.py <data file> <folder 1> [folder 2] ... [folder n]
Note that merge.py takes only the file name (without the path or extension) as its data file argument. E.g.,
python merge.py eterna100 eterna100_uniform eterna100_targeted
after running ./run_all.sh ./data/eterna100.txt
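When scripting the full pipeline, the data file name expected by merge.py can be derived from the path passed to run_all.sh. The helper below is a hypothetical sketch based on the examples in this README; it assumes the first argument is simply the data file's name with the directory and extension stripped.

```python
from pathlib import Path


def merge_command(data_path, folders):
    """Build the merge.py argument list from a data file path and folder names."""
    if not folders:
        raise ValueError("at least one results folder is required")
    data_name = Path(data_path).stem  # "./data/eterna100.txt" -> "eterna100"
    return ["python", "merge.py", data_name, *folders]


cmd = merge_command(
    "./data/eterna100.txt",
    ["eterna100_uniform", "eterna100_targeted"],
)
print(" ".join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the repository root once the corresponding analysis folders exist.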