DivideFold

This repository contains the DivideFold model for predicting the secondary structure of long RNAs.
DivideFold recursively partitions the input sequence into smaller fragments, runs an existing structure prediction tool on each fragment, and reassembles the predicted fragment structures into a global structure prediction for the input sequence.
The partition is chosen so that the structure is preserved as much as possible.
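
The divide, predict and reassemble flow described above can be sketched as follows. This is a simplified illustration under stated assumptions, not DivideFold's implementation: the hypothetical `toy_partition` cuts the sequence into fixed-size contiguous windows, whereas DivideFold chooses its cut points with a trained model, and `toy_predict` is a placeholder for a real structure prediction tool.

```python
def toy_partition(sequence, max_fragment_length):
    """Cut the sequence into contiguous (start, end) windows (hypothetical rule)."""
    n = len(sequence)
    return [(s, min(s + max_fragment_length, n)) for s in range(0, n, max_fragment_length)]


def toy_predict(fragment):
    """Placeholder predictor: pairs the first half of a fragment with its second half."""
    half = len(fragment) // 2
    return "(" * half + "." * (len(fragment) % 2) + ")" * half


def predict_by_fragments(sequence, max_fragment_length=1000):
    """Predict each fragment on its own, then write it back at its coordinates."""
    structure = ["."] * len(sequence)
    for start, end in toy_partition(sequence, max_fragment_length):
        structure[start:end] = toy_predict(sequence[start:end])
    return "".join(structure)


structure = predict_by_fragments("GGGAAACCCGGGAAACCC", max_fragment_length=9)
print(structure)
```

Because each fragment is predicted independently, no base pair can span two fragments in this sketch, which is why the choice of cut points matters for preserving the structure.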

Requirements

  • python>=3.9
  • keras>=3.2.1
  • Either torch>=2.5.0 or tensorflow>=2.16.1 as the Keras backend
  • Other scientific packages: numpy, scipy, pandas

Installation

Clone the repository and install with your preferred backend (torch is recommended for compatibility with KnotFold):

git clone https://github.com/Ashargin/DivideFold
cd DivideFold
python3 -m venv myenv       # Optional but recommended
source myenv/bin/activate
pip install --upgrade pip
pip install -e .[torch]     # or .[tensorflow]

We also recommend installing KnotFold, since it is the default structure prediction function.

We provide wrappers for KnotFold, IPknot, pKiss, ProbKnot, RNAfold, LinearFold, MXfold2 and UFold.
To use one of these tools, it must be installed on your system. By default, the wrappers expect it in the same parent folder as DivideFold.
For example, KnotFold should be installed at ../KnotFold.

A tool can also be installed elsewhere on your system.
In that case, pass its path with the dirpath argument when using KnotFold, ProbKnot, LinearFold, MXfold2 or UFold.
The install locations of IPknot, pKiss and RNAfold do not matter and do not need to be specified.
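
One way to supply a custom install location is to bind dirpath ahead of time with `functools.partial`, which leaves a function that takes only the sequence. The sketch below uses a stand-in wrapper so that it runs without DivideFold installed. `stand_in_predict`, its body, and the path `/opt/tools/KnotFold` are hypothetical. That the real wrappers accept dirpath as a keyword argument is an assumption based on the text above.

```python
from functools import partial


def stand_in_predict(seq, dirpath="../KnotFold"):
    """Stand-in for a wrapper such as knotfold_predict (hypothetical body)."""
    # A real wrapper would run the tool found at `dirpath`; this one only
    # returns an unpaired structure so the snippet runs on its own.
    return "." * len(seq)


# Bind the custom install location once, leaving a one-argument function.
predict_fnc = partial(stand_in_predict, dirpath="/opt/tools/KnotFold")

print(predict_fnc("GGGAAACCC"))
```

With DivideFold installed, the same pattern would presumably apply to a real wrapper passed as predict_fnc. Check the wrapper signatures in dividefold.predict before relying on it.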

Usage

Secondary structure prediction

You can predict a sequence's secondary structure using the prediction function:

from dividefold.predict import dividefold_predict
import numpy as np
sequence = "".join(np.random.choice(["A", "U", "C", "G"], size=2000))  # example sequence
prediction = dividefold_predict(sequence)

By default, the structure prediction tool applied to the fragments after partitioning is KnotFold.

Specifying the structure prediction tool to be applied on the fragments

We also provide wrappers for IPknot, pKiss, ProbKnot, RNAfold, LinearFold, MXfold2 and UFold.
If the corresponding tool is installed on your system, you can use it as the structure prediction function for DivideFold:

from dividefold.predict import (
    dividefold_predict, knotfold_predict, ipknot_predict, pkiss_predict,
    probknot_predict, rnafold_predict, linearfold_predict, mxfold2_predict, ufold_predict,
)
import numpy as np
sequence = "".join(np.random.choice(["A", "U", "C", "G"], size=2000))  # example sequence
# Use RNAfold as the structure prediction function
prediction = dividefold_predict(sequence, predict_fnc=rnafold_predict)

Using a custom structure prediction function

You can also apply any custom structure prediction function to the fragments after partitioning:

from dividefold.predict import dividefold_predict
import numpy as np

def my_structure_prediction_function(seq):  # example structure prediction function
    n = len(seq)
    return "(" * (n // 2) + "." * (n % 2) + ")" * (n // 2)

sequence = "".join(np.random.choice(["A", "U", "C", "G"], size=2000))  # example sequence
prediction = dividefold_predict(sequence, predict_fnc=my_structure_prediction_function)
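
Before plugging in a custom function, it can help to check that it returns a well-formed structure. The helper below is a minimal sketch. It assumes, based on the example above, that a prediction function takes a sequence and returns a dot-bracket string of the same length. `is_valid_dot_bracket` is a hypothetical name and is not part of DivideFold. Each bracket type is counted independently, so pseudoknot notation such as `([)]` is accepted.

```python
PAIRS = {"(": ")", "[": "]", "{": "}", "<": ">"}


def is_valid_dot_bracket(seq, structure):
    """Return True if `structure` matches the length of `seq` and is balanced."""
    if len(structure) != len(seq):
        return False
    closers = {close: open_ for open_, close in PAIRS.items()}
    depth = {open_: 0 for open_ in PAIRS}
    for char in structure:
        if char in PAIRS:
            depth[char] += 1
        elif char in closers:
            depth[closers[char]] -= 1
            if depth[closers[char]] < 0:
                return False  # closing bracket with no matching opener
        elif char != ".":
            return False  # unexpected symbol
    return all(count == 0 for count in depth.values())


def my_structure_prediction_function(seq):  # same example function as above
    n = len(seq)
    return "(" * (n // 2) + "." * (n % 2) + ")" * (n // 2)


seq = "GGGAAACCC"
print(is_valid_dot_bracket(seq, my_structure_prediction_function(seq)))
```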

Specifying the maximum fragment length (partition depth)

An important parameter is the maximum fragment length. A lower value partitions the sequence more deeply, into smaller fragments.
By default, fragments can be up to 1000 nucleotides long. If the structure prediction tool struggles to process fragments of this size accurately, smaller fragments may work better.
Set this with the max_fragment_length argument:

from dividefold.predict import dividefold_predict
import numpy as np
sequence = "".join(np.random.choice(["A", "U", "C", "G"], size=2000))  # example sequence
prediction = dividefold_predict(sequence, max_fragment_length=200)  # limit fragments to 200 nucleotides
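
To see why a lower limit means a deeper partition, consider the toy recursion below. It is only a sketch. The hypothetical `toy_recursive_partition` halves a range at its midpoint until every piece fits, whereas DivideFold predicts its cut points with a trained model. Treating the limit as inclusive is also an assumption.

```python
def toy_recursive_partition(start, end, max_fragment_length, depth=0):
    """Halve [start, end) until every piece fits (hypothetical midpoint rule)."""
    if end - start <= max_fragment_length:
        return [(start, end, depth)]
    mid = (start + end) // 2
    return (toy_recursive_partition(start, mid, max_fragment_length, depth + 1)
            + toy_recursive_partition(mid, end, max_fragment_length, depth + 1))


for max_len in (1000, 200):
    pieces = toy_recursive_partition(0, 2000, max_len)
    deepest = max(depth for _, _, depth in pieces)
    print(f"max length {max_len}: {len(pieces)} fragments, depth {deepest}")
```

Under this toy rule, a 2000-nucleotide sequence yields 2 fragments at depth 1 with a limit of 1000, and 16 fragments at depth 4 with a limit of 200.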

Obtaining fragment coordinates

To obtain the fragments resulting from DivideFold's partition, use return_fragments=True:

from dividefold.predict import dividefold_predict
import numpy as np
sequence = "".join(np.random.choice(["A", "U", "C", "G"], size=2000))  # example sequence
prediction, fragments = dividefold_predict(sequence, return_fragments=True)

If you're only interested in the fragments, and not in predicting the secondary structure, you can use return_structure=False:

from dividefold.predict import dividefold_predict
import numpy as np
sequence = "".join(np.random.choice(["A", "U", "C", "G"], size=2000))  # example sequence
fragments = dividefold_predict(sequence, return_fragments=True, return_structure=False)

In this case, only the partition is computed, so no structure prediction tool needs to be installed.

Web server

We provide a web server at https://evryrna.ibisc.univ-evry.fr/evryrna/dividefold/webserver.

Data availability

All data used in our training and experiments can be found in data/data_structures/.

References

  • Omnes, L., Angel, E., Bartet, P., and Tahi, F.: A Divide-and-Conquer Approach Based on Deep Learning for Long RNA Secondary Structure Prediction: Focus on Pseudoknots Identification. PLOS ONE 20.4 (2025). https://doi.org/10.1371/journal.pone.0314837
