MirDNN

MirDNN is a novel deep learning method specifically designed for pre-miRNA prediction in genome-wide data. The model is a convolutional deep residual neural network that automatically learns suitable features from the raw data, without manual feature engineering. It successfully learns the intrinsic structural characteristics of miRNA precursors, as well as their context within a sequence. The proposal has been tested on several animal and plant genomes and compared with state-of-the-art algorithms.

MirDNN is described in detail in "High precision in microRNA prediction: a novel genome-wide approach based on convolutional deep residual networks," by C. Yones, J. Raad, L.A. Bugnon, D.H. Milone and G. Stegmayer (under review in a refereed journal).

Contact: Cristian Yones, sinc(i)

Web server

MirDNN can be used directly from this web server. The server provides two pre-trained mirDNN models (animals and plants) and can process both individual sequences and fasta files. When predicting on an individual sequence, the server also generates a nucleotide importance graph.

Package installation

The latest version of the package can be downloaded from the GitHub repository. The exact version used in the paper is available on SourceForge.

To download from GitHub:

git clone --recurse-submodules https://github.com/cyones/mirDNN.git

After downloading the package (from GitHub or SourceForge), install the dependencies:

cd mirDNN
pip install -r requirements.txt

This installs all the packages needed to run mirDNN. To train models or make predictions, the secondary structure of the sequences must first be inferred; for this task, the ViennaRNA package should be used.

Usage

To make predictions or train new models, the first step is to predict the secondary structure of the sequences to process. This can be done with RNAfold. For example, given a fasta file named sequences.fa, run:

RNAfold --noPS --infile=sequences.fa --outfile=sequences.fold

(an example of a folded sequence is provided)
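The resulting .fold file follows the standard RNAfold output layout: a FASTA header, the sequence, then the dot-bracket structure with the minimum free energy in parentheses. A minimal sketch of a parser for this three-line format (the function name and record layout are illustrative, not part of mirDNN):

```python
# Minimal sketch: parse standard RNAfold output into
# (id, sequence, structure, mfe) tuples. Assumes the usual
# three-line record format produced by `RNAfold --noPS`:
#   >seq_id
#   GUGGGAUG...
#   ((((....)))) ( -4.20)
import re

def parse_fold(text):
    records = []
    lines = [ln.strip() for ln in text.strip().splitlines()]
    for i in range(0, len(lines), 3):
        seq_id = lines[i].lstrip(">")
        sequence = lines[i + 1]
        # Dot-bracket structure, then the MFE in parentheses.
        match = re.match(r"([.()]+)\s+\(\s*(-?\d+\.\d+)\)", lines[i + 2])
        structure, mfe = match.group(1), float(match.group(2))
        records.append((seq_id, sequence, structure, mfe))
    return records

example = """>seq1
GUGGGAUGAGGUAGUAGGUU
((((....))))........ ( -4.20)
"""
print(parse_fold(example))
```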

Inference

Now that we have the .fold file, to make predictions with the provided pre-trained model for animals, simply run:

python3 mirdnn_eval.py -i sequences/test.fold -o predictions.csv -m models/animal.pmt -s 160 -d "cpu"
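The output can then be filtered to keep only high-scoring candidates. A hedged sketch, assuming (not verified against the mirDNN source) that predictions.csv contains one "id,score" row per sequence with scores in [0, 1]:

```python
# Hedged sketch: keep only predictions above a score threshold.
# The "id,score" CSV layout is an assumption for illustration.
import csv
import io

def high_confidence(csv_text, threshold=0.9):
    reader = csv.reader(io.StringIO(csv_text))
    return [(row[0], float(row[1]))
            for row in reader if float(row[1]) >= threshold]

example = "seq1,0.97\nseq2,0.12\nseq3,0.91\n"
print(high_confidence(example))  # → [('seq1', 0.97), ('seq3', 0.91)]
```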

To calculate nucleotide importance values the command is:

python3 mirdnn_explain.py -i sequences/test.fold -o importance.csv -m models/animal.pmt -s 160 -d "cpu"
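These per-nucleotide values can be post-processed, for example to locate the position that contributes most to a prediction. A hedged sketch, assuming (not verified) that importance.csv stores one row per sequence: the sequence id followed by one importance value per nucleotide:

```python
# Hedged sketch: find the highest-importance nucleotide position
# for each sequence. The CSV layout is an assumption for illustration.
import csv
import io

def peak_importance(csv_text):
    peaks = {}
    for row in csv.reader(io.StringIO(csv_text)):
        values = [float(v) for v in row[1:]]
        position = max(range(len(values)), key=values.__getitem__)
        peaks[row[0]] = (position, values[position])
    return peaks

example = "seq1,0.1,0.8,0.3\n"
print(peak_importance(example))  # → {'seq1': (1, 0.8)}
```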

Training new models

To train a new model, two .fold files are needed: one with negative examples (non pre-miRNA sequences) and the other with positive examples (well-known pre-miRNAs).

Given these datasets, the training can be done with:

python3 mirdnn_fit.py -i negative_sequences.fold -i positive_sequences.fold -m out_model.pmt -l train.log -d "cuda:0" -s 160

NOTE: training a model is a very compute-intensive task; therefore, using a GPU is recommended.

For more details about the training parameters, execute

python3 mirdnn_fit.py -h

Reproduce experiments

To reproduce the experiments, R and BLAST must be installed. On Ubuntu (and derived distributions), this can be done with the following commands:

sudo apt-get update
sudo apt-get install r-base ncbi-blast+

Training with the paper datasets is very computationally expensive, so a GPU is needed to finish in a reasonable time. All the experiments presented in the paper can be easily reproduced using the Makefile inside the experiments folder. For example, to generate the PRROC curve for Caenorhabditis elegans, run:

cd experiments
make results/PRROC-cel.pdf

You will be asked to download the sequence files, and then all the commands needed to train and test the model will be executed automatically.
