To run the tutorial, change into the tutorial subfolder.
Introduction:
The user wishes to analyze deep sequencing data mapping to a ~6 kb region on C. elegans chromosome II for known and novel miRNA genes.
---------------------------------------------------------------------------------------------------------------------------------------------------
Preliminary files:
cel_cluster.fa: a fasta file with the reference genome (this file is in fact a ~6 kb region of the C. elegans chromosome II).
mature_ref_this_species.fa: a fasta file with the reference miRBase mature miRNAs for the species (C. elegans miRBase v.14 mature miRNAs).
mature_ref_other_species.fa: a fasta file with the reference miRBase mature miRNAs for related species (C. briggsae and D. melanogaster miRBase v.14 mature miRNAs).
precursors_ref_this_species.fa: a fasta file with the reference miRBase precursor miRNAs for the species (C. elegans miRBase v.14 precursor miRNAs).
reads.fa: a fasta file with the deep sequencing reads.
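Before starting, it can help to confirm that all five input files are present and to see how many sequences each one holds. The following is a minimal shell sketch and not part of miRDeep2. The function names count_fasta_records and check_tutorial_inputs are made up for illustration, and the sketch assumes it is run from the tutorial subfolder.

```shell
# Count FASTA records, i.e. header lines starting with '>'.
count_fasta_records() {
    grep -c '^>' "$1"
}

# Report the number of sequences in each tutorial input file,
# or flag the file as missing.
check_tutorial_inputs() {
    for f in cel_cluster.fa mature_ref_this_species.fa \
             mature_ref_other_species.fa precursors_ref_this_species.fa \
             reads.fa; do
        if [ -f "$f" ]; then
            echo "$f: $(count_fasta_records "$f") sequences"
        else
            echo "$f: missing"
        fi
    done
}

check_tutorial_inputs
```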
--------------------------------------------------------------------------------------------------------------------------------------------------
Analysis:
Step 1:
build an index of the genome (in this case the ~6 kb region):
bowtie-build cel_cluster.fa cel_cluster
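To check that the index was written, one can look for the files that bowtie-build creates next to the given prefix. The sketch below assumes bowtie 1 and a small index, which consists of six files with the .ebwt extension (large indexes use a different extension). The function name index_files_present is hypothetical.

```shell
# Return success only if all six bowtie 1 index files exist for the prefix.
# Assumption: small-index naming (<prefix>.N.ebwt and <prefix>.rev.N.ebwt).
index_files_present() {
    prefix=$1
    for suffix in 1 2 3 4 rev.1 rev.2; do
        [ -f "$prefix.$suffix.ebwt" ] || return 1
    done
    return 0
}

if index_files_present cel_cluster; then
    echo "cel_cluster index looks complete"
else
    echo "cel_cluster index files are missing"
fi
```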
Step 2:
process reads and map them to the genome.
The options used in the command below are:
-c: the input file is a fasta file (for other input formats, see the README file).
-j: remove entries with non-canonical letters (letters other than a,c,g,t,u,n,A,C,G,T,U,N).
-k: clip the adapter sequence given as its argument.
-l: discard reads shorter than the given length (here 18 nt).
-m: collapse the reads (identical sequences are merged into a single entry).
-p: map the processed reads against the previously indexed genome (cel_cluster).
-s: name of the output file of processed reads.
-t: name of the output file of the genome mappings.
-v: give verbose output to the screen.
mapper.pl reads.fa -c -j -k TCGTATGCCGTCTTCTGCTTGT -l 18 -m -p cel_cluster -s reads_collapsed.fa -t reads_collapsed_vs_genome.arf -v
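To make the effect of the filtering and collapsing options more concrete, the following shell sketch imitates what -j, -l 18 and -m achieve on a FASTA file with one sequence line per record. It is an illustration only and not the code that mapper.pl runs: adapter clipping (-k) and mapping (-p) are left out, the function name collapse_sketch is hypothetical, and the seq_<n>_x<count> identifiers are an assumption about how collapsed reads are named.

```shell
# Illustration of -j, -l 18 and -m; not mapper.pl's implementation.
# Input: FASTA file with exactly one sequence line per record.
collapse_sketch() {
    grep -v '^>' "$1" |                  # keep sequence lines only
        grep -E '^[acgtunACGTUN]+$' |    # like -j: drop reads with other letters
        awk 'length($0) >= 18' |         # like -l 18: drop reads shorter than 18 nt
        sort | uniq -c |                 # like -m: count identical sequences
        sort -k1,1nr -k2,2 |             # most abundant sequence first
        awk '{ printf ">seq_%d_x%d\n%s\n", NR - 1, $1, $2 }'
}
```

Each output record then carries the number of identical reads in its identifier, so the downstream steps handle every distinct sequence only once.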
Step 3:
fast quantitation of reads mapping to known miRBase precursors.
(This step is not required for identification of known and novel miRNAs in the deep sequencing data when using miRDeep2.pl.)
quantifier.pl -p precursors_ref_this_species.fa -m mature_ref_this_species.fa -r reads_collapsed.fa -t cel -y 16_19
The file miRNA_expressed.csv gives the read counts of the reference miRNAs in the data in tabular format. The results can also be browsed by opening
expression_16_19.html in a web browser (the 16_19 suffix comes from the -y argument).
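The table can also be inspected on the command line. The sketch below lists the most abundant miRNAs. It assumes a tab-separated file with a header line starting with '#', the miRNA name in column 1 and its read count in column 2. This layout is an assumption, so check the header of your own file first. The function name top_expressed is hypothetical.

```shell
# Print name and read count of the N most abundant miRNAs (default N = 10).
# Assumed layout: tab-separated, '#' header line, name in column 1,
# read count in column 2.
top_expressed() {
    file=$1
    n=${2:-10}
    grep -v '^#' "$file" |
        sort -t "$(printf '\t')" -k2,2nr |
        head -n "$n" |
        cut -f1,2
}
```

Under these assumptions, calling top_expressed miRNA_expressed.csv 5 would print the five reference miRNAs with the highest read counts.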
Step 4:
identification of known and novel miRNAs in the deep sequencing data:
miRDeep2.pl reads_collapsed.fa cel_cluster.fa reads_collapsed_vs_genome.arf mature_ref_this_species.fa mature_ref_other_species.fa precursors_ref_this_species.fa -t C.elegans 2> report.log
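Steps 1 to 4 can be chained into a single shell function so that the run stops at the first failing step. In the sketch below every command line is copied from the steps above. Only the tool check and the && chaining are additions, and the chaining assumes that each tool signals failure through a non-zero exit status. The function name run_tutorial is hypothetical.

```shell
# Run steps 1 to 4 in order and stop at the first failure.
run_tutorial() {
    # Refuse to start if one of the required programs is not on the PATH.
    for tool in bowtie-build mapper.pl quantifier.pl miRDeep2.pl; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing tool: $tool" >&2
            return 1
        fi
    done

    bowtie-build cel_cluster.fa cel_cluster &&
    mapper.pl reads.fa -c -j -k TCGTATGCCGTCTTCTGCTTGT -l 18 -m -p cel_cluster \
        -s reads_collapsed.fa -t reads_collapsed_vs_genome.arf -v &&
    quantifier.pl -p precursors_ref_this_species.fa -m mature_ref_this_species.fa \
        -r reads_collapsed.fa -t cel -y 16_19 &&
    miRDeep2.pl reads_collapsed.fa cel_cluster.fa reads_collapsed_vs_genome.arf \
        mature_ref_this_species.fa mature_ref_other_species.fa \
        precursors_ref_this_species.fa -t C.elegans 2> report.log
}
```

The function is only defined here. Call run_tutorial from the tutorial subfolder to execute the four steps.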
Step 5:
browse the results.
Open results.html in a web browser. Note that cel-miR-37 is predicted twice, because both potential precursors excised from this locus
can fold into hairpins. However, the annotated hairpin scores much higher than the non-annotated one (miRDeep2 score 6.1e+4 vs. -0.2).
--------------------------------------------------------------------------------------------------------------------------------------------