08 STAR alignment
The RNA-seq reads were aligned with STAR. STAR aligns RNA-seq reads to a reference genome. The difficulty is that the reference genome contains introns that are no longer present in the mature RNA, so a read that spans an exon-exon junction has to be split to align properly. STAR detects these cases and splits the read accordingly. STAR can also, to some degree, align a read when part of an Illumina adapter is still present in it or when it has a mismatch against the reference, because it can soft-clip read ends that do not match.
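STAR's seed search is based on maximal mappable prefixes (MMPs): it finds the longest part of the read that matches the genome exactly, then restarts the search from the first unmatched base. The toy sketch below shows how that idea splits a junction-spanning read into two pieces separated by an intron. It is only an illustration under simplifying assumptions: it uses naive string search instead of STAR's suffix arrays, allows no mismatches or soft-clipping, and the sequences and the `min_anchor` threshold are made up.

```python
def maximal_mappable_prefix(query: str, ref: str, start: int = 0) -> tuple[int, int]:
    """Return (length, ref_pos) of the longest prefix of `query` that occurs
    exactly in `ref` at or after `start`; (0, -1) if nothing matches."""
    for length in range(len(query), 0, -1):  # try the longest candidate first
        pos = ref.find(query[:length], start)
        if pos != -1:
            return length, pos
    return 0, -1


def split_align(read: str, ref: str, min_anchor: int = 5):
    """Cover `read` with consecutive exact matches, each searched downstream
    of the previous one. Returns [(read_offset, ref_pos, length), ...] or
    None if a piece is shorter than `min_anchor` (hypothetical threshold)."""
    segments = []
    offset, search_from = 0, 0
    while offset < len(read):
        length, pos = maximal_mappable_prefix(read[offset:], ref, search_from)
        if length < min_anchor:
            return None  # too short to place; a real aligner might soft-clip instead
        segments.append((offset, pos, length))
        offset += length
        search_from = pos + length
    return segments


def introns(segments):
    """Reference gaps between consecutive segments, as half-open (start, end)."""
    return [(p1 + n1, p2) for (_, p1, n1), (_, p2, _) in zip(segments, segments[1:])]


# Made-up toy gene: exon1 + intron (GT...AG) + exon2.
exon1 = "ATGGCCATTGTAATG"
intron = "GTAAG" + "T" * 11 + "CAG"
exon2 = "CCGCTGAAAGGGTGCCCGATAG"
ref = exon1 + intron + exon2

# A read spanning the exon1/exon2 junction; it does not occur in `ref` in one piece.
read = exon1[-10:] + exon2[:10]
segs = split_align(read, ref)
print(segs)           # [(0, 5, 10), (10, 34, 10)]
print(introns(segs))  # [(15, 34)]
```

If the first intron base happened to equal the first base of exon2, the greedy prefix would overshoot the junction by one base; real aligners resolve this kind of ambiguity, for example by preferring canonical GT/AG splice motifs.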
The percentage of mapped reads varied slightly between runs and samples:
SRR6040092: 93.37%
SRR6040092: 90.12%
SRR6040094: 93.68%
SRR6040095: 94.21%
SRR6040096: 92.71%
SRR6040097: 91.68%
SRR6156066: 91.21%
SRR6156067: 90.29%
SRR6156069: 90.39%
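STAR writes a per-run summary to a file named `Log.final.out`, which is where mapping rates like these can be read from. The text does not say which field the percentages come from; the sketch below assumes it is the `Uniquely mapped reads %` line and that each data line has the form `name |<tab>value`. The sample log text is made up, while the last lines summarise the nine values listed above.

```python
import statistics


def parse_star_log(text: str) -> dict[str, str]:
    """Parse STAR Log.final.out text into {field name: value}.

    Assumes data lines look like '<name> |<tab><value>'; section headers
    such as 'UNIQUE READS:' contain no '|' and are skipped.
    """
    fields = {}
    for line in text.splitlines():
        if "|" not in line:
            continue
        name, value = line.split("|", 1)
        fields[name.strip()] = value.strip()
    return fields


def unique_mapping_rate(text: str) -> float:
    """Return the 'Uniquely mapped reads %' value as a float (assumed field)."""
    return float(parse_star_log(text)["Uniquely mapped reads %"].rstrip("%"))


# Made-up excerpt in the assumed layout; in practice, read each sample's
# Log.final.out from disk, e.g. with open(path).read().
sample_log = (
    "                          Number of input reads |\t1000\n"
    "                                    UNIQUE READS:\n"
    "                   Uniquely mapped reads number |\t934\n"
    "                        Uniquely mapped reads % |\t93.40%\n"
)
print(unique_mapping_rate(sample_log))  # 93.4

# Summary of the nine percentages reported above.
reported = [93.37, 90.12, 93.68, 94.21, 92.71, 91.68, 91.21, 90.29, 90.39]
print(f"mean {statistics.mean(reported):.2f}%, range {min(reported)}-{max(reported)}%")
# mean 91.96%, range 90.12-94.21%
```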
Not all reads map, possibly because the reference genome is incomplete, so some reads come from regions that are missing from it. Some reads may also be too short to be placed reliably.
What potential issues can cause mRNA reads not to map properly to genes in the chromosome? Do you expect this to differ between prokaryotic and eukaryotic projects?
A gene can have several splice variants, so reads that come from the same gene can span different combinations of exons, which can make some of them map ambiguously or incorrectly. This differs between prokaryotic and eukaryotic projects because most prokaryotic genes have no introns, whereas most eukaryotic genes do.
No percentage can be given for this yet, because at this stage we do not know which regions are genes and which are not. Reads can also map to sequences that encode ribosomal RNA, tRNA and other non-coding RNA molecules rather than protein-coding genes.
How many reads do not map to genes? What does that mean? How does that relate to the type of sequencing data you are mapping?
This cannot be answered yet, because we do not know which regions are genes and which are not.
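Once an annotation is available (for example a GFF file), the question could be answered by checking which annotated feature each aligned read overlaps. The sketch below does this for reads reduced to single (start, end) intervals; the coordinates and feature labels are invented, and it ignores strand, spliced reads and reads that overlap several features (the first match wins). Dedicated counters such as featureCounts or HTSeq-count handle those cases.

```python
from collections import Counter
from typing import NamedTuple


class Feature(NamedTuple):
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive
    kind: str   # hypothetical labels, e.g. "gene", "rRNA", "tRNA"


def classify(read_start: int, read_end: int, features: list[Feature]) -> str:
    """Label a read by the first feature it overlaps, or 'unassigned'."""
    for f in features:
        if read_start < f.end and f.start < read_end:  # half-open overlap test
            return f.kind
    return "unassigned"


def tally(reads: list[tuple[int, int]], features: list[Feature]) -> Counter:
    """Count reads per feature label."""
    return Counter(classify(s, e, features) for s, e in reads)


# Made-up annotation and read alignments on one chromosome.
features = [Feature(100, 500, "gene"), Feature(600, 2100, "rRNA"), Feature(2200, 2280, "tRNA")]
reads = [(120, 220), (480, 580), (700, 800), (1000, 1100), (2210, 2260), (3000, 3100)]
counts = tally(reads, features)
print(counts)
not_in_genes = sum(n for kind, n in counts.items() if kind != "gene")
print(f"{not_in_genes} of {len(reads)} reads do not map to genes")  # 4 of 6 ...
```

Keeping rRNA and tRNA as separate labels shows how much of the "not in genes" fraction is explained by non-coding RNA rather than by intergenic reads.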