GFF-GTF-analysis

General Feature Format (GFF/GTF) files contain one line per feature, and each line has 9 tab-separated columns. For a detailed explanation of the format and its columns, see https://www.ensembl.org/info/website/upload/gff.html.
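
To make the 9-column layout concrete, here is a minimal sketch that splits one GTF line into named fields. It is independent of the `GeneralFormat` module: the helper `parse_gtf_line` and the example line are hypothetical, and the column names follow the Ensembl description linked above.

```python
# Column names as listed in the Ensembl GFF/GTF description.
GTF_COLUMNS = ["seqname", "source", "feature", "start", "end",
               "score", "strand", "frame", "attribute"]


def parse_gtf_line(line):
    """Split one tab-separated GTF line into a dict keyed by column name."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 columns, got {len(fields)}")
    record = dict(zip(GTF_COLUMNS, fields))
    # Coordinates are 1-based and inclusive.
    record["start"] = int(record["start"])
    record["end"] = int(record["end"])
    return record


# Hypothetical example line, for illustration only.
example = 'chr1\tHAVANA\texon\t11869\t12227\t.\t+\t.\tgene_id "G1"; transcript_id "T1";'
record = parse_gtf_line(example)
print(record["feature"], record["start"], record["end"])
```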

Importing the script and initializing the file

import GeneralFormat as gf 
file_path = "hg38_5k.gtf"
gtf = gf.GeneralFormat(file_path)

Getting the number of non-redundant transcripts in the file

gtf.nb_nr_tx() 

Getting the number of exons per transcript

gtf.ex_per_tx() 
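
As a rough illustration of what counting exons per transcript involves, the sketch below tallies `exon` features by their `transcript_id` attribute. This is not necessarily how `ex_per_tx()` is implemented; the function `count_exons_per_transcript`, the regular expression and the sample lines are assumptions made for the example.

```python
import re
from collections import Counter

# Assumes GTF-style attributes such as: transcript_id "T1";
TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def count_exons_per_transcript(lines):
    """Return {transcript_id: number of exon lines} for an iterable of GTF lines."""
    counts = Counter()
    for line in lines:
        if line.startswith("#"):  # skip header/comment lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9 or fields[2] != "exon":
            continue
        match = TRANSCRIPT_ID.search(fields[8])
        if match:
            counts[match.group(1)] += 1
    return dict(counts)


# Hypothetical sample data, for illustration only.
sample = [
    'chr1\tsrc\ttranscript\t100\t349\t.\t+\t.\tgene_id "G1"; transcript_id "T1";',
    'chr1\tsrc\texon\t100\t199\t.\t+\t.\tgene_id "G1"; transcript_id "T1";',
    'chr1\tsrc\texon\t300\t349\t.\t+\t.\tgene_id "G1"; transcript_id "T1";',
    'chr1\tsrc\texon\t1000\t1009\t.\t-\t.\tgene_id "G2"; transcript_id "T2";',
]
exon_counts = count_exons_per_transcript(sample)
print(exon_counts)  # {'T1': 2, 'T2': 1}
```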

Getting the cDNA (complementary DNA) length, in bp, of each transcript (sum of its exon lengths)

gtf.cdna_per_tx() 
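
The idea behind a cDNA length can be sketched as the sum of exon lengths per transcript. Because GTF coordinates are 1-based and inclusive, each exon contributes `end - start + 1` bp. The function `cdna_length_per_transcript` and the sample lines below are hypothetical and only illustrate the calculation, not the module's own code.

```python
import re
from collections import defaultdict

# Assumes GTF-style attributes such as: transcript_id "T1";
TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def cdna_length_per_transcript(lines):
    """Return {transcript_id: summed exon length in bp}."""
    lengths = defaultdict(int)
    for line in lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9 or fields[2] != "exon":
            continue
        match = TRANSCRIPT_ID.search(fields[8])
        if match:
            start, end = int(fields[3]), int(fields[4])
            lengths[match.group(1)] += end - start + 1  # inclusive coordinates
    return dict(lengths)


# Hypothetical sample data, for illustration only.
sample = [
    'chr1\tsrc\texon\t100\t199\t.\t+\t.\tgene_id "G1"; transcript_id "T1";',
    'chr1\tsrc\texon\t300\t349\t.\t+\t.\tgene_id "G1"; transcript_id "T1";',
    'chr1\tsrc\texon\t1000\t1009\t.\t-\t.\tgene_id "G2"; transcript_id "T2";',
]
cdna_lengths = cdna_length_per_transcript(sample)
print(cdna_lengths)  # {'T1': 150, 'T2': 10}
```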

Getting the genome coverage (exons+introns) for every transcript in the file

gtf.tx_coverage() 
  • In each of the cases above, the output is a dictionary that maps each transcript to the requested value.
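
To show what a transcript-to-value dictionary can look like for coverage, here is a small sketch. It assumes that "coverage (exons+introns)" means the genomic span from the first exon start to the last exon end of a transcript; that interpretation, the function `genomic_span_per_transcript` and the sample lines are assumptions, not taken from the module.

```python
import re

# Assumes GTF-style attributes such as: transcript_id "T1";
TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def genomic_span_per_transcript(lines):
    """Return {transcript_id: span in bp from lowest exon start to highest exon end}."""
    bounds = {}  # transcript_id -> (min_start, max_end)
    for line in lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9 or fields[2] != "exon":
            continue
        match = TRANSCRIPT_ID.search(fields[8])
        if not match:
            continue
        tx = match.group(1)
        start, end = int(fields[3]), int(fields[4])
        lo, hi = bounds.get(tx, (start, end))
        bounds[tx] = (min(lo, start), max(hi, end))
    # Inclusive coordinates: span covers exons and the introns between them.
    return {tx: hi - lo + 1 for tx, (lo, hi) in bounds.items()}


# Hypothetical sample data, for illustration only.
sample = [
    'chr1\tsrc\texon\t100\t199\t.\t+\t.\tgene_id "G1"; transcript_id "T1";',
    'chr1\tsrc\texon\t300\t349\t.\t+\t.\tgene_id "G1"; transcript_id "T1";',
    'chr1\tsrc\texon\t1000\t1009\t.\t-\t.\tgene_id "G2"; transcript_id "T2";',
]
spans = genomic_span_per_transcript(sample)
for transcript, span in spans.items():
    print(transcript, span)
```

For `T1` the span is 250 bp (positions 100 to 349), which includes the 100 bp intron between its two exons.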