roc1293/README.md

Extract intron feature gff3 from gene_exon gff3 file

There are several ways to extract intron features in GFF3 format from a gene/exon GFF3 file. The first option is to load the file into a GBrowse database and dump the intron features from it.

Option 1:

Load the GFF3 file into MySQL:
perl bp_bulk_load.pl -u [uname] -p [pass] -d [gbrowse_database] [input.gff3/input.fasta]
Extract the intron features as GFF3:
perl make_intron_feature.pl -u [uname] -p [pass] -db [gbrowse_database] -o [output.gff3]

Here are the final results.

Option 2:

This alternative does not require GBrowse or MySQL. First, download and install the latest versions of misopy and gffutils. Then run the following command:
python extract_intron_gff3_from_gff3.py [input.gff3] [output.gff3]

Finally, filter the output GFF3 file down to the intron records and sort it by sequence ID and start position:

awk '/intron/{print}' output.gff3 | sort -k1,1 -k4,4n > processed_intron.gff3
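For readers who would rather do the filter-and-sort step in Python, here is a minimal sketch. The function name `filter_and_sort_introns` is hypothetical and is not part of the scripts in this repository. Unlike the `awk` pattern, it matches `intron` only in the feature-type column (column 3), and it assumes tab-separated GFF3 records with nine columns.

```python
def filter_and_sort_introns(lines):
    """Keep intron records and sort them by seqid, then numerically by start.

    `lines` is any iterable of GFF3 lines (for example an open file).
    Comment lines, blank lines, and records with fewer than 9 columns are skipped.
    """
    records = []
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        # Column 3 (index 2) holds the feature type in GFF3.
        if len(fields) < 9 or fields[2] != "intron":
            continue
        records.append(fields)
    # Column 1 is the sequence ID, column 4 the 1-based start coordinate.
    records.sort(key=lambda f: (f[0], int(f[3])))
    return ["\t".join(f) for f in records]


if __name__ == "__main__":
    # Assumed file names, taken from the shell command above.
    with open("output.gff3") as src, open("processed_intron.gff3", "w") as dst:
        for record in filter_and_sort_introns(src):
            dst.write(record + "\n")
```

Sorting the start column as an integer is what the `n` modifier does in the `sort` command; a plain string sort would place 10000 before 2000.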

Option 3:

If you prefer not to type commands, you can use the PlantGenIE Galaxy extract intron feature tool.
Before:

Chr01	phytozome8_0	gene	2906	6646	.	-	.	ID=Potri.001G000200;Name=Potri.001G000200  
Chr01	phytozome8_0	mRNA	2906	6646	.	-	.	ID=PAC:27045395;Name=Potri.001G000200.1;  
Chr01	phytozome8_0	exon	6501	6646	.	-	.	ID=PAC:27045395.exon.1;Parent=PAC:27045395;    
Chr01	phytozome8_0	CDS	6501	6644	.	-	0	ID=PAC:27045395.CDS.1;Parent=PAC:27045395;   
Chr01	phytozome8_0	five_prime_UTR	6645	6646	.	-	.	ID=PAC:27045395.five_prime_UTR.1; 
Chr01	phytozome8_0	exon	3506	3928	.	-	.	ID=PAC:27045395.exon.2;Parent=PAC:27045395;  
Chr01	phytozome8_0	CDS	3506	3928	.	-	0	ID=PAC:27045395.CDS.2;Parent=PAC:27045395;    
Chr01	phytozome8_0	exon	2906	3475	.	-	.	ID=PAC:27045395.exon.3;Parent=PAC:27045395;  

After:

Chr01	phytozome8_0	intron	3476	3505	.	.	.	ID=Potri.001G000200;Parent=PAC:27045395  
Chr01	phytozome8_0	intron	3929	6500	.	.	.	ID=Potri.001G000200;Parent=PAC:27045395  
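The Before/After example can be reproduced by hand: an intron is the gap between two consecutive exons of the same transcript. The sketch below illustrates that idea only; it is not the code used by any of the three options, and the function name `introns_from_exons` is hypothetical. It assumes 1-based inclusive coordinates, as in GFF3, and that the exons of one transcript do not overlap.

```python
def introns_from_exons(exons):
    """Return the (start, end) gaps between consecutive exons of one transcript.

    `exons` is an iterable of (start, end) tuples, 1-based and inclusive.
    """
    ordered = sorted(exons)
    introns = []
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        # Only emit an intron when there is at least one base between exons.
        if next_start > prev_end + 1:
            introns.append((prev_end + 1, next_start - 1))
    return introns


# Exon coordinates of PAC:27045395 from the "Before" listing.
exons = [(6501, 6646), (3506, 3928), (2906, 3475)]
print(introns_from_exons(exons))  # [(3476, 3505), (3929, 6500)]
```

The two tuples match the start and end columns of the intron records in the "After" listing.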

The final results look similar to this.

Extract intron feature sequence file from gene_intron gff3 and fasta file

Here we use the output of the steps above (processed_intron.gff3 or output.gff3):
perl exttract_seq_from_gff3.pl -d genome.fa - gene_intron.gff3 > output_intron.fa
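To show what the sequence extraction involves, here is a minimal Python sketch that slices intron sequences out of a genome FASTA using GFF3 coordinates. It does not reproduce the Perl script: the function names `read_fasta` and `intron_sequences` and the `seqid:start-end` header format are assumptions made for illustration. It also ignores strand, which fits the intron records shown earlier (their strand column is `.`), and it loads the whole genome into memory.

```python
def read_fasta(lines):
    """Parse FASTA lines into a dict mapping sequence ID to sequence string."""
    chunks = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the ID.
            name = line[1:].split()[0]
            chunks[name] = []
        elif name is not None:
            chunks[name].append(line)
    return {seqid: "".join(parts) for seqid, parts in chunks.items()}


def intron_sequences(gff_lines, genome):
    """Yield (header, sequence) pairs for every intron record in a GFF3 file."""
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "intron":
            continue
        seqid, start, end = fields[0], int(fields[3]), int(fields[4])
        # GFF3 is 1-based inclusive; Python slices are 0-based, end-exclusive.
        yield f"{seqid}:{start}-{end}", genome[seqid][start - 1:end]


if __name__ == "__main__":
    # Assumed file names, taken from the command above.
    with open("genome.fa") as fasta:
        genome = read_fasta(fasta)
    with open("gene_intron.gff3") as gff, open("output_intron.fa", "w") as out:
        for header, seq in intron_sequences(gff, genome):
            out.write(f">{header}\n{seq}\n")
```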

Test results here.
