The github reporsitory for the under development tool, NanoCircle 2020. Useful for identifying the coordinates of both simple and chimeric circular molecules, sequenced using long-read sequencing.
porechop -i bc05.reads.fastq -b bc05.barcode_trim -t 8Use fastq-stats to obtain information regarding the sequences
minimap2 -t 6 -x map-ont -d GRCh37.mmi GRCh37.faminimap2 -t 8 -ax map-ont --secondary=no hg19.25chr.mmi read_file.fastq | samtools sort - > barcode.aln_hg19.bam
# -ax map-ont = Oxford Nanopore genomic reads
# --seconday=no With no reads mapped with SAM flag 0x100 (secondary flag).
# hg19.25chr.mmi minimizer index for the referencebedtools genomecov -bg -ibam barcode_hg19.bam | bedtools merge -d 1000 -i stdin | sort -V -k1,1 -k2,2n > barcode_1000_cov.bedSTEP 4 - Classify the soft-clipped read supporting Simple eccDNA and soft-clipped supporting Chimeric eccDNA
python NanoCircle_arg.py Classify -i barcode_hg19.bam -d temp_readsWhich will be saved in a folder temp_reads containing both simple and complex reads in .bam format.
samtools index temp_reads/Simple_reads.bam
samtools index temp_reads/Chimeric_reads.bampython NanoCircle_arg.py Simple -i barcode_1000_cov.bed -b temp_reads/Simple_reads.bam -q 60 -o barcode_Simple_circles.bedpython NanoCircle_arg.py Chimeric -i barcode_1000_cov.bed -b temp_reads/Chimeric_reads.bam -q 60 -o barcode_Chimeric_circles.bedThe output being a bed file with possible configurations of several chimeric eccDNA, since the identification extract reads originating from specific regions.
python NanoCircle_arg.py Merge -i barcode_Chimeric_circles.bed -o barcode_Merged_chimeric.bedThe output being a bed file with possible configurations of several chimeric eccDNA, since the identification extract reads originating from specific regions.
calculating jaccard index for each individual circle compared to the estimated region with coverage
bedtools intersect -wao -a barcode_Simple_circles_1000.bed -b barcode_1000_cov.bed | head -10 | awk -v OFS='\t' '{print $1,$2,$3,($4/((($3-$2)+($10-$9))-$4))}'To check if there might be a small region in between the coordinates without any coverage ? Or just use mean coverage
#Removing reads aligning to contamination sources, while still keeping the bam format.
samtools view -h BC10.aln_hg19.bam |grep -v '>N'| grep -v '>A' |samtools view -Sbo BC10.bam -
# No reads.
cat BC07.fastq | awk '{print $1}' | grep '@' | sort | uniq | wc –l
# No of mapped
samtools view -F 0x4 BC07/BC07.aln_hg19.bam | cut -f 1 | sort | uniq | wc -l