Hi,
I am using angsd to produce FASTA sequences, which are automatically written as gzipped FASTA files.
If I open the FASTA file with R-Biostrings, the sequence composition looks like this:
> dna <- readDNAStringSet("WSBg.asm5.fa.gz")
> alphabetFrequency(dna[1])
            A        C        G        T M R W S Y K V H D B        N - + .
[1,] 53974765 37689595 37636633 53870814 0 0 0 0 0 0 0 0 0 0 11982472 0 0 0
However, for the same fa.gz file, the sequence composition reported by pyfastx looks like this:
fa=pyfastx.Fasta("WSBg.asm5.fa.gz")
s1=fa['chr1']
s1.composition
{'\x00': 162258284,
'A': 8774131,
'C': 5629514,
'G': 5628512,
'N': 4131093,
'T': 8732745}
Could you please explain what the '\x00' means?
Could it be that pyfastx cannot correctly index or read these gzipped files?
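For reference, both tallies above add up to the same 195,154,279 characters, so the sequence length agrees and only the per-base counts differ. To check whether NUL bytes are really present in the file, an independent count with only the Python standard library could look like the sketch below. `fasta_composition` is a hypothetical helper name (not part of pyfastx or Biostrings), and the sketch assumes the file can be read as ordinary gzip-compressed FASTA text.

```python
import gzip
from collections import Counter


def fasta_composition(path):
    """Count raw characters per FASTA record in a gzip-compressed file.

    Hypothetical cross-check helper; it does not use any index.
    """
    counts = {}
    current = None
    # Text mode; a NUL byte in the data would be counted as '\x00'.
    with gzip.open(path, "rt", encoding="ascii", errors="replace") as handle:
        for line in handle:
            line = line.rstrip("\r\n")
            if line.startswith(">"):
                # Use the first whitespace-delimited token as the record name.
                parts = line[1:].split()
                name = parts[0] if parts else ""
                current = counts.setdefault(name, Counter())
            elif current is not None and line:
                # Case is left untouched so unusual characters stay visible.
                current.update(line)
    return counts
```

Calling `fasta_composition("WSBg.asm5.fa.gz")["chr1"]` and comparing the result with the two outputs above would show whether the '\x00' characters come from the file itself or from how it is indexed.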
Thank you in anticipation
Best regards
Kristian