Rcorrector dropping reads? (again?) #19

@angelaparodymerino

Hi,

I am running into a problem that was reported before in a now-closed issue. I am running Rcorrector on some RNA-seq samples (at first two at a time), and most of the samples, though not all, seem to lose reads after the run. I ran it a second time on some samples (this time one at a time) and still see the same problem, with a different number of reads.

For example:
Samples before running Rcorrector:

~/data/RNAseq_PRJNA338760/FastQrawRNAseq$ grep -c "@HISEQ1" SRR4030253_1.fastq
18218121
~/data/RNAseq_PRJNA338760/FastQrawRNAseq$ grep -c "@HISEQ1" SRR4030253_2.fastq
18218121

After running Rcorrector (first time):

grep: SRR4030253_1.fastq: No such file or directory
mcv@bambi:~/data/RNAseqCorrected$ grep -c "@HISEQ1" SRR4030253_1.cor.fq
7839585
mcv@bambi:~/data/RNAseqCorrected$ grep -c "@HISEQ1" SRR4030253_2.cor.fq
7839611

After running Rcorrector (second time):

/angela/rcorrector$ perl run_rcorrector.pl -1 ~/data/RNAseq_PRJNA338760/FastQrawRNAseq/SRR4030253_1.fastq -2 ~/data/RNAseq_PRJNA338760/FastQrawRNAseq/SRR4030253_2.fastq -od RNAseqCorrected253
Put the kmers into bloom filter
/home/mcv/angela/rcorrector/jellyfish/bin/jellyfish bc -m 23 -s 100000000 -C -t 1 -o tmp_a798458599d74d3e1d510f550790024f.bc /home/mcv/data/RNAseq_PRJNA338760/FastQrawRNAseq/SRR4030253_1.fastq /home/mcv/data/RNAseq_PRJNA338760/FastQrawRNAseq/SRR4030253_2.fastq 
Count the kmers in the bloom filter
/home/mcv/angela/rcorrector/jellyfish/bin/jellyfish count -m 23 -s 100000 -C -t 1 --bc tmp_a798458599d74d3e1d510f550790024f.bc -o tmp_a798458599d74d3e1d510f550790024f.mer_counts /home/mcv/data/RNAseq_PRJNA338760/FastQrawRNAseq/SRR4030253_1.fastq /home/mcv/data/RNAseq_PRJNA338760/FastQrawRNAseq/SRR4030253_2.fastq 
Dump the kmers
/home/mcv/angela/rcorrector/jellyfish/bin/jellyfish dump -L 2 tmp_a798458599d74d3e1d510f550790024f.mer_counts > tmp_a798458599d74d3e1d510f550790024f.jf_dump
Error correction
/home/mcv/angela/rcorrector/rcorrector -od RNAseqCorrected253  -p /home/mcv/data/RNAseq_PRJNA338760/FastQrawRNAseq/SRR4030253_1.fastq /home/mcv/data/RNAseq_PRJNA338760/FastQrawRNAseq/SRR4030253_2.fastq -c tmp_a798458599d74d3e1d510f550790024f.jf_dump
Stored 83145603 kmers
Weak kmer threshold rate: 0.014117 (estimated from 0.950/1 of the chosen kmers)
Bad quality threshold is '#'
Processed 36436242 reads
	Corrected 41138010 bases.


~/angela/rcorrector/RNAseqCorrected253$ grep -c "@HISEQ1" SRR4030253_1.cor.fq
10861080
~/angela/rcorrector/RNAseqCorrected253$ grep -c "@HISEQ1" SRR4030253_2.cor.fq
10861067
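
The log above reports "Processed 36436242 reads", which is exactly 2 x 18218121, so Rcorrector appears to have read every record from both input files. To cross-check the output counts without depending on the header text, one option is to count lines and divide by four. This is only a minimal sketch, assuming standard four-line FASTQ records (no wrapped sequence lines); the function name `count_fastq_reads` is made up for illustration and is not part of Rcorrector.

```shell
# Hypothetical helper (not part of Rcorrector): count FASTQ records
# as (number of lines) / 4. Assumes every record is exactly four lines.
# This does not depend on header text, and is not confused by quality
# lines that happen to contain or start with '@'.
count_fastq_reads() {
    local lines
    lines=$(wc -l < "$1")
    echo $(( lines / 4 ))
}

# Example usage (paths taken from the commands above):
#   count_fastq_reads ~/data/RNAseq_PRJNA338760/FastQrawRNAseq/SRR4030253_1.fastq
#   count_fastq_reads RNAseqCorrected253/SRR4030253_1.cor.fq
```

If the line-based counts of the input and the `.cor.fq` output agree, the reads are all there and the `grep` pattern is what no longer matches. If they disagree, the output file itself is shorter than the input.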

I have no idea what is causing this. Could you help?

Thanks in advance,

Angela
