fix average q-score calculation by DrinnanSante · Pull Request #495 · OpenGene/fastp

DrinnanSante · 2023-05-31T18:04:41Z

Hi, big fan of your work!

In this pull request, I rewrote the --average_qual method to accurately calculate the average quality of a read.

I was running .fastq files of DNA sequenced on our Nanopore through fastp (Nanopore says to use average read q-scores), and far more reads were passing the quality filter than I was used to. I looked into it and found that fastp averages the q-scores directly. Q-scores are log-scaled values, so they need to be converted back to error probabilities (p values) before averaging. Averaging the raw q-scores overestimates read quality, so more reads pass the filter than should.
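For reference, the standard Phred relationship and the error-based average described here can be written as below. The symbols $Q_i$ (q-score of base $i$), $p_i$ (its error probability), and $n$ (read length) are notation introduced for this note, not names from the fastp source.

```latex
p_i = 10^{-Q_i/10},
\qquad
\bar{p} = \frac{1}{n}\sum_{i=1}^{n} p_i,
\qquad
Q_{\text{avg}} = -10\,\log_{10}\bar{p}
```

By contrast, the arithmetic mean $\frac{1}{n}\sum_{i} Q_i$ is never smaller than $Q_{\text{avg}}$, which is why filtering on it lets more reads through.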

As an example:

     A base with a q-score of 10 and a second base with a q-score of 20, if  
     averaged, would have an average q-score of 15.

     However, if you average the probability of errors: 

     A q-score of 10 is a probability of error of 0.1
     A q-score of 20 is a probability of error of 0.01
     Averaging the probability of error:   0.1 + 0.01 = 0.11  | 0.11 / 2 = 0.055

     The q-score for a probability of error of 0.055 is ~12.6.
     This number accurately reflects the average amount of error present in the read.
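The arithmetic above can be checked with a small standalone sketch. The function names here are illustrative and do not come from fastp; the input is assumed to be already-decoded integer q-scores.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Phred q-score -> probability of error: p = 10^(-Q/10)
double qualToErrorProb(double q) {
    return std::pow(10.0, -q / 10.0);
}

// Probability of error -> Phred q-score: Q = -10 * log10(p)
double errorProbToQual(double p) {
    return -10.0 * std::log10(p);
}

// Naive approach: arithmetic mean of the q-scores themselves.
double arithmeticMeanQual(const std::vector<int>& quals) {
    assert(!quals.empty());
    double total = 0.0;
    for (int q : quals) total += q;
    return total / quals.size();
}

// Error-based approach: average the error probabilities,
// then convert the mean probability back to a q-score.
double errorBasedMeanQual(const std::vector<int>& quals) {
    assert(!quals.empty());
    double totalErr = 0.0;
    for (int q : quals) totalErr += qualToErrorProb(q);
    return errorProbToQual(totalErr / quals.size());
}
```

For the two bases in the example, `arithmeticMeanQual({10, 20})` gives 15, while `errorBasedMeanQual({10, 20})` gives roughly 12.6.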

To implement this, in the filter.cpp file, I changed the totalQual variable to a float. totalQual is now incremented by each base's probability of error instead of its q-score. Then, in the 'else if' statement, I divide the final totalQual value of the read by rlen and convert the resulting mean error probability back to a q-score, which is compared to the user's input.
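As a rough illustration of the change described above, here is a minimal sketch of the filtering logic. It is not the actual diff: `passesAverageQual` and its parameters are hypothetical names, the quality string is assumed to use the Phred+33 encoding, and a read is assumed to pass when its average q-score is at least the requested threshold.

```cpp
#include <cmath>
#include <string>

// Hypothetical helper (not fastp's real function or signature).
// qualStr:    per-base quality string, assumed Phred+33 encoded
// avgQualReq: minimum average q-score requested by the user
bool passesAverageQual(const std::string& qualStr, int avgQualReq) {
    if (qualStr.empty()) return false;  // nothing to average

    double totalErr = 0.0;  // accumulates error probabilities, not q-scores
    for (char c : qualStr) {
        int q = c - 33;                          // decode Phred+33
        totalErr += std::pow(10.0, -q / 10.0);   // q-score -> error probability
    }

    // Mean error probability over the read, converted back to a q-score.
    double meanErr = totalErr / qualStr.size();
    double avgQual = -10.0 * std::log10(meanErr);

    return avgQual >= avgQualReq;
}
```

With the quality string `"+5"` (q-scores 10 and 20), this helper passes a threshold of 12 but fails a threshold of 15, whereas a plain arithmetic mean would pass both. Because the result goes through `pow` and `log10`, reads whose average sits exactly on the threshold may be affected by floating-point rounding.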

I compiled the code and tested it on a simulated dataset, and the results were identical to those of the other Nanopore quality-filtering packages I have on my machine.

Thanks again for fastp!!

bwlang

this looks correct to me and produces reasonable results on my test data.

bwlang · 2026-01-18T22:34:50Z

results in my test data:

| Threshold | Master | this PR |
| --- | --- | --- |
| Avg Q ≥ 15 | Passed: 9,671 Failed: 0 (0.0%) | Passed: 9,296 Failed: 375 (3.9%) |
| Avg Q ≥ 20 | Passed: 9,671 Failed: 0 (0.0%) | Passed: 8,694 Failed: 977 (10.1%) |
| Avg Q ≥ 30 | Passed: 9,027 Failed: 644 (6.7%) | Passed: 6,934 Failed: 2,737 (28.3%) |

fix average q-score calculation

25cd40c

PhiDiM added a commit to PhiDiM/fastplong that referenced this pull request Dec 7, 2025

Update filter.cpp

0efe4ad

update filter.cpp to base the averaging of the Q-scores on the underlying errors, based on OpenGene/fastp#495

bwlang reviewed Jan 18, 2026

DrinnanSante wants to merge 1 commit into OpenGene:master from DrinnanSante:fix_average_q-score_calculation
