Open
PhiDiM
added a commit
to PhiDiM/fastplong
that referenced
this pull request
Dec 7, 2025
update filter.cpp to base the averaging of the Q-scores on the underlying errors, based on OpenGene/fastp#495
bwlang
reviewed
Jan 18, 2026
Contributor
bwlang
left a comment
this looks correct to me and produces reasonable results on my test data.
Contributor
results in my test data:
Hi, big fan of your work!
In this pull request, I rewrote the --average_qual method to accurately calculate the average quality of a read.
I was running .fastq files of DNA sequenced on our Nanopore through fastp (Nanopore says to use average read Q-scores), and far more reads were passing the quality filter than I was used to. I looked into it, and fastp was averaging the Q-scores directly. Q-scores are on a log scale (Q = -10 * log10(p), where p is the probability of error), so they need to be converted to error probabilities before averaging, and the mean probability then converted back to a Q-score. Averaging the Q-scores themselves overestimates the read quality, so more reads pass the filter than should.
As an example:
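For illustration, with made-up numbers rather than real data, take a two-base read with Q-scores of 10 and 30 and a hypothetical threshold of Q15. Averaging the Q-scores directly gives

$$
\bar{Q}_{\text{arith}} = \frac{10 + 30}{2} = 20,
$$

while averaging the underlying error probabilities first gives

$$
\bar{p} = \frac{10^{-10/10} + 10^{-30/10}}{2} = \frac{0.1 + 0.001}{2} = 0.0505,
\qquad
Q_{\text{mean}} = -10 \log_{10}(0.0505) \approx 12.97 .
$$

The read would pass a Q15 filter under the first calculation but fail it under the second.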
To implement this, in filter.cpp I changed the totalQual variable to a float and made it accumulate each base's probability of error instead of its Q-score. Then, in the 'else if' statement, I divide the read's final totalQual by rlen to get the mean error probability, and convert that back to a Q-score to compare against the user's input.
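A minimal standalone sketch of the calculation described above, not the actual filter.cpp code. The function names `meanReadQuality` and `passesAverageQual` are hypothetical, and the sketch assumes Phred+33 encoded quality strings; the real change works on fastp's existing `totalQual` and `rlen` variables.

```cpp
#include <cmath>
#include <string>

// Hypothetical helper (not part of fastp): mean read quality computed from
// the per-base error probabilities. Assumes Phred+33 quality encoding.
double meanReadQuality(const std::string& qualstr) {
    if (qualstr.empty()) {
        return 0.0;  // no bases, treat as lowest quality
    }
    double totalErr = 0.0;
    for (char c : qualstr) {
        int q = c - 33;                         // Phred+33 -> Q-score
        totalErr += std::pow(10.0, -q / 10.0);  // Q-score -> error probability
    }
    double meanErr = totalErr / static_cast<double>(qualstr.size());
    return -10.0 * std::log10(meanErr);         // mean error -> Q-score
}

// Hypothetical filter check: keep the read only if its error-based mean
// quality reaches the user-supplied threshold.
bool passesAverageQual(const std::string& qualstr, double minAvgQual) {
    return meanReadQuality(qualstr) >= minAvgQual;
}
```

For example, the quality string `"+?"` (Q10 and Q30 under Phred+33) has an arithmetic mean Q-score of 20, but `meanReadQuality("+?")` is roughly 12.97, so the read fails a threshold of 15.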
I compiled the code and tested it on a simulated dataset, and the results were identical to those from the other Nanopore quality filtering packages I have on my machine.
Thanks again for fastp!!