Clarify "very short reads" in helptext of `clip_readlength`

The current helptext reads:
>Defines the minimum read length that is required for reads after merging to be considered for downstream analysis after read merging. Default is 30.
>Note that performing read length filtering at this step is not reliable for correct endogenous DNA calculation, when you have a large percentage of very short reads in your library - such as retrieved in single-stranded library protocols. When you have very few reads passing this length filter, it will artificially inflate your endogenous DNA by creating a very small denominator. In these cases it is recommended to set this to 0, and use `--bam_filter_minreadlength` instead, to filter out 'un-usable' short reads after mapping.

We should clarify what "very short reads in your library" means. As I understand it, this refers to a read length distribution that peaks below 20 bp. Mapping all sequenced fragments adds considerable computational work, and setting `clip_readlength` to 0 can be avoided when the length distribution still peaks at around 20-25 bp. In such cases, I think users could lower `clip_readlength` without setting it all the way to 0, avoiding the extra computation while still getting an endogenous DNA percentage comparable to the one obtained with default settings.
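
To illustrate the "very small denominator" effect described in the helptext, here is a minimal sketch. It assumes endogenous DNA is computed as mapped reads divided by the reads that enter mapping; the function name and all read counts are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def endogenous_pct(mapped_reads: int, reads_into_mapping: int) -> float:
    """Endogenous DNA (%) under the assumed definition: mapped / reads entering mapping."""
    if reads_into_mapping <= 0:
        raise ValueError("reads_into_mapping must be positive")
    return 100.0 * mapped_reads / reads_into_mapping


# Hypothetical single-stranded library dominated by very short reads.
total_merged_reads = 1_000_000  # all reads after merging (no length filter)
reads_passing_30bp = 50_000     # reads surviving the default length cutoff of 30
mapped_usable_reads = 5_000     # mapped reads that are long enough to be usable

# Length filter applied before mapping: the denominator shrinks to 50,000.
inflated = endogenous_pct(mapped_usable_reads, reads_passing_30bp)

# No length filter before mapping, short reads removed after mapping instead:
# the denominator stays at 1,000,000.
unfiltered = endogenous_pct(mapped_usable_reads, total_merged_reads)

print(f"filter before mapping: {inflated:.1f}%")   # 10.0%
print(f"filter after mapping:  {unfiltered:.1f}%")  # 0.5%
```

With these made-up counts, the same 5,000 usable mapped reads give 10% or 0.5% depending only on where the length filter is applied. The closer the length distribution peak is to the cutoff, the smaller this gap becomes, which is the case where lowering `clip_readlength` rather than setting it to 0 may be enough.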

Clarify "very short reads" in helptext of `clip_readlength` #887

TCLamnidis
opened on May 24, 2022