Clarify "very short reads" in helptext of clip_readlength #887

Closed

Description

The current helptext reads:

Defines the minimum read length that is required for reads after merging to be considered for downstream analysis after read merging. Default is 30.
Note that performing read length filtering at this step is not reliable for correct endogenous DNA calculation, when you have a large percentage of very short reads in your library - such as retrieved in single-stranded library protocols. When you have very few reads passing this length filter, it will artificially inflate your endogenous DNA by creating a very small denominator. In these cases it is recommended to set this to 0, and use --bam_filter_minreadlength instead, to filter out 'un-usable' short reads after mapping.

We should clarify what "very short reads in your library" means. To my understanding, that would be a read length distribution that peaks below 20 bp. Mapping all sequenced fragments adds considerable computational work, and it can be avoided when the length distribution still peaks at around 20-25 bp. In such cases, I think users could lower clip_readlength without setting it to 0, avoiding the extra computation while still getting an endogenous DNA percentage comparable to the one obtained with default settings.
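To make the "small denominator" effect in the helptext concrete, here is a minimal sketch with made-up numbers. It assumes endogenous DNA % is computed as mapped reads divided by reads entering mapping; the function `endogenous_pct` and the toy library are hypothetical illustrations, not part of the pipeline.

```python
def endogenous_pct(read_lengths, mapped_flags, clip_min_len, bam_min_len=0):
    """Hypothetical model: % of reads entering mapping that map and pass the
    post-mapping length filter. Not the pipeline's actual implementation."""
    # Reads that survive the pre-mapping length filter form the denominator.
    kept = [(n, m) for n, m in zip(read_lengths, mapped_flags) if n >= clip_min_len]
    if not kept:
        return 0.0
    # Numerator: mapped reads that also pass the post-mapping length filter.
    mapped = sum(1 for n, m in kept if m and n >= bam_min_len)
    return 100.0 * mapped / len(kept)


# Toy library (invented numbers): 900 reads of 18 bp that do not map,
# plus 100 reads of 40 bp, half of which map.
lengths = [18] * 900 + [40] * 100
mapped = [False] * 900 + [True] * 50 + [False] * 50

# Length filter before mapping (default 30): denominator shrinks to 100 reads.
print(endogenous_pct(lengths, mapped, clip_min_len=30))                  # 50.0
# No pre-mapping filter, filter at 30 bp after mapping: denominator is 1000 reads.
print(endogenous_pct(lengths, mapped, clip_min_len=0, bam_min_len=30))   # 5.0
```

The same 50 mapped reads yield 50% or 5% depending only on where the length filter is applied. An intermediate clip_readlength would fall between these two values, depending on how many reads sit below the chosen cutoff.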
