You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
This leads to a parse error claiming that the platform is invalid.
Looking at the code, it looks like it wants "Illumina" to be all upper or all lower. This feels unnecessarily restrictive. Can I suggest two possible fixes:
case fold the platform before matching
introduce an "other" alternative to the platform enum and use it when the platform is well-formed but unknown.
These are not mutually exclusive, of course.
I am happy to do a pull request if you think these are a good idea.
Tom.
The text was updated successfully, but these errors were encountered:
I thought I knew the spec, but obviously had missed that detail.
Still, it would maybe good to have a "tolerant" mode that accepts well-formed input even if it is not compliant, even if the output functions forbid the production of non-compliant output.
As you saw in #111, the SAM specification has an explicit list of valid read group platform values, which must be in uppercase. After the discussion in samtools/hts-specs#679, one possible resolution was to allow this field to be case-insensitive. This is drafted in samtools/hts-specs#684; however, it shows that the ruling changed from case-insensitive to remaining uppercase and allowing lowercase. This is what noodles currently, preemptively implements. As such, this still doesn't help your case when the platform value is mixed case (disallowed in ecc0037).
A tolerant (aka lenient or silent) mode is pragmatic, but it's detrimental to interoperability. It is an abuse of the file format and allows out-of-spec files to be produced.
I would encourage you to add your feedback to samtools/hts-specs#679. The spec maintainers have yet to formally address this issue. It is very much something that needs to be fixed at the specification level, not in each implementation.
I'm closing this in favor of tracking it in #111, but feel free to continue this discussion in either.
G'day.
Awesome library, thank you! I'm an experienced programmer, but new to Rust, so I apologise if the following is because of my rust-naivity.
The bam files from our pipeline contain read group headers like the following:
This leads to a parse error claiming that the platform is invalid.
Looking at the code, it looks like it wants "Illumina" to be all upper or all lower. This feels unnecessarily restrictive. Can I suggest two possible fixes:
These are not mutually exclusive, of course.
I am happy to do a pull request if you think these are a good idea.
Tom.
The text was updated successfully, but these errors were encountered: