Description
Hello Dr. Vollger,
I am writing a Rust package that handles modification data in a context other than Fiber-seq. Since fibertools parses MM and ML tags in modBAM data and is open source, I am considering importing `my_mm_ml_parser` and related functions from `src/utils/basemods.rs` into my package.
This function does not cover all MM/ML tag formats (see the two examples below), so I was wondering whether you plan to make it more general. I could attempt it myself with some guidance and open a pull request here, or copy your code into my package with an acknowledgement and modify it there. My other options are (1) rust_htslib, which does not provide reference-to-query mapping of modification data out of the box (as far as I can tell, your code does), or (2) ONT's modkit, which is not under an MIT license.
(1) Format A, `MM:Z:C+mh,5,12; ML:B:C,204,26,89,130`, is equivalent to format B, `MM:Z:C+m,5,12;C+h,5,12; ML:B:C,204,89,26,130`. In format A the ML values are interleaved per position (m then h for the first called C, then m then h for the second), whereas format B lists all m values before all h values. As far as I can tell, you support format B but not format A. I have not seen any software emit format A, but a future technology that calls two types of cytosine modification might use it.
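To make the equivalence concrete, here is a minimal sketch of a parser that accepts both encodings and normalizes them to the same list of calls. The names `parse_mm_ml` and `ModCall` are hypothetical (not fibertools' or rust_htslib's API), and the sketch assumes single-letter modification codes only (numeric ChEBI codes are rejected) and ignores the '.'/'?' mode beyond stripping it.

```rust
/// One decoded modification call (hypothetical type, not fibertools' API).
/// Field order matters: the derived `Ord` sorts by base, strand, code, then position.
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
pub struct ModCall {
    pub base: char,        // canonical base from MM, e.g. 'C'
    pub strand: char,      // '+' or '-'
    pub code: char,        // single-letter modification code, e.g. 'm' or 'h'
    pub base_index: usize, // 0-based index among occurrences of `base` in SEQ
    pub prob: u8,          // raw ML byte (0-255)
}

/// Hypothetical parser accepting both combined ("C+mh,...") and split
/// ("C+m,...;C+h,...") MM entries. Single-letter codes only.
pub fn parse_mm_ml(mm: &str, ml: &[u8]) -> Result<Vec<ModCall>, String> {
    let mut calls = Vec::new();
    let mut ml_pos = 0usize; // ML values are consumed in MM order across all entries
    for entry in mm.split(';').map(str::trim).filter(|e| !e.is_empty()) {
        let mut fields = entry.split(',');
        let head = fields.next().ok_or("empty MM entry")?;
        let mut chars = head.chars();
        let base = chars.next().ok_or("missing base in MM entry")?;
        let strand = chars.next().ok_or("missing strand in MM entry")?;
        if strand != '+' && strand != '-' {
            return Err(format!("bad strand in MM entry {head:?}"));
        }
        // Remaining characters are the codes, optionally followed by '.' or '?'.
        let codes: Vec<char> = chars.filter(|&c| c != '.' && c != '?').collect();
        if codes.is_empty() || !codes.iter().all(|c| c.is_ascii_alphabetic()) {
            return Err(format!("unsupported modification codes in {head:?}"));
        }
        let mut next = 0usize; // index of the first base not yet skipped or called
        for raw in fields {
            let delta: usize = raw
                .trim()
                .parse()
                .map_err(|e| format!("bad MM delta {raw:?}: {e}"))?;
            let base_index = next + delta;
            next = base_index + 1;
            // A multi-code entry stores one ML value per code at each position,
            // interleaved: C+mh,5,12 -> [m@5, h@5, m@18, h@18].
            for &code in &codes {
                let prob = *ml.get(ml_pos).ok_or("ML has fewer values than MM implies")?;
                ml_pos += 1;
                calls.push(ModCall { base, strand, code, base_index, prob });
            }
        }
    }
    if ml_pos != ml.len() {
        return Err(format!("ML has {} values but MM implies {}", ml.len(), ml_pos));
    }
    calls.sort(); // canonical order so combined and split encodings compare equal
    Ok(calls)
}
```

With this, `parse_mm_ml("C+mh,5,12;", &[204, 26, 89, 130])` and `parse_mm_ml("C+m,5,12;C+h,5,12;", &[204, 89, 26, 130])` return the same four calls. Sorting is just one way to canonicalize; a real implementation might instead group calls per (base, strand, code).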
(2) For `MM:Z:C+m,5,12;C+h,5,12; ML:B:C,204,89,26,130`, any cytosine on the molecule not listed in the tag should technically be regarded as unmodified, because no '?' flag is given and the default mode is '.'. If I am not mistaken, however, your code treats such cytosines as "missing" rather than unmodified.
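A minimal sketch of the intended semantics: every cytosine in SEQ gets a state, and bases skipped by the MM deltas become unmodified under the default '.' mode but unknown under '?'. The names `expand_mm_entry` and `ModState` are hypothetical, and the sketch assumes a single-code, '+'-strand entry, a canonical base other than 'N', and that the caller passes only the ML values belonging to that entry.

```rust
/// State of one canonical base in the read (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ModState {
    Called(u8), // explicit ML probability byte
    Unmodified, // skipped base under '.' (the default) mode
    Unknown,    // skipped base under '?' mode
}

/// Hypothetical helper: expands one single-code, '+'-strand MM entry such as
/// "C+m,5,12" or "C+m?,5,12" against SEQ, returning (query position, state)
/// for every occurrence of the canonical base. `ml` holds this entry's values only.
pub fn expand_mm_entry(seq: &[u8], entry: &str, ml: &[u8]) -> Result<Vec<(usize, ModState)>, String> {
    let mut fields = entry.split(',');
    let head = fields.next().ok_or("empty MM entry")?;
    let hb = head.as_bytes();
    if hb.len() < 3 || hb[1] != b'+' {
        return Err(format!("only '+'-strand entries are handled here: {head:?}"));
    }
    let codes = head
        .get(2..)
        .ok_or("malformed MM entry")?
        .trim_end_matches(|c| c == '.' || c == '?');
    if codes.len() != 1 {
        return Err(format!("only single-code entries are handled here: {head:?}"));
    }
    let base = hb[0].to_ascii_uppercase();
    // Skipped bases are implicitly unmodified unless the entry ends in '?'.
    let skipped = if head.ends_with('?') { ModState::Unknown } else { ModState::Unmodified };
    // Query positions of every occurrence of the canonical base, in read order.
    let positions: Vec<usize> = seq
        .iter()
        .enumerate()
        .filter(|(_, b)| b.to_ascii_uppercase() == base)
        .map(|(i, _)| i)
        .collect();
    let mut states = vec![skipped; positions.len()];
    let mut next = 0usize; // index of the first base not yet skipped or called
    let mut n_calls = 0usize;
    for raw in fields {
        let delta: usize = raw
            .trim()
            .parse()
            .map_err(|e| format!("bad MM delta {raw:?}: {e}"))?;
        let idx = next + delta;
        let prob = *ml.get(n_calls).ok_or("ML has fewer values than MM implies")?;
        *states.get_mut(idx).ok_or("MM points past the last matching base")? = ModState::Called(prob);
        next = idx + 1;
        n_calls += 1;
    }
    if n_calls != ml.len() {
        return Err(format!("ML has {} values but MM implies {}", ml.len(), n_calls));
    }
    Ok(positions.into_iter().zip(states).collect())
}
```

For the read `ACGCCTCAC` and entry `C+m,1,1` with ML `[200, 10]`, the cytosines at query positions 3 and 6 are called, and those at 1, 4 and 8 come back as `Unmodified`; with `C+m?,1,1` the same three would be `Unknown`.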
Regards
Sathish