Splicing #38
…I since these are generally never needed outside that context.
feat: fully functional zero-copy splicing mechanics.
fix: bug in rev and rev comp causing garbage output.
Ok got it, that makes sense. Yeah, I guess documentation is the best way to go for now. Thanks for the thorough explanation!
Hey Brian, this looks good! Thanks to @BradBalderson I just caught and fixed a bug in the reverse complement function. I think it would make sense that ~half of the sequences are way off with that bug. Can you try again with the latest commits?
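For anyone re-running the comparison, a small reference reverse complement can help confirm whether mismatches come from this kind of bug. The following is a minimal sketch in plain Python, not the project's implementation; the name `reverse_complement` and the choice of ASCII bytes as input are assumptions made for illustration.

```python
# Hypothetical reference implementation for sanity checks; not the project's code.
# Maps each base to its complement, preserving case and leaving N as N.
_COMPLEMENT = bytes.maketrans(b"ACGTNacgtn", b"TGCANtgcan")


def reverse_complement(seq: bytes) -> bytes:
    """Return the reverse complement of a DNA sequence given as ASCII bytes."""
    # Complement every base, then reverse the order of the bases.
    return seq.translate(_COMPLEMENT)[::-1]


# Applying the operation twice should give back the input.
assert reverse_complement(reverse_complement(b"AACGTN")) == b"AACGTN"
```

Comparing the output of such a reference against the library output on a handful of negative-strand sequences is one way to check that the fix took effect.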
Without VCF normalisation
Using the VCF directly to create the GVL db, without any normalisation.

Nucleotide-level
MUCH better seq sim for nuc level.

Amino acid-level
Unfortunately, still lots of stop codons. And actually the AA seq sim is a bit lower than before.

With VCF normalisation
[placeholder]
Closes #24.
docs: fix version format to be vX.Y.Z
feat: initial prototype for splicing.
Splice regions together
Allow a different definition of an overlapping variant: fully exonic and not overlapping splice sites, à la Haplosaurus.
Update Dataset API (or maybe a new class) to reflect different shape and definition of a row.
Tests against Haplosaurus on 1kGP chr22 @bschilder
Performance issues, possibly from slow RC
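
To make the "splice regions together" item concrete, here is a minimal sketch of the idea in plain Python. It is not the zero-copy implementation from this PR: it copies substrings, handles the plus strand only, and the function name `splice` and the half-open `(start, end)` interval convention are assumptions made for illustration.

```python
from typing import List, Tuple


def splice(reference: str, exons: List[Tuple[int, int]]) -> str:
    """Concatenate half-open [start, end) exon intervals in coordinate order."""
    pieces = []
    prev_end = 0
    # Sort so that exons are joined in genomic order regardless of input order.
    for start, end in sorted(exons):
        if start < prev_end:
            raise ValueError("exons overlap")
        pieces.append(reference[start:end])
        prev_end = end
    return "".join(pieces)


# Toy reference with two exons separated by an intron.
reference = "AAATGGCCTTTTAGGG"
exons = [(2, 6), (10, 13)]
spliced = splice(reference, exons)
```

A negative-strand transcript would additionally need the joined sequence to be reverse complemented, which is where a slow or incorrect RC would show up.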