'Cleverly' align sets of redundant sequences #64

Closed
AndreaGuarracino opened this issue Jan 25, 2023 · 2 comments

Comments

@AndreaGuarracino

In our applications, SPOA is sometimes given sets that contain many duplicate sequences. For example, this multi-FASTA

smoothxg_into_spoa_pad311_621639_in_1884956ms.zip

has 9280 sequences, of which 2416 are unique.

Is there a way to make SPOA align only the 2416 unique sequences while weighting each one by its frequency in the non-deduplicated set? I suspect it can be done, at least in theory. We would need this feature when using SPOA as a submodule in other projects. The aim is to avoid redundant work while still getting consensus sequences that make sense.

@rvaser
Owner

rvaser commented Jan 27, 2023

Hi Andrea,
finding which sequences are identical sounds to me like a preprocessing step that is not directly tied to the SPOA library. If you know which sequences are duplicates (either from prior knowledge or determined with an algorithm), you can add each of them only once to the POA graph with an increased weight (either coverage or the per-base sum of quality values).

Best regards,
Robert

@AndreaGuarracino
Author

Ah, I hadn't noticed that I can already set the weight of the sequence every time I add a new alignment (https://github.com/rvaser/spoa/blob/master/include/spoa/graph.hpp#L137). So the only thing that needed to be "cleverized" was me! Thank you for your prompt reply! Keep pushing on POA (random ping #31).
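
For anyone landing here later, the preprocessing step discussed above might look roughly like the sketch below. It is plain Python with no SPOA dependency, and the helper names `collapse_duplicates` and `sum_base_weights` are hypothetical, not part of SPOA. The first helper yields one coverage count per unique sequence. The second sums per-base weights position-wise, which is one reading of the "per-base sum of quality values" option. Each unique sequence would then be added to the graph once, with its count (or summed weights) passed as the weight when adding the alignment. Whether the resulting consensus matches the non-deduplicated run exactly has not been checked here.

```python
def collapse_duplicates(sequences):
    """Return (unique_sequences, counts), keeping first-seen order.

    Hypothetical helper: counts[i] is how often unique_sequences[i]
    occurred in the input, i.e. a coverage-style weight.
    """
    counts = {}
    for seq in sequences:
        counts[seq] = counts.get(seq, 0) + 1
    unique = list(counts)  # dicts preserve insertion order (Python 3.7+)
    return unique, [counts[s] for s in unique]


def sum_base_weights(records):
    """Collapse (sequence, per_base_weights) pairs, summing weights per base.

    Hypothetical helper: identical sequences are merged and their
    weight vectors are added position by position.
    """
    summed = {}
    for seq, weights in records:
        if len(weights) != len(seq):
            raise ValueError("one weight per base is required")
        if seq in summed:
            summed[seq] = [a + b for a, b in zip(summed[seq], weights)]
        else:
            summed[seq] = list(weights)
    return summed


# Toy input standing in for a multi-FASTA with many duplicates.
reads = ["ACGT", "ACGT", "ACGA", "ACGT", "ACGA", "TCGT"]
unique, counts = collapse_duplicates(reads)
for seq, n in zip(unique, counts):
    # Each unique sequence would be aligned once and added with weight n.
    print(seq, n)
```

Keeping first-seen order is a deliberate choice in this sketch: the result of a partial order alignment can depend on the order in which sequences are added, so the deduplicated set stays as close as possible to the original input order.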
