Hi Andrea,
finding which sequences are identical sounds to me like a preprocessing step that is not directly tied to the SPOA library. If you know which sequences are duplicates (either from prior knowledge or determined with an algorithm), you can add each of them to the POA graph only once, with an increased weight (either the coverage or the per-base sum of quality values).
Ah, I hadn't noticed that I can already set the weight of a sequence every time I add a new alignment (https://github.com/rvaser/spoa/blob/master/include/spoa/graph.hpp#L137). So the only thing that needed to be "cleverized" was me! Thank you for your prompt reply! Keep pushing on POA (random ping #31).
In our applications, SPOA is sometimes given sets containing many duplicate sequences. For example, this multi-FASTA
smoothxg_into_spoa_pad311_621639_in_1884956ms.zip
has 9280 sequences, of which 2416 are unique.
Is there a way to make SPOA work only on the 2,416 unique sequences, while weighting them according to their frequencies in the non-deduplicated set? I suspect it is possible, at least in theory. We would need this feature when using SPOA as a submodule in other projects; the aim is to avoid redundant work while still producing consensus sequences that make sense.