Hi!
Breseq is a great tool! Thank you for all your work.
I'm not sure if this reported behavior is a bug or a feature, so this is a question with a potential bug report.
Question/issue: When using GDTools COMPARE on multiple samples to produce TSV output, I usually get the same 40 headers, but occasionally get a different number. Some headers go missing (e.g. 'new_seq'), while new ones appear (e.g. 'repeat_length'). This behavior is somewhat unpredictable (i.e. it does not always correspond to the types of mutations present; lots of other headers always appear even when the column is empty) and thus is perhaps unintended? It definitely makes parsing the table problematic. Here are a couple of different header row examples:
aa_new_seq aa_position aa_ref_seq clone codon_new_seq codon_number codon_position codon_position_is_indeterminate codon_ref_seq gene_name gene_position gene_product gene_strand genes_inactivated genes_overlapping genes_promoter locus_tag locus_tags_inactivated locus_tags_overlapping locus_tags_promoter mutation_category mutator_status new_read_count new_read_count_basis population position position_end position_start ref_read_count ref_read_count_basis ref_seq repeat_length repeat_new_copies repeat_ref_copies repeat_seq seq_id size snp_type time title transl_table treatment type
PM_10340/PM_10350 intergenic (+150/+5) ester cyclase/hypothetical protein >/< PM_10340/PM_10350 small_indel 12 1 964623 964663 964623 5 2 41-bp PM_chr1 41 -1 output DEL
PM39400_25310 coding (99-107/837 nt) DNA-directed RNA polymerase beta subunit > PM39400_25310 PM39400_25310 PM_25310 small_indel 79 1 2361336 2361344 2361336 11 2 AGTAGCCCC 9 2 3 AGTAGCCCC PM_chr1 9 -1 output DEL
aa_new_seq aa_position aa_ref_seq clone codon_new_seq codon_number codon_position codon_position_is_indeterminate codon_ref_seq gene_name gene_position gene_product gene_strand genes_inactivated genes_overlapping genes_promoter locus_tag locus_tags_inactivated locus_tags_overlapping locus_tags_promoter mutation_category mutator_status new_read_count new_read_count_basis new_seq population position position_end position_start ref_read_count ref_read_count_basis ref_seq seq_id size snp_type time title transl_table treatment type
T 263 T ACG 263 3 ACA PM39400_71000 789 IS200/IS605 family transposase ISBce3 < PM39400_71000 PM39400_71000 PM39400_71000 snp_synonymous 9 1 C 382 382 382 0 1 T PM39400_pla12 synonymous -1 PM39400 1 SNP
PM39400_10340/PM39400_10350 intergenic (+150/+5) ester cyclase/hypothetical protein >/< PM39400_10340/PM39400_10350 small_indel 10 1 964623 964663 964623 1 2 41-bp PM39400_chr1 41 -1 PE39400-G4 DEL
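In case it helps anyone hitting the same parsing problem, here is a minimal workaround sketch in Python. It assumes the output is tab-delimited with one header row, and it reads each table by column name rather than by position, then pads any column that a given table lacks with an empty string. The function names (`read_tsv`, `harmonize`) and the two tiny sample tables are hypothetical, trimmed down from the headers above for illustration; this is not part of breseq or gdtools.

```python
import csv
import io


def read_tsv(text):
    """Parse TSV text; return (header names, rows as dicts keyed by header)."""
    # restval="" fills short rows with empty strings instead of None.
    reader = csv.DictReader(io.StringIO(text), delimiter="\t", restval="")
    rows = list(reader)
    return list(reader.fieldnames or []), rows


def harmonize(tables):
    """Combine tables with differing headers into one consistent column set.

    `tables` is a list of (header, rows) pairs as returned by read_tsv.
    Columns absent from a given table are filled with "".
    """
    columns = sorted({name for header, _ in tables for name in header})
    merged = [
        {col: row.get(col, "") for col in columns}
        for _, rows in tables
        for row in rows
    ]
    return columns, merged


# Hypothetical, heavily trimmed stand-ins for two COMPARE outputs:
# one has 'repeat_length' but no 'new_seq', the other the reverse.
table_a = "position\trepeat_length\ttype\n964623\t5\tDEL\n"
table_b = "new_seq\tposition\ttype\nC\t382\tSNP\n"

columns, merged = harmonize([read_tsv(table_a), read_tsv(table_b)])
```

After this, every row in `merged` has the same keys regardless of which headers the original file contained, so downstream code can look up `row["new_seq"]` or `row["repeat_length"]` without checking whether the column exists.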
Thanks for your help!