
GDTools COMPARE TSV output headers can vary #395

@jvera888

Hi!

Breseq is a great tool! Thank you for all your work.

I'm not sure if this reported behavior is a bug or a feature, so this is a question with a potential bug report.

Question/issue: When I use GDTools COMPARE on multiple samples to produce TSV output, I usually get the same 40 headers, but occasionally get a different number. Some headers go missing (e.g. 'new_seq'), while new ones appear (e.g. 'repeat_length'). This behavior is somewhat unpredictable: it does not always correspond to the types of mutations present, and many other headers always appear even when their column is empty. So perhaps it is unintended? It definitely makes parsing the table problematic. Here are a couple of different header row examples, each followed by sample rows:

aa_new_seq	aa_position	aa_ref_seq	clone	codon_new_seq	codon_number	codon_position	codon_position_is_indeterminate	codon_ref_seq	gene_name	gene_position	gene_product	gene_strand	genes_inactivated	genes_overlapping	genes_promoter	locus_tag	locus_tags_inactivated	locus_tags_overlapping	locus_tags_promoter	mutation_category	mutator_status	new_read_count	new_read_count_basis	population	position	position_end	position_start	ref_read_count	ref_read_count_basis	ref_seq	repeat_length	repeat_new_copies	repeat_ref_copies	repeat_seq	seq_id	size	snp_type	time	title	transl_table	treatment	type
									PM_10340/PM_10350	intergenic (+150/+5)	ester cyclase/hypothetical protein	>/<				PM_10340/PM_10350				small_indel		12	1		964623	964663	964623	5	2	41-bp					PM_chr1	41		-1	output			DEL
									PM39400_25310	coding (99-107/837 nt)	DNA-directed RNA polymerase beta subunit	>	PM39400_25310			PM39400_25310	PM_25310			small_indel		79	1		2361336	2361344	2361336	11	2	AGTAGCCCC	9	2	3	AGTAGCCCC	PM_chr1	9		-1	output			DEL
aa_new_seq	aa_position	aa_ref_seq	clone	codon_new_seq	codon_number	codon_position	codon_position_is_indeterminate	codon_ref_seq	gene_name	gene_position	gene_product	gene_strand	genes_inactivated	genes_overlapping	genes_promoter	locus_tag	locus_tags_inactivated	locus_tags_overlapping	locus_tags_promoter	mutation_category	mutator_status	new_read_count	new_read_count_basis	new_seq	population	position	position_end	position_start	ref_read_count	ref_read_count_basis	ref_seq	seq_id	size	snp_type	time	title	transl_table	treatment	type
T	263	T		ACG	263	3		ACA	PM39400_71000	789	IS200/IS605 family transposase ISBce3	<		PM39400_71000		PM39400_71000		PM39400_71000		snp_synonymous		9	1	C		382	382	382	0	1	T	PM39400_pla12		synonymous	-1	PM39400	1		SNP
									PM39400_10340/PM39400_10350	intergenic (+150/+5)	ester cyclase/hypothetical protein	>/<				PM39400_10340/PM39400_10350				small_indel		10	1			964623	964663	964623	1	2	41-bp	PM39400_chr1	41		-1	PE39400-G4			DEL
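
Until the header behavior is clarified, one possible workaround is to parse each table by column name rather than by position, and to fill any column a given file lacks with an empty string. The sketch below is only an illustration in Python: `read_compare_tsv` and `merge_tables` are hypothetical helper names, not part of GDTools or breseq, and the two inline tables are cut down to three columns each using values from the rows above.

```python
import csv
import io


def read_compare_tsv(handle):
    """Read one TSV table; return (rows as dicts, header names in file order)."""
    # QUOTE_NONE keeps quote characters in fields such as gene_product literal.
    # restval="" pads rows that are shorter than the header.
    reader = csv.DictReader(
        handle, delimiter="\t", quoting=csv.QUOTE_NONE, restval=""
    )
    rows = list(reader)
    return rows, list(reader.fieldnames or [])


def merge_tables(tables, fill=""):
    """Put rows from several tables onto the union of their headers."""
    columns = []
    for _, header in tables:
        for name in header:
            if name not in columns:
                columns.append(name)  # keep first-seen order
    merged = [
        {name: row.get(name, fill) for name in columns}
        for rows, _ in tables
        for row in rows
    ]
    return columns, merged


# Hypothetical, heavily truncated tables: one has new_seq, the other repeat_length.
table_a = "position\tnew_seq\ttype\n382\tC\tSNP\n"
table_b = "position\trepeat_length\ttype\n2361336\t9\tDEL\n"

tables = [read_compare_tsv(io.StringIO(text)) for text in (table_a, table_b)]
columns, merged = merge_tables(tables)

print(columns)
for row in merged:
    print(row)
```

With this approach a missing 'new_seq' or an extra 'repeat_length' column no longer shifts the other fields, because every value is looked up by its header name. It does not explain why the header set varies, so it is a parsing workaround rather than a fix.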

Thanks for your help!
