update ivar workflow #293

Merged
merged 4 commits into from
Jan 11, 2024
Conversation

lldelisle
Contributor

I am not really the right person to do this because I do not know this workflow, but since the tests are failing I thought it would be good to update it.
I updated the expected pangolin header, since it is tied to the version. However, the test is still failing because the nextclade clade output changed. The new output is:

seqName	clade	Nextclade_pango	partiallyAliased	clade_nextstrain	clade_who	clade_display	qc.overallScore	qc.overallStatus	totalSubstitutions	totalDeletions	totalInsertions	totalFrameShifts	totalAminoacidSubstitutions	totalAminoacidDeletions	totalAminoacidInsertions	totalMissing	totalNonACGTNs	totalPcrPrimerChanges	substitutions	deletions	insertions	privateNucMutations.reversionSubstitutions	privateNucMutations.labeledSubstitutions	privateNucMutations.unlabeledSubstitutions	privateNucMutations.totalReversionSubstitutions	privateNucMutations.totalLabeledSubstitutions	privateNucMutations.totalUnlabeledSubstitutions	privateNucMutations.totalPrivateSubstitutions	frameShifts	aaSubstitutions	aaDeletions	aaInsertions	missing	nonACGTNs	pcrPrimerChanges	alignmentScore	alignmentStart	alignmentEnd	coverage	qc.missingData.missingDataThreshold	qc.missingData.score	qc.missingData.status	qc.missingData.totalMissing	qc.mixedSites.mixedSitesThreshold	qc.mixedSites.score	qc.mixedSites.status	qc.mixedSites.totalMixedSites	qc.privateMutations.cutoff	qc.privateMutations.excess	qc.privateMutations.score	qc.privateMutations.status	qc.privateMutations.total	qc.snpClusters.clusteredSNPs	qc.snpClusters.score	qc.snpClusters.status	qc.snpClusters.totalSNPs	qc.frameShifts.frameShifts	qc.frameShifts.totalFrameShifts	qc.frameShifts.frameShiftsIgnored	qc.frameShifts.totalFrameShiftsIgnored	qc.frameShifts.score	qc.frameShifts.status	qc.stopCodons.stopCodons	qc.stopCodons.totalStopCodons	qc.stopCodons.score	qc.stopCodons.status	isReverseComplement	failedGenes	warnings	errors
ERR4970105	20B	B.1.1	B.1.1	20B		20B	1.562500	good	12	0	0	0	7	0	0	121	0	3	C241T,C3037T,G10396T,C13860T,C14408T,A23403G,A23570T,G26062T,A28363G,G28881A,G28882A,G28883C				G10396T|20H,C13860T|20J	A23570T,G26062T,A28363G	0	2	3	5		N:R203K,N:G204R,ORF1b:P314L,ORF3a:G224C,ORF9b:E27G,S:D614G,S:I670L			1-54,29837-29903		ChinaCDC_N_F:G28881A;G28882A;G28883C	89661	0	29903	0.9959535832525165	3000	0	good	121	10	0	good	0	24	3	12.500000	good	11		0	good	0		0		0	0	good		0	0	good	false			

The test expected the output to contain '20B 0', but it now contains '20B 1'. Could someone from the field confirm that I can change the test?

@lldelisle
Contributor Author

@pvanheus if you are still around.

@wm75
Contributor

wm75 commented Nov 22, 2023

@lldelisle for the pangolin output updating the header is definitely a good thing.
The change from 0 to 1 in the nextclade output looks ok, too, though I'm not 100% sure which columns that's coming from (clade and qc.overallScore?).

@wm75
Contributor

wm75 commented Nov 22, 2023

@pvanheus the potentially important changes here are:

  • pangolin: the current proposed version would run in usher mode (not pangolearn anymore), which should be a very good update, but also switches to using the pangolin-data and constellations versions shipping with the tool (instead of downloading them from the web like the old version did). That latter change kind of brings nextclade and pangolin out of "sync" because nextclade still gets the latest data.

  • ivar trim has grown a new param in the latest version for length filtering of reads after trimming. The way it's used in the updated WF version, it eliminates reads that, after trimming, are shorter than 50% of the read length seen in the first 1000 reads in the file; this should normally be ok, but could change results slightly.

  • ivar consensus has a new freq threshold for indels (on top of the one for SNPs). The proposed version fixes it at 0.8 (not a bad value, but maybe it should also become a WF input param?).
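
The length filter described for ivar trim above can be illustrated with a small sketch. This is not ivar's code: the helper names are hypothetical, and it assumes that "50% length compared to the first 1000 reads" means half of the mean length of those reads.

```python
from statistics import mean


def min_length_threshold(read_lengths, sample_size=1000, fraction=0.5):
    """Hypothetical helper: threshold = fraction * mean length of the first reads.

    Assumes the reference length is the mean over the first `sample_size` reads.
    """
    sample = read_lengths[:sample_size]
    if not sample:
        raise ValueError("no reads to sample")
    return fraction * mean(sample)


def keep_after_trimming(trimmed_lengths, threshold):
    """Return the indices of reads whose trimmed length is not below the threshold."""
    return [i for i, n in enumerate(trimmed_lengths) if n >= threshold]


# Toy data: four 150 bp reads and their lengths after primer/quality trimming.
original = [150, 150, 150, 150]
trimmed = [140, 60, 75, 120]

threshold = min_length_threshold(original)   # 75.0 under the assumption above
kept = keep_after_trimming(trimmed, threshold)
print(threshold, kept)
```

In this toy case only the read trimmed down to 60 bp is dropped, which shows why results could shift slightly for samples with many heavily trimmed reads.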

@lldelisle
Contributor Author

@lldelisle for the pangolin output updating the header is definitely a good thing. The change from 0 to 1 in the nextclade output looks ok, too, though I'm not 100% sure, which columns that's coming from (clade and qc.overallScore?).

The '20B' is the clade_display column, and the 0 -> 1 is the start of the qc.overallScore value that follows it (the value is 1.5625).
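
To check which columns the '20B 1' match spans, the first columns of the nextclade row posted above can be parsed. This is only a sketch: it uses a truncated copy of the header and row, and it assumes the separator in the test string is a tab.

```python
import csv
import io

# First nine columns of the nextclade output quoted in this PR (truncated copy).
header = (
    "seqName\tclade\tNextclade_pango\tpartiallyAliased\tclade_nextstrain\t"
    "clade_who\tclade_display\tqc.overallScore\tqc.overallStatus"
)
row = "ERR4970105\t20B\tB.1.1\tB.1.1\t20B\t\t20B\t1.562500\tgood"

reader = csv.DictReader(io.StringIO(header + "\n" + row + "\n"), delimiter="\t")
record = next(reader)

# The text-based test looks for a substring in the output line.
matches_old = "20B\t0" in row   # old expectation
matches_new = "20B\t1" in row   # new expectation

print(record["clade_display"], record["qc.overallScore"])
print(matches_old, matches_new)
```

Only the clade_display / qc.overallScore pair produces "20B" followed by a digit, so the substring check is effectively asserting on the leading digit of the QC score.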

@mvdbeek mvdbeek merged commit a0775d2 into galaxyproject:main Jan 11, 2024
6 checks passed
3 participants