
Search parameters are inconsistent and outdated #1

Open
aphorton opened this issue Aug 24, 2018 · 7 comments
@aphorton
Contributor

aphorton commented Aug 24, 2018

Blender relies on PSMs shared between multiple search engines to construct and apply its FDR scoring model, and it currently uses Comet, XTandem with k-scoring, and MS-GF+.

For best performance, the set of possible PSMs (peptides and modifications) for a spectrum should be the same across all search engines.

Inconsistent search spaces can lead to both poorer-scoring true positive PSMs and better-scoring false positives.

We can't correct this completely, due to differences in the individual algorithms, but we should aim to make the search spaces as similar as possible.

Current parameter inconsistencies

|                            | Comet       | XTandem   | MS-GF+                    |
|----------------------------|-------------|-----------|---------------------------|
| precursor mass tolerance   | 3 amu       | 30 ppm    | 20 ppm                    |
| allowed C13 isotope errors | no          | 0 to 2    | 0 to 1                    |
| fragment mass tolerance    | low res     | N/A       | Q-Exactive HCD (high res) |
| precursor max charge       | 6           | N/A       | 3                         |
| static modifications       | C+57.021464 | none      | C+57                      |
| variable modifications     | M+15.9949   | M+15.9949 | none                      |

Additionally, XTandem searches several more context-specific modifications by default: known potential single amino acid polymorphisms, protein N-terminal acetylation, and protein N-terminal glutamine (Q) mods of -17 Da and -18 Da. These should all be disabled.
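
For reference, these defaults correspond roughly to the following entries in the X!Tandem input file. Treat this as a sketch: the labels are taken from the standard X!Tandem parameter documentation, not from the file currently in the repo.

```xml
<!-- Sketch: X!Tandem input entries for the defaults mentioned above (labels per standard X!Tandem docs) -->
<note type="input" label="protein, quick acetyl">no</note>       <!-- protein N-term acetylation -->
<note type="input" label="protein, quick pyrolidone">no</note>   <!-- N-term Q/E -17/-18 Da mods -->
<note type="input" label="refine, point mutations">no</note>     <!-- single amino acid polymorphisms (refinement pass) -->
```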

@aphorton
Contributor Author

I'll get all the parameters as consistent as possible.

There are some weird choices, specifically for precursor mass tolerance. I'll update it to 10 ppm, which is still plenty wide for our modern instruments and methods.
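
In comet.params that would look roughly like the lines below (standard Comet parameter names; the exact file in the repo may differ):

```
# Sketch of the relevant comet.params lines for a 10 ppm precursor tolerance
peptide_mass_tolerance = 10.0
peptide_mass_units = 2        # 0 = amu, 1 = mmu, 2 = ppm
```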

I'll create and push to a new branch for testing. Once that's done, I'd appreciate it if someone who uses Blender could compare results between the old and new param sets.

aphorton self-assigned this Aug 24, 2018
@taejoon

taejoon commented Aug 25, 2018

Hi,

(1) My assumption is that each search engine sets its default parameters based on its own performance (i.e. what generally gives the best results), and some parameters are not available in all search engines. That is the main reason I leave 'the default' parameters in most cases. For mass tolerance, every dataset has a different level of tolerance, so I always set it manually (as you can see in the directory, MSblender also has to handle low-res LTQ data, which requires a broader mass tolerance).

(2) An inconsistent search space may be an issue (e.g. a PTM allowed in one engine but not in the others), especially if searching a larger space with one engine gives it better results than the others (that is unfair). But in most cases there is no 'definite winner' determined by search space.

(3) The original version of MSblender was tested on 'quite an old instrument' like the Orbitrap Classic, so, as you mentioned, the parameters could be improved by testing with data from newer high-resolution instruments. If you have test data (e.g. human cell lysates with a UPS2 spike-in), I am happy to run the test.

Best,

Taejoon

@aphorton
Contributor Author

Thanks, Taejoon!

Yes, I assumed you tested everything back in the day to optimize the search performance. :)

Some of these parameter differences may have originated inadvertently, after you left, during a large reorganization of MSblender. I'm going off of the code in the MSblender_restructure branch, since I think that's what people here are using. With the params for each algorithm mostly hard-coded and all in different locations, it's difficult to keep things consistent.

I'm putting my updates to the parameters outlined above in a new branch and will push to production only if search performance improves. Thanks for offering to help test with data from our newer instruments.

One more thing. Could you elaborate on your second point? Since you know MSblender better than I do, do you think it can be advantageous to allow one engine a larger search space if it helps that engine get more PSMs? I worry those single-engine PSMs will not propagate with high confidence through MSblender and might even negatively impact MSblender's FDR distribution modeling.

Best,
Andrew

@aphorton
Contributor Author

Commits 2396218, 8d26b12, and 2546270 bring more consistency to the search parameters.

I also extracted MSGF params from the command line call in runMS2.sh and put them into a text file (in the ./params dir) with comments explaining the parameter options. The runMS2.sh script now loads MSGF params from that file.
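
Roughly, the idea is to strip comments from the params file and splice the remaining flags into the MS-GF+ call. A simplified sketch (the file and variable names here are illustrative, not necessarily what runMS2.sh actually uses):

```bash
# Sketch only: collect non-comment lines from the MS-GF+ params file
# and pass them through to the MS-GF+ command line (file name illustrative).
PARAMS_FILE=./params/MSGFplus_params.txt
MSGF_OPTS=$(grep -v '^[[:space:]]*#' "$PARAMS_FILE" | tr '\n' ' ')
java -Xmx8g -jar MSGFPlus.jar -s "$spectra_mzXML" -d "$protein_fasta" -o "$out_mzid" $MSGF_OPTS
```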

./params/MSGFplus_mods.txt is also new and enables user-defined PTMs for MS-GF+.
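
For anyone editing that file: entries follow MS-GF+'s standard modification-file syntax. As an illustration (not necessarily the shipped contents), the carbamidomethyl and oxidation settings from the table below would be written as:

```
# MS-GF+ mod file format: Mass (or composition), Residues, Fix/Opt, Position, Name
# Maximum number of variable modifications per peptide
NumMods=2
# Static carbamidomethylation of Cys
57.021464,C,fix,any,Carbamidomethyl
# Variable oxidation of Met
15.994915,M,opt,any,Oxidation
```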

Here is the same table as above, updated with the new parameters.

|                            | Comet       | XTandem     | MS-GF+      |
|----------------------------|-------------|-------------|-------------|
| precursor mass tolerance   | 10 ppm      | 10 ppm      | 10 ppm      |
| allowed C13 isotope errors | -1 to 3*    | 0 to 2      | 0 to 2      |
| fragment mass tolerance    | low res     | N/A         | low res     |
| precursor max charge       | 6           | N/A         | 6           |
| static modifications       | C+57.021464 | C+57.021464 | C+57.021464 |
| variable modifications     | M+15.9949   | M+15.9949   | M+15.9949   |

*Comet allows either 0 or -1,0,1,2,3 for precursor isotope error, nothing in between.

These changes need testing and adjustment before they're adopted or reverted. I also wonder whether a precursor max charge of 4 or 5 would generally perform better.

@abattenhouse
Contributor

abattenhouse commented Aug 25, 2018 via email

@clairemcwhite
Member

I ran a fractionation with the MSblender_restructure branch and the new Consistent_params branch. There's a pretty consistent 5-10% increase in the number of unique peptides per fraction. It is lower in a few fractions, which is something we should watch out for.

[image: unique peptides per fraction, MSblender_restructure vs. Consistent_params]

@clairemcwhite
Member

[image: unique proteins per fraction, same comparison]

Same plot counting unique proteins in each prot_count .group file.
