Search parameters are inconsistent and outdated #1
I'll make all the parameters as consistent as possible. There are some odd choices, specifically for precursor mass tolerance. I'll update it to 10 ppm, which is still plenty wide for our modern instruments and methods. I'll create and push to a new branch for testing. Once that's done, I'd appreciate it if someone who uses Blender could compare results between the old and new parameter sets.
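For a sense of scale, here is a minimal sketch of what a ppm tolerance means in absolute terms. The function names and the example masses are illustrative assumptions, not part of MSblender or any search engine.

```python
def ppm_to_da(mass_da: float, tol_ppm: float) -> float:
    """Absolute tolerance in Da for a relative tolerance in ppm at a given mass."""
    return mass_da * tol_ppm / 1e6


def within_ppm(observed_da: float, theoretical_da: float, tol_ppm: float) -> bool:
    """True if the observed mass lies within +/- tol_ppm of the theoretical mass."""
    return abs(observed_da - theoretical_da) <= ppm_to_da(theoretical_da, tol_ppm)


if __name__ == "__main__":
    # Example peptide masses, chosen only for illustration.
    for mass in (800.0, 1500.0, 3000.0):
        print(f"{mass:7.1f} Da: 10 ppm = +/- {ppm_to_da(mass, 10):.4f} Da")
```

At 1500 Da, 10 ppm works out to a window of about 0.015 Da on either side, which is why a tolerance expressed in whole mass units admits far more candidate peptides than one expressed in ppm.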
Hi,

(1) My assumption is that each search engine sets its default parameters based on its own performance (so the defaults may give the best results in general), and some parameters are not available in all search engines. That is the main reason I left the defaults in most cases. For mass tolerance, every dataset has a different level of tolerance, so I always set it manually (as you can see in the directory, MSblender also considers low-res LTQ data, which requires a broader mass tolerance).

(2) An inconsistent search space may be an issue (e.g. PTMs allowed in one engine but not in another), especially if searching a larger space with one engine gives better results than the others, which is unfair. But in most cases there is no definite winner depending on search space.

(3) The original version of MSBlender was tested with quite an old instrument, an Orbitrap Classic, so, as you mentioned, the parameters could be improved by testing with data from new high-resolution instruments. If you have test data (e.g. human cell lysates with a UPS2 spike-in), I am happy to run the test.

Best,
Taejoon
Thanks, Taejoon! Yes, I assumed you tested everything back in the day to optimize the search performance. :) Some of these parameter differences may have originated inadvertently, after you left, during a large reorganization of MSblender. I'm going off of the code in the MSblender_restructure branch, since I think that's what people here are using. With the params for each algorithm mostly hard-coded and all in different locations, it's difficult to keep things consistent.

I'm putting my updates to the parameters outlined above in a new branch and will push to production only if search performance improves. Thanks for offering to help test with data from our newer instruments.

One more thing. Could you elaborate on your second point? Knowing MSblender better than I, do you think it can be advantageous to allow one engine a larger search space if it helps that engine get more PSMs? I worry those single-engine PSMs will not propagate with high confidence through MSblender and might even negatively impact MSblender's FDR distribution modeling.

Best,
Commits 2396218, 8d26b12, and 2546270 bring more consistency to the search parameters. I also extracted MSGF params from the command line call in runMS2.sh and put them into a text file (in the ./params dir) with comments explaining the parameter options. The runMS2.sh script now loads MSGF params from that file. ./params/MSGFplus_mods.txt is also new and enables user-defined PTMs for MS-GF+. Here is the same table as above, updated with the new parameters.
*Comet allows either 0 or -1, 0, 1, 2, 3 for precursor isotope error, nothing in between.

These changes need testing and adjusting before they're adopted or reverted. I also wonder whether a precursor max charge of 4 or 5 would generally perform better.
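As a rough illustration of keeping MS-GF+ options in a commented text file rather than hard-coding them in the command-line call, the sketch below parses such a file into an argument list. The file layout (one `flag value` pair per line, `#` for comments), the function name, and the example values are assumptions for illustration; the actual format of the file in ./params may differ. The flag names are MS-GF+ command-line options as I understand them and should be checked against the version in use.

```python
import shlex


def load_params(text: str) -> list[str]:
    """Turn 'flag value  # comment' lines into a flat argument list."""
    args: list[str] = []
    for line in text.splitlines():
        # comments=True makes shlex drop everything from '#' to end of line.
        args.extend(shlex.split(line, comments=True))
    return args


# Hypothetical file contents; values are examples, not recommendations.
EXAMPLE = """\
# precursor mass tolerance
-t 10ppm
-ti 0,1        # isotope error range
-inst 3        # instrument type (Q-Exactive)
-maxCharge 4   # hypothetical value
"""

if __name__ == "__main__":
    print(load_params(EXAMPLE))
```

Keeping one option per line leaves room for a comment next to each value, so a change to a tolerance or charge limit shows up as a one-line diff.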
Andrew -
I'd be happy to test any MSBlender changes you make using the Miller lab
BRD datasets. Just let me know when you're ready for testing and where I
should go for the source code.
- Anna
On Fri, Aug 24, 2018 at 4:50 PM aphorton ***@***.***> wrote:
Blender relies on PSMs shared between multiple search engines to construct
and apply its FDR scoring model, and it currently uses Comet, XTandem with
k-scoring, and MS-GF+.
For best performance, the set of possible PSMs (peptides and
modifications) for a spectrum should be the same across all search engines.
Inconsistent search spaces can lead to both poorer-scoring true positive
PSMs and better-scoring false positives.
We can't correct this completely, due to differences in the individual
algorithms, but we should aim to make the search spaces as similar as
possible.
Current parameter inconsistencies
| Parameter | Comet | XTandem | MS-GF+ |
| --- | --- | --- | --- |
| *precursor mass tolerance* | 3 amu | 30 ppm | 20 ppm |
| *allowed C13 isotope errors* | no | 0 to 1 | 0 to 1 |
| *fragment mass tolerance* | low res | N/A | Q-Exactive HCD (high res) |
| *precursor max charge* | 6 | N/A | 3 |
| *static modifications* | C+57.021464 | none | C+57 |
| *variable modifications* | M+15.9949 | M+15.9949 | none |
Additionally, XTandem searches some more context-specific modifications by
default: known potential single amino acid polymorphisms, protein N-term
acetylation, and protein N-term glutamine (Q) mods of -17Da and -18Da.
These should all be disabled.
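One way to keep an eye on this kind of drift is to hold each engine's settings in a single structure and report every parameter whose values differ. The sketch below is an assumption about how that could look, not code from MSblender: the dictionary simply transcribes the values listed above, and the comparison is a naive string match, so equivalent settings written differently (for example C+57 versus C+57.021464) are still flagged.

```python
# Values transcribed from the parameter list above; names are illustrative.
PARAMS = {
    "Comet": {
        "precursor mass tolerance": "3 amu",
        "allowed C13 isotope errors": "no",
        "fragment mass tolerance": "low res",
        "precursor max charge": "6",
        "static modifications": "C+57.021464",
        "variable modifications": "M+15.9949",
    },
    "XTandem": {
        "precursor mass tolerance": "30 ppm",
        "allowed C13 isotope errors": "0 to 1",
        "fragment mass tolerance": "N/A",
        "precursor max charge": "N/A",
        "static modifications": "none",
        "variable modifications": "M+15.9949",
    },
    "MS-GF+": {
        "precursor mass tolerance": "20 ppm",
        "allowed C13 isotope errors": "0 to 1",
        "fragment mass tolerance": "Q-Exactive HCD (high res)",
        "precursor max charge": "3",
        "static modifications": "C+57",
        "variable modifications": "none",
    },
}


def inconsistent(params: dict) -> dict:
    """Return {parameter: {engine: value}} for parameters whose values differ."""
    names = sorted({name for settings in params.values() for name in settings})
    report = {}
    for name in names:
        # A parameter missing for an engine is treated as "N/A".
        values = {engine: settings.get(name, "N/A") for engine, settings in params.items()}
        if len(set(values.values())) > 1:
            report[name] = values
    return report


if __name__ == "__main__":
    for name, values in inconsistent(PARAMS).items():
        print(name, values)
```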