
Search parameters are inconsistent and outdated #1

Open
aphorton opened this issue Aug 24, 2018 · 7 comments
@aphorton
Contributor

aphorton commented Aug 24, 2018

Blender relies on PSMs shared between multiple search engines to construct and apply its FDR scoring model, and it currently uses Comet, XTandem with k-scoring, and MS-GF+.

For best performance, the set of possible PSMs (peptides and modifications) for a spectrum should be the same across all search engines.

Inconsistent search spaces can lead to both poorer-scoring true positive PSMs and better-scoring false positives.

We can't correct this completely, due to differences in the individual algorithms, but we should aim to make the search spaces as similar as possible.

Current parameter inconsistencies

|                            | Comet       | XTandem   | MS-GF+                    |
|----------------------------|-------------|-----------|---------------------------|
| precursor mass tolerance   | 3 amu       | 30 ppm    | 20 ppm                    |
| allowed C13 isotope errors | no          | 0 to 2    | 0 to 1                    |
| fragment mass tolerance    | low res     | N/A       | Q-Exactive HCD (high res) |
| precursor max charge       | 6           | N/A       | 3                         |
| static modifications       | C+57.021464 | none      | C+57                      |
| variable modifications     | M+15.9949   | M+15.9949 | none                      |

Additionally, XTandem searches several more context-specific modifications by default: known potential single amino acid polymorphisms, protein N-terminal acetylation, and protein N-terminal glutamine (Q) mods of -17 Da and -18 Da. These should all be disabled.
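
For reference, these defaults correspond roughly to the following entries in the X!Tandem input file. Treat this as a sketch: the labels are taken from the standard X!Tandem parameter documentation, not from the file currently in the repo.

```xml
<!-- Sketch: X!Tandem input entries for the defaults mentioned above (labels per standard X!Tandem docs) -->
<note type="input" label="protein, quick acetyl">no</note>       <!-- protein N-term acetylation -->
<note type="input" label="protein, quick pyrolidone">no</note>   <!-- N-term Q/E -17/-18 Da mods -->
<note type="input" label="refine, point mutations">no</note>     <!-- single amino acid polymorphisms (refinement pass) -->
```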

@aphorton
Contributor Author

I'll get all the parameters as consistent as possible.

There are some weird choices, specifically for precursor mass tolerance. I'll update it to 10 ppm, which is still plenty wide for our modern instruments and methods.
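
In comet.params that would look roughly like the lines below (standard Comet parameter names; the exact file in the repo may differ):

```
# Sketch of the relevant comet.params lines for a 10 ppm precursor tolerance
peptide_mass_tolerance = 10.0
peptide_mass_units = 2        # 0 = amu, 1 = mmu, 2 = ppm
```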

I'll create and push to a new branch for testing. Once that's done, I'd appreciate it if someone who uses Blender could compare results between the old and new param sets.

aphorton self-assigned this Aug 24, 2018
@taejoon

taejoon commented Aug 25, 2018

Hi,

(1) My assumption is that each search engine sets its default parameters based on its own performance (i.e. what generally gives the best results), and some parameters are not available in all search engines. That is the main reason I leave 'the default' parameters in most cases. For mass tolerance, every dataset has a different level of tolerance, so I always set it manually (as you can see in the directory, MSblender also has to handle low-res LTQ data, which requires a broader mass tolerance).

(2) An inconsistent search space may be an issue (e.g. a PTM allowed in one engine but not in the others), especially if searching a larger space with one engine gives it better results than the others (that is unfair). But in most cases there is no 'definite winner' determined by search space.

(3) The original version of MSblender was tested on 'quite an old instrument' like the Orbitrap Classic, so, as you mentioned, the parameters could be improved by testing with data from newer high-resolution instruments. If you have test data (e.g. human cell lysates with a UPS2 spike-in), I am happy to run the test.

Best,

Taejoon

@aphorton
Contributor Author

Thanks, Taejoon!

Yes, I assumed you tested everything back in the day to optimize the search performance. :)

Some of these parameter differences may have originated inadvertently, after you left, during a large reorganization of MSblender. I'm going off of the code in the MSblender_restructure branch, since I think that's what people here are using. With the params for each algorithm mostly hard-coded and all in different locations, it's difficult to keep things consistent.

I'm putting my updates to the parameters outlined above in a new branch and will push to production only if search performance improves. Thanks for offering to help test with data from our newer instruments.

One more thing. Could you elaborate on your second point? Since you know MSblender better than I do, do you think it can be advantageous to allow one engine a larger search space if it helps that engine get more PSMs? I worry those single-engine PSMs will not propagate with high confidence through MSblender and might even negatively impact MSblender's FDR distribution modeling.

Best,
Andrew

@aphorton
Contributor Author

Commits 2396218, 8d26b12, and 2546270 bring more consistency to the search parameters.

I also extracted MSGF params from the command line call in runMS2.sh and put them into a text file (in the ./params dir) with comments explaining the parameter options. The runMS2.sh script now loads MSGF params from that file.
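
Roughly, the idea is to strip comments from the params file and splice the remaining flags into the MS-GF+ call. A simplified sketch (the file and variable names here are illustrative, not necessarily what runMS2.sh actually uses):

```bash
# Sketch only: collect non-comment lines from the MS-GF+ params file
# and pass them through to the MS-GF+ command line (file name illustrative).
PARAMS_FILE=./params/MSGFplus_params.txt
MSGF_OPTS=$(grep -v '^[[:space:]]*#' "$PARAMS_FILE" | tr '\n' ' ')
java -Xmx8g -jar MSGFPlus.jar -s "$spectra_mzXML" -d "$protein_fasta" -o "$out_mzid" $MSGF_OPTS
```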

./params/MSGFplus_mods.txt is also new and enables user-defined PTMs for MS-GF+.
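
For anyone editing that file: entries follow MS-GF+'s standard modification-file syntax. As an illustration (not necessarily the shipped contents), the carbamidomethyl and oxidation settings from the table below would be written as:

```
# MS-GF+ mod file format: Mass (or composition), Residues, Fix/Opt, Position, Name
# Maximum number of variable modifications per peptide
NumMods=2
# Static carbamidomethylation of Cys
57.021464,C,fix,any,Carbamidomethyl
# Variable oxidation of Met
15.994915,M,opt,any,Oxidation
```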

Here is the same table as above, updated with the new parameters.

|                            | Comet       | XTandem     | MS-GF+      |
|----------------------------|-------------|-------------|-------------|
| precursor mass tolerance   | 10 ppm      | 10 ppm      | 10 ppm      |
| allowed C13 isotope errors | -1 to 3*    | 0 to 2      | 0 to 2      |
| fragment mass tolerance    | low res     | N/A         | low res     |
| precursor max charge       | 6           | N/A         | 6           |
| static modifications       | C+57.021464 | C+57.021464 | C+57.021464 |
| variable modifications     | M+15.9949   | M+15.9949   | M+15.9949   |

*Comet allows either 0 or -1,0,1,2,3 for precursor isotope error, nothing in between.

These changes need testing and adjustment before they're adopted or reverted. I also wonder whether a precursor max charge of 4 or 5 would generally perform better.

@abattenhouse
Contributor

abattenhouse commented Aug 25, 2018 via email

@clairemcwhite
Member

I ran a fractionation with the MSblender_restructure branch and the new Consistent_params branch. There's a pretty consistent 5-10% increase in the number of unique peptides per fraction. It is lower in a few fractions, which is something we should watch out for.

[image: unique peptides per fraction, MSblender_restructure vs. Consistent_params]

@clairemcwhite
Member

[image: unique proteins per fraction, same comparison]

Same plot counting unique proteins in each prot_count .group file.
