about the TM domain prediction #46

Open
xizhesun opened this issue Jul 12, 2021 · 4 comments

Comments

@xizhesun

Dear Darcy,

When I ran the pipeline, I found that the complete protein sequences are used as the input to TMHMM. I think the mature protein sequences predicted by SignalP would be a better input than the complete sequences, because a TM domain predicted within the signal peptide would have no function. What do you think about this?

Thanks,
Xizhe

@xizhesun
Author

For example, this paper uses mature protein sequences as the input to TMHMM:

https://www.frontiersin.org/articles/10.3389/fpls.2014.00098/full

(3) no transmembrane domain was predicted to occur after the cleavage site using Tmhmm v2.0c;

Cheers,
Xizhe

@darcyabjones
Member

Hi Xizhe,

Usually the approach we take is to just ignore any TM domains predicted within the SP region.
A lot of the point of the ranking part of the pipeline was to discourage the use of a series of hard filters, because the error accumulates as you add more prediction methods.
If we only input mature sequences to TMHMM, we can't look for effector-like proteins that lack signal peptides, or whose signal peptides the predictors fail to detect.
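
As a rough illustration of "ignore any TM domains predicted within the SP region", here is a minimal sketch. It is not the pipeline's code: the function name, the 1-based inclusive coordinates, and the rule that a helix is discounted when it starts at or before the last signal peptide residue are all assumptions made for the example.

```python
from typing import List, Optional, Tuple

# A TM helix as (start, end), 1-based inclusive residue positions (assumed convention).
Helix = Tuple[int, int]


def tm_domains_outside_sp(helices: List[Helix], sp_end: Optional[int]) -> List[Helix]:
    """Return the TM helices that do not overlap a predicted signal peptide.

    sp_end is the position of the last signal peptide residue, or None when
    no signal peptide was predicted. With no signal peptide, every helix is
    kept, so proteins lacking an SP prediction are still assessed.
    """
    if sp_end is None:
        return list(helices)
    # Hypothetical rule: a helix that starts within the SP region is discounted.
    return [(start, end) for (start, end) in helices if start > sp_end]


# Hypothetical example values: one helix inside the SP region, one downstream.
predicted = [(5, 22), (80, 102)]
remaining = tm_domains_outside_sp(predicted, sp_end=24)
```

Because TMHMM still sees the full-length sequence here, nothing is filtered before prediction; the SP overlap is only used when interpreting the result.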

The output includes the positions of the predicted TM domains, and also the expected number of TM residues within the first 60 AAs (which the LTR model uses to decide if it should worry about any TM domains).

I'm not personally in favour of restricting it.

@xizhesun
Author

I know what you mean and I agree with your point. There may be a better option: we could combine the mature protein sequences (for proteins with an SP) and the complete protein sequences (for proteins without an SP) into a single input file for TMHMM. That should be accurate without losing any candidates!
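
A minimal sketch of that combined input, assuming the sequences and SP cut sites are already available as plain dictionaries. The function names and data layout are hypothetical and not part of the pipeline; `sp_cut` holds the number of signal peptide residues to trim.

```python
from typing import Dict, Optional, Tuple


def build_tmhmm_input(proteins: Dict[str, str],
                      sp_cut: Dict[str, Optional[int]]) -> Dict[str, str]:
    """Use the mature sequence where an SP was predicted, else the full sequence."""
    records = {}
    for name, seq in proteins.items():
        cut = sp_cut.get(name)  # None or missing means no SP was predicted
        records[name] = seq[cut:] if cut else seq
    return records


def to_fasta(records: Dict[str, str]) -> str:
    """Format the records as FASTA text."""
    return "".join(f">{name}\n{seq}\n" for name, seq in records.items())


def shift_to_full_length(helix: Tuple[int, int], cut: int) -> Tuple[int, int]:
    """Map a TM helix predicted on a mature sequence back to full-length coordinates."""
    start, end = helix
    return (start + cut, end + cut)


# Hypothetical toy sequences: "a" has a 4-residue SP, "b" has none.
proteins = {"a": "MKLLAAGGHH", "b": "MTTT"}
fasta_text = to_fasta(build_tmhmm_input(proteins, {"a": 4}))
```

One thing to keep in mind with this approach: positions reported for trimmed sequences are relative to the mature protein, so they need shifting by the cut site before being compared with other full-length annotations.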

@darcyabjones
Member

We've had a bit of an internal discussion about this one.
The consensus was that people tend to look at where the TM domain prediction is, and if a predicted SP overlaps it they discount that TM domain.

I can imagine some edge cases where your suggestion might provide some benefit.
But it also introduces a couple of technical issues (e.g. how should we find a consensus SP cut site from multiple programs? When should you take the mature sequence instead of the immature one?).

I think the best way to settle this is to benchmark it and see what happens.
Part of the point of this project was to find out what the best way to combine these tasks was, so I'll be interested to see how it goes.
It'll probably have to wait until we get around to updating the ranking function.

I'll leave this open as a reminder until then and hopefully we'll know in the next major release.

Thanks for the suggestion!
