-
Notifications
You must be signed in to change notification settings - Fork 2
Channels #7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Channels #7
Conversation
Please move non-standard imports inside the functions and use LOGGER to check whether the package is installed. If not, provide information on which particular package needs to be installed to use this function. If we have those imports in the current form, I immediately have an error when importing ProDy (see below). In [1]: from prody import *ModuleNotFoundError Traceback (most recent call last) File ~/test_ProDy/ET_ProDy/ProDy/prody/init.py:88 File ~/test_ProDy/ET_ProDy/ProDy/prody/proteins/init.py:253 File ~/test_ProDy/ET_ProDy/ProDy/prody/proteins/interactions.py:37 ModuleNotFoundError: No module named 'open3d' |
I would suggest changing the name of the main function detect_channels to calcChannels. ProDy is using calcXXX (to compute something) or getXXX (if we want raw numbers, etc.). Here: in Prody we are using "atoms" as the parameter with protein structure. See here: |
Usually, in ProDy, some results are returned. See below...when I am calling channels, I should have some results on which I could do some further analysis.. for example, to check the volume of the channels or which residues are forming the channel, etc. We should be able to have access to each channel among those 9 which were identified. Dictionary maybe? In [10]: p = parsePDB('1tqn.pdb') In [11]: p_prot = p.select('protein') In [12]: p_prot In [13]: channels = detect_channels(p_prot) In [14]: channels |
visualizer='channels' - that looks nice. I don't understand how to obtain stl file. Is it created automatically when not set to None? When I did this: I received only channels without protein structure on the visualization and stl file was not created. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Except for the comments above, the channel prediction looks good. I am receiving similar results. When the comments will be applied I will test it again on different structures and with different parameters.
The stl file is created from the run_vmd_script command or the tcl script itself. 'stl_path' is an input parameter with a 3D model of the NewCartoon representation of the protein. When the parameter is set to None, instead of the protein, a triangle mesh approximating the protein surface will be used. When the program detects an stl file with a protein model, it automatically displays the protein instead of the approximated mesh. |
New changes: protein_path = "C:\Users\User\Desktop\1tqn.pdb" protein = prody.parsePDB(protein_path) I didnt yet add volume of channels in getChannelsParameters because it will take a bit longer to implement. |
Thank you for the improvements. Now I am testing the functions again: In [9]: channels = calcChannels(atoms) In [10]: channels It's nice that we have some results now. Can we somehow distinguish different channels? We have 10 which were detected in the structure I tested. Also, "No output path given." I would change it to LOGGER info. Something like: To obtain a PDB file with the results please use...this and this parameter, etc. (or some other information to explain more what to do to actually obtain some visualization of the channels and not only the number. |
To distinguish between channels we can use getChannelsParameters and check the lengths. Parameters are outputted to a list which order corresponds to the order of channels. For example, if we want to get only the longest channel, we would check the lengths of channels and get an index of the longest channels (for example 3), and then get the 3rd channel by doing channels[3] and input it to the visualizer by prody.calcChannels(protein, vis_channels = channels[3], surface = surface). I dont really have an idea to distinguish the channels in a different way. |
Hi James @jamesmkrieger |
@Erykiusz could you provide an example line by line? (even just commands) Also, can we have access to the atoms (=channels) that are saved as PDB files? It would be very useful to have coordinates of those atoms and then we can use exwithin in select to provide the residues which are forming the channel. And it would be good to have those atoms separated by channels. |
To get only selected channels we can do this: Calculations: Now we got a list of lengths of the channels. Lets say we want to see only 3rd and 4th channel. So we do: Now we visualize: I will add the option to see the atoms of channels next time I will be working on it. |
Now it is working fine. Thank you. |
Great. I’ll have a look. I don’t think it needs to go in a class. We have many functionalities in prody that are just functions too. |
I wouldn’t use LOGGER to report modules not installed. We need to raise an ImportError inside the function saying that the user needs that module to run the function |
prody/proteins/interactions.py
Outdated
from scipy.spatial import Voronoi, Delaunay | ||
from pathlib import Path | ||
|
||
if not check_and_import('heapq'): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Check packages before trying to import them to not get errors
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can probably also do this with a loop over a list of names
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could also be a general function in prody.utilities and I was thinking of adding import_command inside too
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I now have the following code that seems to work:
def check_and_import(package_name, import_command=None):
"""Check for package and import it if possible and return **True**.
Otherwise, return **False
:arg package_name: name of package
:type package_name: str
:arg import_command: optional command to import submodules or with an alias
default **None** means use "import {0}".format(package_name)
:type import_command: None, str
"""
if not isinstance(package_name, str):
raise TypeError('package_name should be a string')
if not isinstance(import_command, (type(None), str)):
raise TypeError('import_command should be None or a string')
import importlib.util
if importlib.util.find_spec(package_name) is None:
LOGGER.warn(f"Package '{package_name}' is not installed. Please install it to use this function.")
return False
if import_command is None:
exec('import {0}'.format(package_name))
else:
exec(import_command)
return True
required = [['heapq', None], ['open3d', 'import open3d as o3d'], ['collections', 'from collections import deque'],
['scipy', 'from scipy.interpolate import CubicSpline; from scipy.spatial import Voronoi, Delaunay'],
['pathlib', 'from pathlib import Path']]
missing = []
errorMsg = None
for (name, command) in required:
if not check_and_import(name, command):
missing.append(name)
if errorMsg is None:
errorMsg = 'To run calcChannels, please install {0}'.format(missing[0])
else:
errorMsg += ', ' + name
if len(missing) > 0:
if len(missing) > 1:
errorMsg = ', '.join(errorMsg.split(', ')[:-1]) + ' and ' + errorMsg.split(', ')[-1]
raise ImportError(errorMsg)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In fact, we then don't need the logger step at all. The current output is as follows:
channels, surface = prody.calcChannels(protein)
@> WARNING Package 'open3d' is not installed. Please install it to use this function.
---------------------------------------------------------------------------
ImportError Traceback (most recent call last)
Cell In[54], line 1
----> 1 channels, surface = prody.calcChannels(protein)
File ~/software/scipion3/software/em/prody-github/ProDy/prody/proteins/interactions.py:4511, in calcChannels(atoms, r1, r2, min_depth, bottleneck, sparsity, output_path, visualizer, stl_path, vis_channels, surface)
4509 if len(missing) > 1:
4510 errorMsg = ', '.join(errorMsg.split(', ')[:-1]) + ' and ' + errorMsg.split(', ')[-1]
-> 4511 raise ImportError(errorMsg)
4513 class State:
4514 def __init__(self, simplices, neighbors, vertices):
ImportError: To run calcChannels, please install open3d
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, it looks like importing inside the check function doesn't work. Sorry.
Now my code is the following:
def check_and_import(package_name):
"""Check for package and import it if possible and return **True**.
Otherwise, return **False
:arg package_name: name of package
:type package_name: str
:arg import_command: optional command to import submodules or with an alias
default **None** means use "import {0}".format(package_name)
:type import_command: None, str
"""
if not isinstance(package_name, str):
raise TypeError('package_name should be a string')
import importlib.util
if importlib.util.find_spec(package_name) is None:
LOGGER.warn(f"Package '{package_name}' is not installed. Please install it to use this function.")
return False
return True
required = ['heapq', 'open3d', 'collections', 'scipy', 'pathlib']
missing = []
errorMsg = None
for name in required:
if not check_and_import(name):
missing.append(name)
if errorMsg is None:
errorMsg = 'To run calcChannels, please install {0}'.format(missing[0])
else:
errorMsg += ', ' + name
if len(missing) > 0:
if len(missing) > 1:
errorMsg = ', '.join(errorMsg.split(', ')[:-1]) + ' and ' + errorMsg.split(', ')[-1]
raise ImportError(errorMsg)
import heapq
import open3d as o3d
from collections import deque
from scipy.interpolate import CubicSpline
from scipy.spatial import Voronoi, Delaunay
from pathlib import Path
Ok. I can see the benefit of logging all the packages needed and raising one error at the end rather than going through one by one |
Probably this should go in its own file not the interactions one. If anything, it’s more related to watfinder and it would be nice to have a way of comparing results and checking for waters along these channels. It would be nice to include ions too |
All atoms that are not 'HETATM' are analyzed. Depends on which one you provide as a parsed one. If ions have 'ATOMS' instead of 'HETATM' they should be analyzed too. I can check that later and confirm. |
James, you are right. It will fit better to waterbridges.py. |
Ions are usually HETATM. We also don’t want them treated like protein atoms for defining the channels. We’d want to look at them afterwards to see if they occupy the channels. |
When channels are detected they are all automatically merged to create model for the viewer and the output pdb file. If you would lower the sparsity you would get more channels that overlap, and if you would set it to 0, all the surface points of the molecule will be used as the end points of channels, resulting in many channels that highly overlap. |
ok, thanks |
This kind of information is very useful. At some point, we will have to include them somewhere. When the tool will be ready, I will prepare a tutorial similar to those: It would be very useful to be able to merge some selected channels (for example channel 1 and channel 9) to further compute values for them, residues that are forming the channel, etc. as for one merged channel. |
New changes: I didn't yet tested calcChannelsMultipleFrames with proper files so there might be errors. @karolamik13 accessing the atoms of channels should be done to get artificial atoms making the path of the channels (hydrogen atoms from resuting pdb file) or atoms that make the surface of channels? |
@Erykiusz "atoms that make the surface of channels" are more useful, but could we have both? Under Lunux, vmd is usually here: I tested different functions, and they are working well except for calcChannelsMultipleFrames() where I had this error: File ~/anaconda3/envs/prody/lib/python3.9/site-packages/ProDy-2.5.0-py3.9-linux-x86_64.egg/prody/proteins/interactions.py:4872, in calcChannelsMultipleFrames(atoms, trajectory, output_path, **kwargs) AttributeError: 'ChannelCalculator' object has no attribute 'calcChannels'_ I am sending multi-model PDB and trajectory via e-mail, which you can use for tests. |
Now it should be working fine |
Yes, now multi-model PDB and trajectory analysis are working fine. Great job! Several questions:
(2) Multi-model PDB / trajectories:
After addressing those things, I will merge the pull request. |
(1) Yes, this is now in progress. Both residues forming the channels and merging them are in my queue. |
Combining the channels: We may have two channels near each other, and we might want to join them, visualize them, and compute the volume and residues that are participating in the formation of that joint channel (=two channels or more, for example, channel[1] and channel[3]). Unless you think there is no point in doing that because each channel shows the whole pathway or for other reasons. For both single PDB and trajectory, I was thinking the same thing. |
Ok, I will work on that |
I guess there should be some criteria that are different for combining channels in a trajectory e.g. if they are close in one frame then they could be combined in all frames. Maybe we end up with a channel that has two disconnected parts in some frames and that would be interesting to see. |
It could be nice to have a way to turn channels on and off in the viewer, so we could open a few of them and then focus on particular ones. It would also be great if we can open a few frames and turn them on and off to have everything in the same viewer to cross compare rather than having to open and close the viewers in between or open multiple instances. Is it possible to continue typing in the terminal while a viewer is open to perform further analysis based on what we see and open other viewers? Also, what quantitative measures do we have like width along the channel with some connection to lining residues and can we compare these across frames? Could we maybe make a plot of width near a particular residue over time for example? |
I've added getChannelAtoms which returns an AtomGroup object of dummy atoms forming the channels. The user can also choose a 'resolution' which is basically a number of samples multipled by the length of the channel. The default value is 5 which is the same as for outputing pdb file. When calling getChannelAtoms we get an object that merges all the channels into one. Also, I don't really see the need for merging channels because the values from getChannelAtoms and getChannelParameters are from the list of channels anyway. If we would want to get values for specific channels we could just simply manipulate elements of the channel list. For example: And then we could get the parameters and atoms making all the channels with: We could also visualize only the channels of interest: Another thing is that channels are saved as CubicSplines and merging them would result in very strange results. If the merging of channels would be necessery, I would think of a solution, but I think the method I described above should be enough, but correct me if I'm wrong. I also changed getChannelsParameters to getChannelParameters. When it comes to viewer, I didn't experiment with changing it while it is running and I don't know if it is possible. Opening multiple viewers is not possible right now. We can compare the quantitive measures from getChannelParameters from different frames and for now it is an only option. |
Ok, makes sense. Thanks |
I tested everything, and it works great! Two suggestions: (2) Can you create a function like getChannelAtoms() that will return protein + channel coordinates (or add a parameter to include it in getChannelAtoms())? |
I've added a parameter 'separate' to calcChannels and calcChannelsMultipleFrames that enables writing separate output files for each channel. Examples: Also getChannelAtoms now has a parameter 'protein' that when set to an AtomGroup object (parsed protein) will merge itsef with the channels. Example: |
Works great. We can also add a function to obtain residues that are forming the channel, but it can be done later. for channel[1]: Once the checks are done successfully (at least for Python 3.8 and higher), I will accept the pull request. |
Ideally we should remove the fstrings from interactions.py so that the python 2 import works too |
Yes, you are right. |
Are the f-strings in my code causing problems? Because there was only one f-string in checkAndImport, which I removed. |
If you look here: You will see that also that line is problematic: and probably would work if you make it like the one below so that it would be compatible with Python 2.7. But we might have more of those in a code. It works for Python 3.8 and higher, so I am happy with the code. @jamesmkrieger maybe it would be a good time to finally drop Python 2.7? |
We would have to change how we build the website, but yes, it would be great if we could |
No description provided.