-
Notifications
You must be signed in to change notification settings - Fork 706
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Try uppercase atom names when guessing the mass #2331
Conversation
Codecov Report
@@ Coverage Diff @@
## develop #2331 +/- ##
===========================================
+ Coverage 89.89% 89.89% +<.01%
===========================================
Files 173 173
Lines 21778 21784 +6
Branches 2861 2861
===========================================
+ Hits 19577 19583 +6
Misses 1609 1609
Partials 592 592
Continue to review full report at Codecov.
|
I think there's a thing with carbon-alpha which will make this cause lots of tears down the line @orbeckst ? |
Yes, I followed the discussion in Issue #1808. However I agree we probably need to be more careful and I'm happy to re-work this PR in order to solve the problem at a deeper level. Maybe the element names in However, I'm by no means very informed on all the corner cases and therefore I'd be happy to discuss it further and possibly fix it the way you prefer. |
This would rely on the file format being a PDB, which is just one of many formats MDAnalysis supports. Recently #2314 was merged, which I think would get rid of your UserWarning but not in the way you desire: the mass guessed would be carbon's. Probably the best way to get around it would be to refine guessing the element further. The easiest way I can think of would be passing in the residue name and checking it against a list of non-biological-looking names, e.g. the element name itself, or ION, etc... I personally feel that checking against a list of biological-looking names could run into too many issues with modified amino acids. Other ambiguous elements: Hg, He, Ne, Mn (virtual sites) ... |
Thanks for pointing out your PR, @lilyminium. I think you PR was already in my bugfix/metals, but I'll double check on my laptop. I believe it would be nice to have a standard and clean |
My apologies, I misunderstood the issue. @richardjgowers by the point an atom's element is Ca (either through user/file specification or element guessing), it makes sense that the atom's mass should correspond. If it's causing carbon-alpha grief, it would be better to fix the element guessing rather than hack the mass lookup. I agree with you @RMeli, the capitalisation of elements is a bit all over the place and results in unnecessary errors like #1808 and #2265 . I see I've contributed to this confusion by uppercasing |
Just jumping in the discussion here. I would agree with #2331 (comment) that a solution that involves residue names might be useful here. Most of the time resname == atomname for metals (if you strip out numbers/signs and match any case) in the PDB. A check for this would allow for Ca, Mn, etc... to be picked up as metals when they are intended to do so. |
Yeah all the guessing functions can be written to use more than just one other array of information. So really PDB / whatever parsing could be written to parse all the info from a file then do guessing based on combinations of information. Also here masses are guessed from an atomtype, when really the problem is guessing element from an atomtype. (Element to mass is trivial) So maybe a better solution is for PDBParser to have a better element guesser (one that actually ensures that elements are returned too...), then mass->element should just be a trivial dict lookup? |
I agree, this is probably the best option. I think the required steps are the following:
|
I agree with the general discussion that parsers should be assigning elements to atoms because this the place to encode domain-specific knowledge. There's the caveat that coarse grained systems cannot be assigned an element. People will likely also want to have a way to disable any guessing, e.g., so that they can fix-up the topology by themselves. Not having elements is not a big deal, but having no masses will lead to AtomGroup and friends not having a whole bunch of methods (see also #1845) . So we should have a decent way to warn users and give suggestions what to do. Something along the line of: "you can manually assign uniform mass = 1 to all particles with ... or better: use a topology format that contains masses explicitly." |
Thanks for the comments @orbeckst. I agree that the guessing should be controllable by the user. I could give it a try in the next few weeks. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok this is a bandaid (as discussed above) but I think it helps a little
thanks @RMeli ! |
Fixes #2265
This is an attempt to get rid of the
UserWarning: Failed to guess the mass for the following atom types: Ca
warning when PDB files contain metal atoms.It is probably not the best approach, but it's the one that minimize side effects (such as changing
tables.py
). Please let me know if you want to solve issue #2265 in a different way.Changes made in this Pull Request:
PR Checklist