Library Charge Support #433

j-wags · 2019-10-08T20:56:41Z

Close Implement Library Charges #25
Fix Topology.add_molecule has undefined behavior if given an existing molecule with different atom indexing #434
Add common base class (_NonbondedHandler) for ParameterHandlers that deal with OpenMM's NonbondedForce objects
Make _NonbondedHandlers know whether a previous handler has assigned charges to a molecule (and therefore to skip it)
- I ended up having _NonbondedHandlers attach this information to the input Topology during system creation (by creating and modifying a new dict topology._ref_mol_to_charge_method). This is performed using _NonbondedHandler.mark_charges_assigned(self, ref_mol, topology) and NonbondedHandler.check_charges_assigned(ref_mol, topology). This is not an ideal solution, since the Topology shouldn't need to know anything about the parameters that are assigned. But until we have our own System class, there is no way to store this information in OpenMM's System class, and the Topology is the only other data structure we have continuous access to during system creation.
Add tests
- Especially relevant was the addition of test_charge_method_hierarchy in test_forcefield -- Basically, a molecule should get charges from the charge_from_molecules kwarg, then LibraryCharges, then AM1-BCC. The new hierarchy test is fairly exhaustive in ensuring that this is the actual behavior.
Update docstrings/documentation, if applicable
Update changelog

Status

Ready for review
Ready for merge

lgtm-com · 2019-10-10T06:06:04Z

This pull request introduces 2 alerts when merging 9ca3018 into d2e7173 - view on LGTM.com

new alerts:

1 for Unused local variable
1 for Unreachable code

codecov-io · 2019-10-10T11:39:06Z

Codecov Report

Merging #433 into master will increase coverage by 0.73%.
The diff coverage is 93.64%.

proteneer · 2019-10-11T14:04:24Z

Thanks for doing this!

Might make sense to add a test of Library Charges in conjunction with multiple overlapping BCCs. For us down the road, training to BCCs probably makes more sense than the charges directly.
Will there actually be a set of parameters released? (eg. for proteins)

j-wags · 2019-10-11T14:30:59Z

@proteneer

Might make sense to add a test of Library Charges in conjunction with multiple overlapping BCCs. For us down the road, training to BCCs probably makes more sense than the charges directly.

That's a good point. I'd planned on doing library charges and BCCs separately, but I've had to refresh my knowledge of our charging systems during this work, so it may make sense to do them simultaneously. My goal is to get the library charge functionality+tests working and cleaned up by this afternoon. If I can do that, I'll see if I could get BCCs in early next week.

Will there actually be a set of parameters released? (eg. for proteins)

No. During the in-person meeting, we discussed which order we should add new functionality, and time estimates for each. Missing toolkit functionality is blocking for several OFF subprojects. I pointed out the massive GBSA time overrun was largely due to my writing and validating the GBSA parameter files, not the GBSAHandler itself. So, in the interest of time, LibraryCharges and BCCs will come out without "real" parameters (except for dummy files used for tests).

I know Caitlin had been working on porting an AMBER protein FF's charges to SMIRNOFF format, but the LibraryChargeHandler wasn't ready before she left, so that never got finished.

proteneer · 2019-10-11T14:44:17Z

Thanks - this sounds good.

jchodera · 2019-10-11T15:00:09Z

@j-wags : Is there an issue we can link to that describes how this functionality will work, and the order in which charge assignment schemes will operate? Or has the spec been updated to describe this in detail?

j-wags · 2019-10-11T15:32:21Z

@jchodera

Is there an issue we can link to that describes how this functionality will work, and the order in which charge assignment schemes will operate? Or has the spec been updated to describe this in detail?

I can add the current behavior to the spec if it's not there already. Basically, the priority will be charge_from_molecules > LibraryCharges > ToolkitAM1BCC. BCCs will be applied to the resulting charges afterwards, if present.

The way that the object model works now creates some problems, like if a library charge parameter DOES match a molecule, but sets the charges for a molecule to be all zeroes, then ToolkitAM1BCC won't know that it matched at all, and will run AM1-BCC. This is because the ParameterHandlers are completely modular, and have to infer the history of the charging process solely by the information in the OpenMM System when their turn comes up.

So, we'll need to do a SMIRNOFF spec change in the future, where the charge-generating handlers lose some modularity. This will likely involve making the charge-generating handlers be defined in a subsection within the Electrostatics tag, however this needs to be reconciled against the fact that VirtualSites will have both charge and LJ parameters. Decisions on that front also tie into the architecture of our System class. But this is a restatement of concerns in #310 and #340.

Anyway, we'll be debating the spec change for several months in the best case, so since there are people funded to do this science now, it makes sense to put out hard-coded functionality in the short term.

jchodera · 2019-10-14T17:52:19Z

Thanks for the detailed explanation, @j-wags!

Will the library charges have to match the entire molecule, or can they match a biopolymer residue at a time?

j-wags · 2019-10-28T22:46:49Z

Whoops, missed the question above. Library charges will operate on a molecule as long as each atom in the molecule has at least one match. The final charge that gets assigned to each atom will be determined using our normal hierarchy rules. Of course, this means that the "safest" way to use LibraryCharges is to specify a whole molecule SMIRKS. But if folks want to do partial molecule SMIRKS, that will work as long as all atoms are assigned. However if those partial molecule SMIRKS overlap with each other, it can become very complex to understand the outcome.

Anyway, that means that we could in theory support biopolymer parameterization now. However, we still don't have any concept of residues, so parameter assignment might be slow (would need to SMIRKS-match a whole protein!)

One important note to emphasize is that we do not currently support using multiple charge methods for a single molecule. So, a user could not use LibraryCharges for protein residues and AM1-BCC just for capping groups and PTMs.

j-wags · 2019-10-28T22:52:00Z

openforcefield/tests/test_forcefield.py

 ''')
        pdbfile = app.PDBFile(get_data_file_path('systems/test_systems/1_ethanol.pdb'))
-        molecules = []
-        molecules.append(Molecule.from_smiles('CCO'))


In tests where atom indexing is important, I've tried to remove uses of Molecule.from_smiles, since that does not guarantee a specific ordering of the atoms.

In tests where atom indexing is important, I've tried to remove uses of Molecule.from_smiles, since that does not guarantee a specific ordering of the atoms.

We can fix this! We can support tagged SMILES, such as [C:1][C:2][O:3], and ensure that we create atoms with the specified ordering. This would allow the tests to be much more compact, and could have other uses in future.

I believe cmiles has methods to do this---we should consult with @ChayaSt .

There have been requests for this functionality from several sources. I've opened an initial Issue with the spec for this: #448

j-wags · 2019-10-28T22:57:18Z

openforcefield/typing/engines/smirnoff/parameters.py

                                         tolerance=self._SCALETOL)

+
+    def assign_charge_from_molecules(self, molecule, charge_mols):


Previously, ToolkitAM1BCCHandler was responsible for assigning charges from molecules in the charge_from_molecules kwarg. However, it makes more sense to have that functionality in the ElectrostaticsHandler, since the addition of LibraryCharges means that we won't always a ToolkitAM1BCCHandler in the ForceField.

j-wags · 2019-10-28T23:10:40Z

openforcefield/data/test_forcefields/tip3p.offxml

@@ -1,26 +1,21 @@
 <?xml version="1.0" encoding='ASCII'?>


I've resuscitated the previously non-functional tip3p.offxml by updating it to the SMIRNOFF 0.3 spec and having it use LibraryCharges.

j-wags · 2019-10-28T23:13:27Z

openforcefield/tests/test_forcefield.py

+
+    @pytest.mark.parametrize("toolkit_registry,registry_description", toolkit_registries)
+    def test_pass_invalid_kwarg_to_create_openmm_system(self, toolkit_registry, registry_description):
+        """Test to ensure an exception is raised when an unrecognized kwarg is passed """
+        from simtk.openmm import app
+
+        file_path = get_data_file_path('test_forcefields/smirnoff99Frosst.offxml')
+        forcefield = ForceField(file_path)
+        pdbfile = app.PDBFile(get_data_file_path('systems/test_systems/1_ethanol.pdb'))
+        molecules = []
+        molecules.append(Molecule.from_smiles('CCO'))
+        topology = Topology.from_openmm(pdbfile.topology, unique_molecules=molecules)
+
+        with pytest.raises(ValueError, match=".* not used by any registered force Handler: {'invalid_kwarg'}.*") as e:
+            omm_system = forcefield.create_openmm_system(topology, invalid_kwarg='aaa', toolkit_registry=toolkit_registry)
+
+
+    @pytest.mark.parametrize("inputs", nonbonded_resolution_matrix)
+    def test_nonbonded_method_resolution(self,
+                                         inputs
+                                         ):
+        """Test predefined permutations of input options to ensure nonbonded handling is correctly resolved"""
+        from simtk.openmm import app
+
+        vdw_method = inputs['vdw_method']
+        electrostatics_method = inputs['electrostatics_method']
+        has_periodic_box = inputs['has_periodic_box']
+        omm_force = inputs['omm_force']
+        exception = inputs['exception']
+        exception_match= inputs['exception_match']
+
+        molecules = [create_ethanol()]
+        forcefield = ForceField('test_forcefields/smirnoff99Frosst.offxml')
+
+        pdbfile = app.PDBFile(get_data_file_path('systems/test_systems/1_ethanol.pdb'))
+        topology = Topology.from_openmm(pdbfile.topology, unique_molecules=molecules)
+
+        if not(has_periodic_box):
+            topology.box_vectors = None
+
+        if exception is None:
+            # The method is validated and may raise an exception if it's not supported.
+            forcefield.get_parameter_handler('vdW', {}).method = vdw_method
+            forcefield.get_parameter_handler('Electrostatics', {}).method = electrostatics_method
+            omm_system = forcefield.create_openmm_system(topology)
+            nonbond_method_matched = False
+            for f_idx in range(omm_system.getNumForces()):
+                force = omm_system.getForce(f_idx)
+                if isinstance(force, openmm.NonbondedForce):
+                    if force.getNonbondedMethod() == omm_force:
+                        nonbond_method_matched = True
+            assert nonbond_method_matched
+        else:
+            with pytest.raises(exception, match=exception_match) as excinfo:
+                # The method is validated and may raise an exception if it's not supported.
+                forcefield.get_parameter_handler('vdW', {}).method = vdw_method
+                forcefield.get_parameter_handler('Electrostatics', {}).method = electrostatics_method
+                omm_system = forcefield.create_openmm_system(topology)
+
+


These are not new tests, they were just moved.

j-wags · 2019-10-29T00:23:40Z

@jchodera @SimonBoothroyd @slochower @andrrizzi This PR is ready for review if any of you have time. It's somewhat large, so I've left some comments that should add context and speed up the review. A detailed summary of changes is in the summary of this PR and the additions to releasehistory.rst

Also, I'd appreciate your thoughts on the following: the previous behavior when a molecule couldn't be assigned charges (for example, if we deleted the ToolkitAM1BCCHandler) was to just let it go into the system with all zero charges. However, now that people have more freedom to mix and match charge methods, we probably want to raise an error if a molecule doesn't get charges assigned. I've gone ahead and implemented anUnassignedChargeParameterException that gets raised in this scenario and noted the behavior change in the release notes.

slochower · 2019-10-30T21:52:09Z

The-SMIRNOFF-force-field-format.md

+| 0.3                                |                                      | `smirks`, `charges` (indexed)   | `name`, `id`, `parent_id`     |
+
+
+


Are LibraryCharges going to have their own version scheme? This would probably be obvious if I were paying closer attention to the development of this feature.

Yes. Each section has its own version in the SMIRNOFF 0.3 spec.

This seems excessive, but we recognized during the last spec revamp that, for example, the Electrostatics tag might in the future need to contain detailed PME settings, or that maybe some day we'll need to indicate that the Angles section can support partial_bondorder keywords. If we only have the top-level SMIRNOFF tag recording complete file compatibility, we'd be incrementing the number constantly. So, the top-level SMIRNOFF tag only encodes things about general structure that affects (de)serialization (eg. units in headers vs units in quantity), without worrying about the specific keywords inside. This leaves it up to the specific section versions to indicate changes in keywords or behavior.

Confusingly, all section versions begin at 0.3, since that's the first SMIRNOFF spec version that stated that sections have versions. That may have been a poor choice by me, but we did it, so that's the system now.

The big plus is that it lets us talk about the old sections using a common nomenclature. For example, the attributes that were defined for the original HarmonicBondForce in the SMIRNOFF 0.1 spec could be different than those for the Bonds section in the SMIRNOFF 0.2 spec, which are in turn different from the Bonds section version 0.3. However, there wasn't clear language to express what those different things are, or how to port them to the current format. So now we have posthumously versioned those -- HarmonicBondForce is basically "Bonds section version 0.1" and the SMIRNOFF 0.2 spec Bonds tag (which differs from the current Bonds syntax) is effectively "Bonds section version 0.2". Now that we have a 1:1 mapping defined, old section "versions" are defined objects that can have clearly-defined comparisons/upconverters to current and future section versions.

slochower · 2019-10-30T21:53:03Z

The-SMIRNOFF-force-field-format.md

+
+
+
 ### `<ChargeIncrementModel>`: Small molecule and fragment charges


Oh, I think I see. Each ParameterHandler now has the SMIRNOFF specification version in its title string? Interesting.

Covered in previous comment -- All sections got versions in the SMIRNOFF 0.3 spec. Since they implicitly "had" versions in previous SMIRNOFF specs, those were assumed to be 0.1 and 0.2. By versioning each section independently, sections can undergo rapid development/improvement, while maintaining their own sensible versioning scheme. But while sections progress rapidly, we won't need to update the top-level SMIRNOFF spec version several times a year.

slochower · 2019-10-30T21:53:46Z

openforcefield/data/test_forcefields/tip3p.offxml

+  <!-- These values are taken from OpenMM's TIP3P implementation: https://github.com/openmm/openmm/blob/f9106ddb6a0d72c960c73b23ed0e07cd7190a80f/wrappers/python/simtk/openmm/app/data/tip3p.xml -->
+  <vdW version="0.3" potential="Lennard-Jones-12-6" combining_rules="Lorentz-Berthelot" scale12="0.0" scale13="0.0" scale14="0.5" scale15="1" switch_width="1.0 * angstroms" cutoff="9.0 * angstroms" method="cutoff">
+    <Atom smirks="[#1]-[#8X2H2+0:1]-[#1]" id="n1" sigma="0.31507524065751241 * nanometers" epsilon="0.635968 * kilojoules_per_mole" />
+    <Atom smirks="[#1:1]-[#8X2H2+0]-[#1]" id="n2" sigma="1 * nanometers" epsilon="0 * kilojoules_per_mole" />


Any chance these will match something other than water?

These seem pretty safe, short of intentionally malicious use. The only further safeguard might be to explicitly state [#1X1:1] to ensure that none of the hydrogens have two bonds. But... I'd be impressed if that showed up in the bug tracker.

slochower · 2019-10-30T21:56:31Z

openforcefield/typing/engines/smirnoff/parameters.py

+
+        """
+        # TODO: Change this to interface with system object instead of topology once we move away from OMM's System
+        topology._ref_mol_to_charge_method[ref_mol] = self.__class__


What is self.__class__?

The _ref_mol_to_charge_method dict records where each unique molecule in the topology gets its charge from. It's initialized to have them all be None, but those get filled in as charging occurs.

Since _NonbondedForce is inherited by ElectrostaticsHandler, ToolkitAM1BCCHandler, and LibraryChargeHandler, when self.__class__ actually runs, it will be inside of one of those classes. So the dict will be populated with pairs of the form {reference_molecule: charge-assigning _NonbondedHandler-derived class}.

slochower · 2019-10-30T21:59:46Z

openforcefield/typing/engines/smirnoff/parameters.py

+    def assign_charge_from_molecules(self, molecule, charge_mols):
+        """
+        Given an input molecule, checks against a list of molecules for an isomorphic match. If found, assigns
+        partial charges from the match to the input molecule.


Is the goal to apply LibraryCharges only between two molecules that are exactly the same?

This looks good, but wonder if in the future, it'd be nice to do LibraryCharges on fragments.

This function is for handling the charge_from_molecules kwarg. It is not related to LibraryCharges (except that it gets higher priority in the charge method hierarchy). This code existed before the PR, it's just been moved into ElectrostaticsHandler because its priority was messed up when it was attached to ToolkitAM1BCCHandler

slochower

Looks good. I asked a few non-essential questions.

jchodera

This is looking great, @j-wags! I've mostly just reviewed the documentation and tests, but please take a look at comments above.

jchodera · 2019-10-31T23:01:47Z

docs/releasehistory.rst

+  at least one LibraryCharge. If an atom is covered by multiple LibraryCharges, then the last
+  one read will be applied. At this time, there is no concept of "residues" during parametrization,
+  so it is not possible to parametrize some atoms in a molecule using LibraryCharges and
+  others using another method. Further, no effort is made to ensure that the net charge


Further, no effort is made to ensure that the net charge applied using LibraryCharges is the total formal charge on the molecule.

Since we have all the formal charge information for any provided molecules (from the toolkit layers), wouldn't it be trivial to add this check?

I'd thought about it, but concluded that it would add unnecessary complexity. At some point, maybe there will be a consensus that folks want a charge normalization scheme implemented in the LibraryCharges tag, and we can incorporate a "don't-normalize-but-just-raise-exception-if-charges-don't-add-up" option. But for now, adding this functionality would make people immediately ask for a kwarg to disable it, and then in the future we'd have to break the API or have a growing tree of "if" statements to handle that kwarg's interaction with various normalization schemes.

This seems highly dangerous to me since a mistyped library charge definition could lead to corrupted, non-integral charges. Wouldn't we want to be conservative first and then allow a way to relax things later if we find we need to?

This seems highly dangerous to me since a mistyped library charge definition could lead to corrupted, non-integral charges.

I see where you're coming from -- Our API should just "do the right thing", and that is the attitude I take toward users who show up with molecules and just "want to use the most recent FF". But here we're considering a different group of users -- those writing their own SMIRKS and LibraryCharge entries. I think it's likely that those users would find that our "helpful" guiderails actually get in their way.

There are several use cases where raising an error would get in the way. For example, I suspect that someone will start experimenting with fragment-based charge assignment, with the intent to perform their own normalization on the charges in the resulting molecule. They'll be frustrated when we disallow this.

I agree that it's possible for promiscuous SMIRKS to quietly ruin a system, so we may have to ensure that the FFs published by Open Force Field don't contain potentially-promiscuous library charges. But, we already let people do tremendously dangerous things like introduce wildcard parameters into their Bonds section, while also concatenating sections from subsequently-loaded FFs (so, if a wildcard bond-containing file is read after Parsley, all bonds will match the wildcard). The spec allows people to create the FF equivalent of nuclear fuel -- It's great if someone else is handling it, but people need to understand that messing with parameter sources is dangerous.

What I'm getting at is that this isn't a technical problem -- It would be easy to add the "formal charge" check and a resulting warning or exception+override. But users already complain that we have too many warnings. And an exception+override adds complexity on our end. If we go that route, I bet that allow_nonintegral_charges=True will just become another allow_undefined_stereo=True, which already seems to be cargo-culting around in scripts.

+1 for enforcing that the charges add up to the formal charge with the option to disable the check. I think most people (by which I mean me) would prefer to be told explicitly when the charges don't add up, thus highlighting something which is usually very wrong in the system without having to write checks for this each time a molecule is parametrized.

I talked to Jeff about this in person and actually was inclined to agree with his viewpoint for all the reasons he listed, and also worried that additional developer time he'd spend trying to keep people from making mistakes would be time taken away from scientifically higher priority tasks like BCCs and WBOs. But I don't feel that strongly about it.

Since these library charges are going to be written by hand, we would have to implement a mechanism outside of ForceField to check for this if we don't include it within ForceField. So either way, he's going to need to write this code. I'd prefer we use it to avoid being hoist by our own petard by baking it in here...

I appreciate the thoughtful replies on this topic. This may just be an area where my perspective as a developer is different that the perspective of the users. So, I'm glad for the feedback before this behavior gets baked into the toolkit!

I'll go ahead and implement a check for nonintegral charge assignment, which can optionally be disabled by a keyword argument (allow_nonintegral_charges). To future-proof this a bit, I'll have it apply to all charge methods (so, it will run in ElectrostaticsHandler.postprocess_system).

That sounds like a reasonable approach!

Ok, I've added a check to ensure that the partial charges on all particles in a Molecule sum up to its formal charge. This can be overridden with the kwarg allow_nonintegral_charge=True.

jchodera · 2019-10-31T23:02:50Z

docs/releasehistory.rst

+  :py:class:`LibraryChargeHandler <openforcefield.typing.engines.smirnoff.parameters.LibraryChargeHandler>`.
+  For a molecule to have charges assigned using LibraryCharges, all of its atoms must be covered by
+  at least one LibraryCharge. If an atom is covered by multiple LibraryCharges, then the last
+  one read will be applied. At this time, there is no concept of "residues" during parametrization,


It's unclear if this means we can parameterize a biomolecule made up of a few "residues" or not. Can you clarify whether or not it is possible to provide a library of residue charge definitions and have the whole protein charged this way, provided all atoms have one LibraryCharge applied?

Good question. I've added text to address this directly:

This functionality is thus able to apply per-residue charges similar to those in traditional
protein force fields. At this time, there is no concept of "residues" or "fragments" during
parametrization, so it is not possible to assign charges to some atoms in a molecule using
LibraryCharges, but calculate charges for other atoms in the same molecule using a different
method. To assign charges to a protein, LibraryCharges SMARTS must be provided for
the standard residues, as well as for any capping groups and post-translational modifications
that are present.

jchodera · 2019-10-31T23:03:15Z

The-SMIRNOFF-force-field-format.md

 </LibraryCharges>
 ```

+


Are the <LibraryCharges> examples above still accurate?

Wow. They actually didn't parse. Alanine was missing a / at the end of the LibraryCharge line, and TIP3P failed because +0.417 doesn't parse with the +, so I've change it to just be 0.417.

Fixed and added a test, with the TODO:

# TODO: This test is practically useless while the XML strings are hard-coded at the top of this file. # We should implement something like doctests for the XML snippets on the SMIRNOFF spec page.

jchodera · 2019-10-31T23:04:41Z

docs/releasehistory.rst

+Behavior changed
+""""""""""""""""
+- `PR #433 <https://github.com/openforcefield/openforcefield/pull/433>`_: If a molecule
+  can not be assigned charges by any charge-generation method, an


Can you clarify whether passing in charged molecules via the API can suffice here, or if it will throw this exception if the SMIRNOFF file does not contain enough information to charge all molecules in the system (even if you pass a charged molecule in via the API).

Yes. I've added text to clarify that the charge_from_molecules kwarg counts as a recognized charge assignment method.

jchodera · 2019-10-31T23:08:34Z

openforcefield/tests/test_forcefield.py

 ''')
        pdbfile = app.PDBFile(get_data_file_path('systems/test_systems/1_ethanol.pdb'))
-        molecules = []
-        molecules.append(Molecule.from_smiles('CCO'))


In tests where atom indexing is important, I've tried to remove uses of Molecule.from_smiles, since that does not guarantee a specific ordering of the atoms.

We can fix this! We can support tagged SMILES, such as [C:1][C:2][O:3], and ensure that we create atoms with the specified ordering. This would allow the tests to be much more compact, and could have other uses in future.

I believe cmiles has methods to do this---we should consult with @ChayaSt .

jchodera · 2019-10-31T23:09:09Z

openforcefield/tests/test_forcefield.py

+
+
+    @pytest.mark.parametrize("toolkit_registry,registry_description", toolkit_registries)
+    def test_pass_invalid_kwarg_to_create_openmm_system(self, toolkit_registry, registry_description):


Do we still accept kwargs to create_openmm_system()? I thought we had deprecated that since it was so terribly dangerous!

I agree these are dangerous, but I'm struggling to think of an alternative.

Maybe the most direct way to remove charge_from_molecules is to create LibraryChargeHandler.add_molecule(), which takes Molecule objects as input and turns them into LibraryCharges. The big missing piece here would just be the reverse of what we discussed elsewhere in this PR -- A way to go from OFFMol --> tagged SMILES. We could, by default, add the new whole-molecule SMARTS to the bottom of the hierarchy.

Other kwargs, like allow_nonintegral_charges and toolkit_registry, will require further thought. But we could start phasing them out.

SimonBoothroyd

I think this mostly looks great! Just one comment about future extensions and another about my preferred default behavior.

SimonBoothroyd · 2019-11-01T20:47:17Z

openforcefield/typing/engines/smirnoff/parameters.py

+        if not hasattr(topology, "_ref_mol_to_charge_method"):
+            topology._ref_mol_to_charge_method = {ref_mol: None for ref_mol in topology.reference_molecules}


I wonder if rather than tagging the topology directly with this attribute, it would be cleaner to carry around some additional, extensible, internal state object which gets passed between successive handlers and that can store intermediate data such as this?

This is admittedly a cheap hack. The object which this information should be stored on is the System (because then after parameterization, users will be able to look right at a particle and see what ParameterHandler the charge came from).

I don't know how I feel about introducing an internal state object, so I'm thinking (typing) out loud here... It seems like this would partially be a "parametrization event" object, effectively like a rich log of what happened. This might be good for tracking provenance -- and actually we just talked about this at the MolSSI workshop (It would be neat if system prep tools produced an output which capture the important details of what they've done, which could be submitted as SI, relevant to @karmencj ). So I can envision it as purely an object to be written to. However, becomes more complex if it passes information back to the FF, topology, or system during the parametrization process (which seems to be the proposal here). Unless there's some clear model of when it talks to the other objects, we could end up with some very hard-to-test behaviors. I'll keep thinking about how this could work, though.

object which this information should be stored on is the System

This may be the case - although I'm not sure what the status of this object is?

This might be good for tracking provenance

In principle it would - it's never a bad thing to keep track of what went into and came out of the different modular handlers.

However, becomes more complex if it passes information back to the FF, topology, or system during the parametrization process [...] Unless there's some clear model of when it talks to the other objects

I don't think I understand this point - where would this complexity arise? The object wouldn't need to talk to anything, it would purely be an object which gets passed between handlers which would be a readable source of previously assigned metadata which could either be utilized if needed, or ignored if not, and then extended ready to be passed along the chain of handlers. In principle the object could be as simple as a dict of dicts, where each key is the handler name and the value the dict of provenance.

we could end up with some very hard-to-test behaviors

I think that is inherent in having modular components which can be almost arbitrarily pieced together, but provided the required input and provided output provenance of each handler is well defined, then it shouldn't be overly bad!

Honestly I don't have particularly strong opinions about this, but given that in the future there is probably going to be cases where handlers may need access to information about what has already been done to the system (as is the case here), it might be worth thinking about how this will worker earlier rather than later!

I'd be very careful about decorating the source Topology object with something that should change when the same Topology object is fed to a different ForceField.

I do think we should seriously consider a new openforcefield System object as the right way to contain all of the applied parameterization data accessible via a public API.

…arges

j-wags added 2 commits October 8, 2019 13:55

Initial untested implementation of LibraryChargeHandler

b3c492f

somewhat functional librarychargehandler

9ca3018

j-wags added 2 commits October 10, 2019 21:36

Initial fix for #434

1b8efab

Change LibraryChargeHandler to NOT use valencedict for chem. env. match

ebd58ea

Add library-charge-by-single-atom test

d69db99

j-wags added 17 commits October 18, 2019 14:47

Expand librarycharge tests

5c5af0f

Add check for unresolvable ParameterHandler dependency graph + test same

14355b4

debugging handler dependency resolution order

4f7e4ff

Debugged library charge/dependency resolution tests

b7e0058

Merge branch 'master' into library_charges

5218a42

Fix LibraryChargeHandler creation test

8fe8ebe

Make circular depend test use fixture for more reliable setup/teardown

2cafabb

Replace usages of Molecule.from_smiles in tests where order is important

3f55ed9

del missing-reference test in test_ffs - already tested in test_topology

0bb6772

Move common ES/VDW handler functionality to _NonbondedHandler base class

8306542

Update docs to build LibraryChargeX API pages

d810fd6

Move tip3p LCs from test_ffs.py to tip3p.offxml, update to 0.3 spec

65e009c

Update SMIRNOFF spec page for LC release

3148ba7

Update releasehistory for LC functionality.

66bc5ac

Ensure that GBSAHandler runs after all charge assignment completes

29f9cd0

Remove unused code

7e006b8

Track mol charge assignment method on topology during system creation

f52efe9

j-wags added 2 commits October 28, 2019 15:23

refactor charge method hierarchy test to be more thorough

3b2527a

add library charge hierarchy test

818a1fa

j-wags commented Oct 28, 2019

View reviewed changes

j-wags added 2 commits October 28, 2019 16:01

Remove redundant charge assignment code from toolkitam1bcchandler

82746c3

Update outdated comment

59f6199

j-wags changed the title ~~[WIP] Library Charge Support~~ Library Charge Support Oct 28, 2019

j-wags commented Oct 28, 2019

View reviewed changes

j-wags added 3 commits October 28, 2019 16:14

fix typo

904d08e

Raise exception if a molecule can not be assigned charges, test behavior

8e15a2f

Update releasenotes with behavior change wrt. UnassignedChargeParamExcpn

8f416c0

slochower reviewed Oct 30, 2019

View reviewed changes

slochower approved these changes Oct 30, 2019

View reviewed changes

Merge branch 'master' into library_charges

22531cc

jchodera approved these changes Oct 31, 2019

View reviewed changes

SimonBoothroyd reviewed Nov 1, 2019

View reviewed changes

j-wags added 7 commits November 6, 2019 12:58

Merge branch 'master' into library_charges

99eac9b

Add nonintegral charge check, exception, and test

446d370

clarify LC biopolymer assignment, add note about nonintegral charge err

5738d9c

Add tests for spec doc librarycharge examples, fix them.

db192ba

Merge remote-tracking branch 'origin/library_charges' into library_ch…

a87fadb

…arges

Relax nonintegral charge exception to delta of 0.01

fe579b4

allow nonintegral charge in check_parameter_coverage example

99332f9

j-wags merged commit f8ea460 into master Nov 7, 2019

j-wags mentioned this pull request Nov 13, 2019

Round partial charges to enforce correct integral charge on molecule #456

Closed

mattwthompson deleted the library_charges branch September 7, 2022 14:44

		tolerance=self._SCALETOL)


		def assign_charge_from_molecules(self, molecule, charge_mols):

		\| 0.3 \| \| `smirks`, `charges` (indexed) \| `name`, `id`, `parent_id` \|




		### `<ChargeIncrementModel>`: Small molecule and fragment charges



		@pytest.mark.parametrize("toolkit_registry,registry_description", toolkit_registries)
		def test_pass_invalid_kwarg_to_create_openmm_system(self, toolkit_registry, registry_description):

		if not hasattr(topology, "_ref_mol_to_charge_method"):
		topology._ref_mol_to_charge_method = {ref_mol: None for ref_mol in topology.reference_molecules}

Library Charge Support #433

Library Charge Support #433

Uh oh!

Conversation

j-wags commented Oct 8, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

lgtm-com bot commented Oct 10, 2019

Uh oh!

codecov-io commented Oct 10, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

proteneer commented Oct 11, 2019

Uh oh!

j-wags commented Oct 11, 2019

Uh oh!

proteneer commented Oct 11, 2019

Uh oh!

jchodera commented Oct 11, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

j-wags commented Oct 11, 2019

Uh oh!

jchodera commented Oct 14, 2019

Uh oh!

j-wags commented Oct 28, 2019

Uh oh!

j-wags Oct 28, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

j-wags Oct 28, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

j-wags commented Oct 29, 2019

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

j-wags Oct 31, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

slochower left a comment

Choose a reason for hiding this comment

Uh oh!

jchodera left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

j-wags commented Oct 8, 2019 •

edited

Loading

codecov-io commented Oct 10, 2019 •

edited

Loading

jchodera commented Oct 11, 2019 •

edited

Loading

j-wags Oct 28, 2019 •

edited

Loading

j-wags Oct 28, 2019 •

edited

Loading

j-wags Oct 31, 2019 •

edited

Loading

j-wags Nov 1, 2019 •

edited

Loading

j-wags Nov 2, 2019 •

edited

Loading