Skip to content

Conversation

@j-wags
Copy link
Member

@j-wags j-wags commented Oct 8, 2019

  • Close Implement Library Charges #25
  • Fix Topology.add_molecule has undefined behavior if given an existing molecule with different atom indexing #434
  • Add common base class (_NonbondedHandler) for ParameterHandlers that deal with OpenMM's NonbondedForce objects
  • Make _NonbondedHandlers know whether a previous handler has assigned charges to a molecule (and therefore to skip it)
    • I ended up having _NonbondedHandlers attach this information to the input Topology during system creation (by creating and modifying a new dict topology._ref_mol_to_charge_method). This is performed using _NonbondedHandler.mark_charges_assigned(self, ref_mol, topology) and NonbondedHandler.check_charges_assigned(ref_mol, topology). This is not an ideal solution, since the Topology shouldn't need to know anything about the parameters that are assigned. But until we have our own System class, there is no way to store this information in OpenMM's System class, and the Topology is the only other data structure we have continuous access to during system creation.
  • Add tests
    • Especially relevant was the addition of test_charge_method_hierarchy in test_forcefield -- Basically, a molecule should get charges from the charge_from_molecules kwarg, then LibraryCharges, then AM1-BCC. The new hierarchy test is fairly exhaustive in ensuring that this is the actual behavior.
  • Update docstrings/documentation, if applicable
  • Update changelog

Status

  • Ready for review
  • Ready for merge

@lgtm-com
Copy link

lgtm-com bot commented Oct 10, 2019

This pull request introduces 2 alerts when merging 9ca3018 into d2e7173 - view on LGTM.com

new alerts:

  • 1 for Unused local variable
  • 1 for Unreachable code

@codecov-io
Copy link

codecov-io commented Oct 10, 2019

Codecov Report

Merging #433 into master will increase coverage by 0.73%.
The diff coverage is 93.64%.

@proteneer
Copy link
Collaborator

Thanks for doing this!

  1. Might make sense to add a test of Library Charges in conjunction with multiple overlapping BCCs. For us down the road, training to BCCs probably makes more sense than the charges directly.

  2. Will there actually be a set of parameters released? (eg. for proteins)

@j-wags
Copy link
Member Author

j-wags commented Oct 11, 2019

@proteneer

Might make sense to add a test of Library Charges in conjunction with multiple overlapping BCCs. For us down the road, training to BCCs probably makes more sense than the charges directly.

That's a good point. I'd planned on doing library charges and BCCs separately, but I've had to refresh my knowledge of our charging systems during this work, so it may make sense to do them simultaneously. My goal is to get the library charge functionality+tests working and cleaned up by this afternoon. If I can do that, I'll see if I could get BCCs in early next week.

Will there actually be a set of parameters released? (eg. for proteins)

No. During the in-person meeting, we discussed which order we should add new functionality, and time estimates for each. Missing toolkit functionality is blocking for several OFF subprojects. I pointed out the massive GBSA time overrun was largely due to my writing and validating the GBSA parameter files, not the GBSAHandler itself. So, in the interest of time, LibraryCharges and BCCs will come out without "real" parameters (except for dummy files used for tests).

I know Caitlin had been working on porting an AMBER protein FF's charges to SMIRNOFF format, but the LibraryChargeHandler wasn't ready before she left, so that never got finished.

@proteneer
Copy link
Collaborator

Thanks - this sounds good.

@jchodera
Copy link
Member

jchodera commented Oct 11, 2019

@j-wags : Is there an issue we can link to that describes how this functionality will work, and the order in which charge assignment schemes will operate? Or has the spec been updated to describe this in detail?

@j-wags
Copy link
Member Author

j-wags commented Oct 11, 2019

@jchodera

Is there an issue we can link to that describes how this functionality will work, and the order in which charge assignment schemes will operate? Or has the spec been updated to describe this in detail?

I can add the current behavior to the spec if it's not there already. Basically, the priority will be charge_from_molecules > LibraryCharges > ToolkitAM1BCC. BCCs will be applied to the resulting charges afterwards, if present.

The way that the object model works now creates some problems, like if a library charge parameter DOES match a molecule, but sets the charges for a molecule to be all zeroes, then ToolkitAM1BCC won't know that it matched at all, and will run AM1-BCC. This is because the ParameterHandlers are completely modular, and have to infer the history of the charging process solely by the information in the OpenMM System when their turn comes up.

So, we'll need to do a SMIRNOFF spec change in the future, where the charge-generating handlers lose some modularity. This will likely involve making the charge-generating handlers be defined in a subsection within the Electrostatics tag, however this needs to be reconciled against the fact that VirtualSites will have both charge and LJ parameters. Decisions on that front also tie into the architecture of our System class. But this is a restatement of concerns in #310 and #340.

Anyway, we'll be debating the spec change for several months in the best case, so since there are people funded to do this science now, it makes sense to put out hard-coded functionality in the short term.

@jchodera
Copy link
Member

Thanks for the detailed explanation, @j-wags!

Will the library charges have to match the entire molecule, or can they match a biopolymer residue at a time?

@j-wags
Copy link
Member Author

j-wags commented Oct 28, 2019

Whoops, missed the question above. Library charges will operate on a molecule as long as each atom in the molecule has at least one match. The final charge that gets assigned to each atom will be determined using our normal hierarchy rules. Of course, this means that the "safest" way to use LibraryCharges is to specify a whole molecule SMIRKS. But if folks want to do partial molecule SMIRKS, that will work as long as all atoms are assigned. However if those partial molecule SMIRKS overlap with each other, it can become very complex to understand the outcome.

Anyway, that means that we could in theory support biopolymer parameterization now. However, we still don't have any concept of residues, so parameter assignment might be slow (would need to SMIRKS-match a whole protein!)

One important note to emphasize is that we do not currently support using multiple charge methods for a single molecule. So, a user could not use LibraryCharges for protein residues and AM1-BCC just for capping groups and PTMs.

''')
pdbfile = app.PDBFile(get_data_file_path('systems/test_systems/1_ethanol.pdb'))
molecules = []
molecules.append(Molecule.from_smiles('CCO'))
Copy link
Member Author

@j-wags j-wags Oct 28, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In tests where atom indexing is important, I've tried to remove uses of Molecule.from_smiles, since that does not guarantee a specific ordering of the atoms.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In tests where atom indexing is important, I've tried to remove uses of Molecule.from_smiles, since that does not guarantee a specific ordering of the atoms.

We can fix this! We can support tagged SMILES, such as [C:1][C:2][O:3], and ensure that we create atoms with the specified ordering. This would allow the tests to be much more compact, and could have other uses in future.

I believe cmiles has methods to do this---we should consult with @ChayaSt .

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There have been requests for this functionality from several sources. I've opened an initial Issue with the spec for this: #448

tolerance=self._SCALETOL)


def assign_charge_from_molecules(self, molecule, charge_mols):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously, ToolkitAM1BCCHandler was responsible for assigning charges from molecules in the charge_from_molecules kwarg. However, it makes more sense to have that functionality in the ElectrostaticsHandler, since the addition of LibraryCharges means that we won't always a ToolkitAM1BCCHandler in the ForceField.

@j-wags j-wags changed the title [WIP] Library Charge Support Library Charge Support Oct 28, 2019
@@ -1,26 +1,21 @@
<?xml version="1.0" encoding='ASCII'?>
Copy link
Member Author

@j-wags j-wags Oct 28, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've resuscitated the previously non-functional tip3p.offxml by updating it to the SMIRNOFF 0.3 spec and having it use LibraryCharges.

Comment on lines +884 to +943

@pytest.mark.parametrize("toolkit_registry,registry_description", toolkit_registries)
def test_pass_invalid_kwarg_to_create_openmm_system(self, toolkit_registry, registry_description):
"""Test to ensure an exception is raised when an unrecognized kwarg is passed """
from simtk.openmm import app

file_path = get_data_file_path('test_forcefields/smirnoff99Frosst.offxml')
forcefield = ForceField(file_path)
pdbfile = app.PDBFile(get_data_file_path('systems/test_systems/1_ethanol.pdb'))
molecules = []
molecules.append(Molecule.from_smiles('CCO'))
topology = Topology.from_openmm(pdbfile.topology, unique_molecules=molecules)

with pytest.raises(ValueError, match=".* not used by any registered force Handler: {'invalid_kwarg'}.*") as e:
omm_system = forcefield.create_openmm_system(topology, invalid_kwarg='aaa', toolkit_registry=toolkit_registry)


@pytest.mark.parametrize("inputs", nonbonded_resolution_matrix)
def test_nonbonded_method_resolution(self,
inputs
):
"""Test predefined permutations of input options to ensure nonbonded handling is correctly resolved"""
from simtk.openmm import app

vdw_method = inputs['vdw_method']
electrostatics_method = inputs['electrostatics_method']
has_periodic_box = inputs['has_periodic_box']
omm_force = inputs['omm_force']
exception = inputs['exception']
exception_match= inputs['exception_match']

molecules = [create_ethanol()]
forcefield = ForceField('test_forcefields/smirnoff99Frosst.offxml')

pdbfile = app.PDBFile(get_data_file_path('systems/test_systems/1_ethanol.pdb'))
topology = Topology.from_openmm(pdbfile.topology, unique_molecules=molecules)

if not(has_periodic_box):
topology.box_vectors = None

if exception is None:
# The method is validated and may raise an exception if it's not supported.
forcefield.get_parameter_handler('vdW', {}).method = vdw_method
forcefield.get_parameter_handler('Electrostatics', {}).method = electrostatics_method
omm_system = forcefield.create_openmm_system(topology)
nonbond_method_matched = False
for f_idx in range(omm_system.getNumForces()):
force = omm_system.getForce(f_idx)
if isinstance(force, openmm.NonbondedForce):
if force.getNonbondedMethod() == omm_force:
nonbond_method_matched = True
assert nonbond_method_matched
else:
with pytest.raises(exception, match=exception_match) as excinfo:
# The method is validated and may raise an exception if it's not supported.
forcefield.get_parameter_handler('vdW', {}).method = vdw_method
forcefield.get_parameter_handler('Electrostatics', {}).method = electrostatics_method
omm_system = forcefield.create_openmm_system(topology)


Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are not new tests, they were just moved.

@j-wags
Copy link
Member Author

j-wags commented Oct 29, 2019

@jchodera @SimonBoothroyd @slochower @andrrizzi This PR is ready for review if any of you have time. It's somewhat large, so I've left some comments that should add context and speed up the review. A detailed summary of changes is in the summary of this PR and the additions to releasehistory.rst

Also, I'd appreciate your thoughts on the following: the previous behavior when a molecule couldn't be assigned charges (for example, if we deleted the ToolkitAM1BCCHandler) was to just let it go into the system with all zero charges. However, now that people have more freedom to mix and match charge methods, we probably want to raise an error if a molecule doesn't get charges assigned. I've gone ahead and implemented anUnassignedChargeParameterException that gets raised in this scenario and noted the behavior change in the release notes.

| 0.3 | | `smirks`, `charges` (indexed) | `name`, `id`, `parent_id` |



Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are LibraryCharges going to have their own version scheme? This would probably be obvious if I were paying closer attention to the development of this feature.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Each section has its own version in the SMIRNOFF 0.3 spec.

This seems excessive, but we recognized during the last spec revamp that, for example, the Electrostatics tag might in the future need to contain detailed PME settings, or that maybe some day we'll need to indicate that the Angles section can support partial_bondorder keywords. If we only have the top-level SMIRNOFF tag recording complete file compatibility, we'd be incrementing the number constantly. So, the top-level SMIRNOFF tag only encodes things about general structure that affects (de)serialization (eg. units in headers vs units in quantity), without worrying about the specific keywords inside. This leaves it up to the specific section versions to indicate changes in keywords or behavior.

Copy link
Member Author

@j-wags j-wags Oct 31, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Confusingly, all section versions begin at 0.3, since that's the first SMIRNOFF spec version that stated that sections have versions. That may have been a poor choice by me, but we did it, so that's the system now.

The big plus is that it lets us talk about the old sections using a common nomenclature. For example, the attributes that were defined for the original HarmonicBondForce in the SMIRNOFF 0.1 spec could be different than those for the Bonds section in the SMIRNOFF 0.2 spec, which are in turn different from the Bonds section version 0.3. However, there wasn't clear language to express what those different things are, or how to port them to the current format. So now we have posthumously versioned those -- HarmonicBondForce is basically "Bonds section version 0.1" and the SMIRNOFF 0.2 spec Bonds tag (which differs from the current Bonds syntax) is effectively "Bonds section version 0.2". Now that we have a 1:1 mapping defined, old section "versions" are defined objects that can have clearly-defined comparisons/upconverters to current and future section versions.




### `<ChargeIncrementModel>`: Small molecule and fragment charges

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I think I see. Each ParameterHandler now has the SMIRNOFF specification version in its title string? Interesting.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Covered in previous comment -- All sections got versions in the SMIRNOFF 0.3 spec. Since they implicitly "had" versions in previous SMIRNOFF specs, those were assumed to be 0.1 and 0.2. By versioning each section independently, sections can undergo rapid development/improvement, while maintaining their own sensible versioning scheme. But while sections progress rapidly, we won't need to update the top-level SMIRNOFF spec version several times a year.

<!-- These values are taken from OpenMM's TIP3P implementation: https://github.com/openmm/openmm/blob/f9106ddb6a0d72c960c73b23ed0e07cd7190a80f/wrappers/python/simtk/openmm/app/data/tip3p.xml -->
<vdW version="0.3" potential="Lennard-Jones-12-6" combining_rules="Lorentz-Berthelot" scale12="0.0" scale13="0.0" scale14="0.5" scale15="1" switch_width="1.0 * angstroms" cutoff="9.0 * angstroms" method="cutoff">
<Atom smirks="[#1]-[#8X2H2+0:1]-[#1]" id="n1" sigma="0.31507524065751241 * nanometers" epsilon="0.635968 * kilojoules_per_mole" />
<Atom smirks="[#1:1]-[#8X2H2+0]-[#1]" id="n2" sigma="1 * nanometers" epsilon="0 * kilojoules_per_mole" />

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any chance these will match something other than water?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These seem pretty safe, short of intentionally malicious use. The only further safeguard might be to explicitly state [#1X1:1] to ensure that none of the hydrogens have two bonds. But... I'd be impressed if that showed up in the bug tracker.

"""
# TODO: Change this to interface with system object instead of topology once we move away from OMM's System
topology._ref_mol_to_charge_method[ref_mol] = self.__class__

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is self.__class__?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The _ref_mol_to_charge_method dict records where each unique molecule in the topology gets its charge from. It's initialized to have them all be None, but those get filled in as charging occurs.

Since _NonbondedForce is inherited by ElectrostaticsHandler, ToolkitAM1BCCHandler, and LibraryChargeHandler, when self.__class__ actually runs, it will be inside of one of those classes. So the dict will be populated with pairs of the form {reference_molecule: charge-assigning _NonbondedHandler-derived class}.

def assign_charge_from_molecules(self, molecule, charge_mols):
"""
Given an input molecule, checks against a list of molecules for an isomorphic match. If found, assigns
partial charges from the match to the input molecule.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the goal to apply LibraryCharges only between two molecules that are exactly the same?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good, but wonder if in the future, it'd be nice to do LibraryCharges on fragments.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function is for handling the charge_from_molecules kwarg. It is not related to LibraryCharges (except that it gets higher priority in the charge method hierarchy). This code existed before the PR, it's just been moved into ElectrostaticsHandler because its priority was messed up when it was attached to ToolkitAM1BCCHandler

Copy link

@slochower slochower left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. I asked a few non-essential questions.

Copy link
Member

@jchodera jchodera left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking great, @j-wags! I've mostly just reviewed the documentation and tests, but please take a look at comments above.

at least one LibraryCharge. If an atom is covered by multiple LibraryCharges, then the last
one read will be applied. At this time, there is no concept of "residues" during parametrization,
so it is not possible to parametrize some atoms in a molecule using LibraryCharges and
others using another method. Further, no effort is made to ensure that the net charge
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Further, no effort is made to ensure that the net charge applied using LibraryCharges is the total formal charge on the molecule.

Since we have all the formal charge information for any provided molecules (from the toolkit layers), wouldn't it be trivial to add this check?

Copy link
Member Author

@j-wags j-wags Nov 1, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd thought about it, but concluded that it would add unnecessary complexity. At some point, maybe there will be a consensus that folks want a charge normalization scheme implemented in the LibraryCharges tag, and we can incorporate a "don't-normalize-but-just-raise-exception-if-charges-don't-add-up" option. But for now, adding this functionality would make people immediately ask for a kwarg to disable it, and then in the future we'd have to break the API or have a growing tree of "if" statements to handle that kwarg's interaction with various normalization schemes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems highly dangerous to me since a mistyped library charge definition could lead to corrupted, non-integral charges. Wouldn't we want to be conservative first and then allow a way to relax things later if we find we need to?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems highly dangerous to me since a mistyped library charge definition could lead to corrupted, non-integral charges.

I see where you're coming from -- Our API should just "do the right thing", and that is the attitude I take toward users who show up with molecules and just "want to use the most recent FF". But here we're considering a different group of users -- those writing their own SMIRKS and LibraryCharge entries. I think it's likely that those users would find that our "helpful" guiderails actually get in their way.

There are several use cases where raising an error would get in the way. For example, I suspect that someone will start experimenting with fragment-based charge assignment, with the intent to perform their own normalization on the charges in the resulting molecule. They'll be frustrated when we disallow this.

I agree that it's possible for promiscuous SMIRKS to quietly ruin a system, so we may have to ensure that the FFs published by Open Force Field don't contain potentially-promiscuous library charges. But, we already let people do tremendously dangerous things like introduce wildcard parameters into their Bonds section, while also concatenating sections from subsequently-loaded FFs (so, if a wildcard bond-containing file is read after Parsley, all bonds will match the wildcard). The spec allows people to create the FF equivalent of nuclear fuel -- It's great if someone else is handling it, but people need to understand that messing with parameter sources is dangerous.

What I'm getting at is that this isn't a technical problem -- It would be easy to add the "formal charge" check and a resulting warning or exception+override. But users already complain that we have too many warnings. And an exception+override adds complexity on our end. If we go that route, I bet that allow_nonintegral_charges=True will just become another allow_undefined_stereo=True, which already seems to be cargo-culting around in scripts.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for enforcing that the charges add up to the formal charge with the option to disable the check. I think most people (by which I mean me) would prefer to be told explicitly when the charges don't add up, thus highlighting something which is usually very wrong in the system without having to write checks for this each time a molecule is parametrized.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I talked to Jeff about this in person and actually was inclined to agree with his viewpoint for all the reasons he listed, and also worried that additional developer time he'd spend trying to keep people from making mistakes would be time taken away from scientifically higher priority tasks like BCCs and WBOs. But I don't feel that strongly about it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since these library charges are going to be written by hand, we would have to implement a mechanism outside of ForceField to check for this if we don't include it within ForceField. So either way, he's going to need to write this code. I'd prefer we use it to avoid being hoist by our own petard by baking it in here...

Copy link
Member Author

@j-wags j-wags Nov 2, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I appreciate the thoughtful replies on this topic. This may just be an area where my perspective as a developer is different that the perspective of the users. So, I'm glad for the feedback before this behavior gets baked into the toolkit!

I'll go ahead and implement a check for nonintegral charge assignment, which can optionally be disabled by a keyword argument (allow_nonintegral_charges). To future-proof this a bit, I'll have it apply to all charge methods (so, it will run in ElectrostaticsHandler.postprocess_system).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds like a reasonable approach!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I've added a check to ensure that the partial charges on all particles in a Molecule sum up to its formal charge. This can be overridden with the kwarg allow_nonintegral_charge=True.

:py:class:`LibraryChargeHandler <openforcefield.typing.engines.smirnoff.parameters.LibraryChargeHandler>`.
For a molecule to have charges assigned using LibraryCharges, all of its atoms must be covered by
at least one LibraryCharge. If an atom is covered by multiple LibraryCharges, then the last
one read will be applied. At this time, there is no concept of "residues" during parametrization,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's unclear if this means we can parameterize a biomolecule made up of a few "residues" or not. Can you clarify whether or not it is possible to provide a library of residue charge definitions and have the whole protein charged this way, provided all atoms have one LibraryCharge applied?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. I've added text to address this directly:

This functionality is thus able to apply per-residue charges similar to those in traditional
protein force fields. At this time, there is no concept of "residues" or "fragments" during
parametrization, so it is not possible to assign charges to some atoms in a molecule using
LibraryCharges, but calculate charges for other atoms in the same molecule using a different
method. To assign charges to a protein, LibraryCharges SMARTS must be provided for
the standard residues, as well as for any capping groups and post-translational modifications
that are present.

</LibraryCharges>
```


Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are the <LibraryCharges> examples above still accurate?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow. They actually didn't parse. Alanine was missing a / at the end of the LibraryCharge line, and TIP3P failed because +0.417 doesn't parse with the +, so I've change it to just be 0.417.

Fixed and added a test, with the TODO:

        # TODO: This test is practically useless while the XML strings are hard-coded at the top of this file.
        #       We should implement something like doctests for the XML snippets on the SMIRNOFF spec page.

Behavior changed
""""""""""""""""
- `PR #433 <https://github.com/openforcefield/openforcefield/pull/433>`_: If a molecule
can not be assigned charges by any charge-generation method, an
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you clarify whether passing in charged molecules via the API can suffice here, or if it will throw this exception if the SMIRNOFF file does not contain enough information to charge all molecules in the system (even if you pass a charged molecule in via the API).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. I've added text to clarify that the charge_from_molecules kwarg counts as a recognized charge assignment method.

''')
pdbfile = app.PDBFile(get_data_file_path('systems/test_systems/1_ethanol.pdb'))
molecules = []
molecules.append(Molecule.from_smiles('CCO'))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In tests where atom indexing is important, I've tried to remove uses of Molecule.from_smiles, since that does not guarantee a specific ordering of the atoms.

We can fix this! We can support tagged SMILES, such as [C:1][C:2][O:3], and ensure that we create atoms with the specified ordering. This would allow the tests to be much more compact, and could have other uses in future.

I believe cmiles has methods to do this---we should consult with @ChayaSt .



@pytest.mark.parametrize("toolkit_registry,registry_description", toolkit_registries)
def test_pass_invalid_kwarg_to_create_openmm_system(self, toolkit_registry, registry_description):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we still accept kwargs to create_openmm_system()? I thought we had deprecated that since it was so terribly dangerous!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree these are dangerous, but I'm struggling to think of an alternative.

Maybe the most direct way to remove charge_from_molecules is to create LibraryChargeHandler.add_molecule(), which takes Molecule objects as input and turns them into LibraryCharges. The big missing piece here would just be the reverse of what we discussed elsewhere in this PR -- A way to go from OFFMol --> tagged SMILES. We could, by default, add the new whole-molecule SMARTS to the bottom of the hierarchy.

Other kwargs, like allow_nonintegral_charges and toolkit_registry, will require further thought. But we could start phasing them out.

Copy link
Contributor

@SimonBoothroyd SimonBoothroyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this mostly looks great! Just one comment about future extensions and another about my preferred default behavior.

Comment on lines +2236 to +2237
if not hasattr(topology, "_ref_mol_to_charge_method"):
topology._ref_mol_to_charge_method = {ref_mol: None for ref_mol in topology.reference_molecules}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if rather than tagging the topology directly with this attribute, it would be cleaner to carry around some additional, extensible, internal state object which gets passed between successive handlers and that can store intermediate data such as this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is admittedly a cheap hack. The object which this information should be stored on is the System (because then after parameterization, users will be able to look right at a particle and see what ParameterHandler the charge came from).

I don't know how I feel about introducing an internal state object, so I'm thinking (typing) out loud here... It seems like this would partially be a "parametrization event" object, effectively like a rich log of what happened. This might be good for tracking provenance -- and actually we just talked about this at the MolSSI workshop (It would be neat if system prep tools produced an output which capture the important details of what they've done, which could be submitted as SI, relevant to @karmencj ). So I can envision it as purely an object to be written to. However, becomes more complex if it passes information back to the FF, topology, or system during the parametrization process (which seems to be the proposal here). Unless there's some clear model of when it talks to the other objects, we could end up with some very hard-to-test behaviors. I'll keep thinking about how this could work, though.

Copy link
Contributor

@SimonBoothroyd SimonBoothroyd Nov 6, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

object which this information should be stored on is the System

This may be the case - although I'm not sure what the status of this object is?

This might be good for tracking provenance

In principle it would - it's never a bad thing to keep track of what went into and came out of the different modular handlers.

However, becomes more complex if it passes information back to the FF, topology, or system during the parametrization process [...] Unless there's some clear model of when it talks to the other objects

I don't think I understand this point - where would this complexity arise? The object wouldn't need to talk to anything, it would purely be an object which gets passed between handlers which would be a readable source of previously assigned metadata which could either be utilized if needed, or ignored if not, and then extended ready to be passed along the chain of handlers. In principle the object could be as simple as a dict of dicts, where each key is the handler name and the value the dict of provenance.

we could end up with some very hard-to-test behaviors

I think that is inherent in having modular components which can be almost arbitrarily pieced together, but provided the required input and provided output provenance of each handler is well defined, then it shouldn't be overly bad!

Honestly I don't have particularly strong opinions about this, but given that in the future there is probably going to be cases where handlers may need access to information about what has already been done to the system (as is the case here), it might be worth thinking about how this will worker earlier rather than later!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd be very careful about decorating the source Topology object with something that should change when the same Topology object is fed to a different ForceField.

I do think we should seriously consider a new openforcefield System object as the right way to contain all of the applied parameterization data accessible via a public API.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Topology.add_molecule has undefined behavior if given an existing molecule with different atom indexing Implement Library Charges

8 participants