-
Notifications
You must be signed in to change notification settings - Fork 100
Description
I've been discussing (with @andrrizzi @SimonBoothroyd @j-wags @proteneer) the potential for solving a number of technical debt issues we have accumulated with a new Python container class to represent a parameterized system.
To recap, we have a few major accumulated technical debt issues:
- Use of Swig-wrapped C++ objects as containers causes problems. Currently,
ForceFieldis mainly structured around constructing an OpenMMSystemobject, which is a Swig-wrapped C++ object that violates PEP8 naming principles with an highly un-pythonic API. The overhead of these conversions can also be significant, and they are time-consuming to serialize via XML. - Nonstandard unit handling. OpenMM also uses
simtk.unit, rather than a more standard library likepint, for handling units; this was discussed here. In addition,simtk.unitdoes not easily support ensuring dimensionality is correct. - Difficulty decoding parameter types for conversion. While OpenMM provides flexible
CustomForceclasses which can encode many functional forms for existing parameter types or new parameter types, it becomes very difficult to decode these custom forces to convert to representations suitable for other simulation packages without a lot of specialized logic to try to detect what we encoded. OpenMM also generally mixes electrostatics and vdW interactions in a singleNonbondedForce, which creates further complexity for decoding. - Inextensibility. While OpenMM
Systemobjects are very flexible, they are inextensible without writing C++ plugins. - Tracking origin parameters in parameterized systems is difficult. The parameterized OpenMM
Systemobject has all parameters unrolled into individual interactions, which makes it difficult to identify how origin parameters affect multiple interactions. - Our current scheme for inspecting applied parameters needs an object model. The current scheme for inspecting which parameters are assigned to various force terms, using
ForceField.label_molecules(), returns a dict of lists of tuples of lists, which should very much be replaced with a sensible object model. - Dependence on ParmEd for parameter conversions limits us to Amber-centric forces and causes information loss. Our current reliance on ParmEd for conversions limits us to an Amber-centric universe, causing us to struggle with both information loss and will limit our ability to expand into new functional forms in the future.
- The need to use OpenMM for recomputing energies and energy derivatives with respect to parameters is highly limiting. While OpenMM is highly performant for running molecular simulations on GPUs, much of the capabilities we need for force field fitting is recomputation of specific energy components at various parameter values (which can incur significant overhead with OpenMM) or the computation of energy gradients with respect to parameters (which is much more complex with OpenMM, requiring extensive recoding as CustomForce objects). Instead, it would be convenient to easily render one or more forces to Python objects that can easily compute energies or parameter gradients using performant automatic differentiation methods, much like timemachine does.
Considering all of these limitations, I propose we create a new object model for parameterized systems that aims to solve as many of these problems as is conceivable via a good, extensible design, and amortize the cost of its implementation over various subprojects that need different capabilities.
Our design should:
- Be Pythonic, PEP8 compliant, and easily serializable with minimal overhead
- Use a standard units library like
pintthat facilitates dimension-checking - Has a logical representation of force components that more closely matches our concept of how forces are described in SMIRNOFF, where the modular nature of forces makes it easy to grow the spec and object model
- Allow for easy inspection of which parameters are assigned to different force terms
- Retain the connection between underlying parameter and corresponding terms in the parameterized system, perhaps allowing updates to the parameter set to generate updated parameters in the system
- Function as an object model that permits easy interconversion to multiple simulation packages without information loss like OpenMM, Amber, gromacs, and Desmond (if supported by the package), taking inspiration from InterMol, which closely adheres to this concept
- Also retain the
Topologyobject from which the system was created to make it easier to identify interactions between desired atoms - Provide convenience methods for computing energies as a function of configuration and parameters, as well as parameter gradients, possibly via multiple schemes (e.g. numpy, jax, TensorFlow), a capability demonstrated by timemachine
I find it helpful to kick around ideas of what using the class would look like. Here's some very rough ideas for inspiration.
Suppose first that it would be useful to create something like a ParameterSet object that would replace the current awkward ForceField syntax of
forcefield.get_handler('vdW').parameters['[#1:1]-[#7]']with something like the more natural
forcefield['vdW']['[#1:1]-[#7]']This would enable manipulations like
# Create an OFF force field object
from openforcefield.typing.engines.smirnoff import ForceField
forcefield = ForceField('openforcefield-1.0.offxml')
# Create an OFF parameterized system
system = forcefield.create_system(topology)
# Access the topology associated with that system
system.topology
# Access the forcefield parameter set associated with that system
system.forcefield['vdW']['[#1:1]-[#7]'].sigma
# Inspect the vdW sigma assigned to an particle 0
system.parameters['vdW'][0].sigma
# But we can still access the root parameter that controls all atoms of that type
system.parameters['vdW'][0].parameter.sigma
# and possibly change it
import pint
ureg = pint.UnitRegistry()
system.parameters['vdW'][0].parameter.sigma = 1.0 * ureg.angstrom
# NOTE: We have to figure out whether we should be allowed to change the parameters for individual particle or interactions in a way that would depart from the underlying parameter set, or just force that to change _all_ such instances in the set.
# Compute the potential energy for a given configuration
potential = system.potential_energy(positions)
# Create a method to compute the potential energy of just the vdW terms, using jax to achieve high performance, allowing parameter gradients
potential_energy_method = system.potential_factory(forces=['vdW'], parameter_gradients=['vdW'], method='jax')None of this is intended to be a real suggestion for a final API, but just to give the flavor of what we could potentially implement in a manner that would let us easily solve many of the problems above.