Container class for parameterized systems

I've been discussing (with @andrrizzi @SimonBoothroyd @j-wags @proteneer) the potential for solving a number of technical debt issues we have accumulated with a new Python container class to represent a parameterized system.

To recap, we have a few major accumulated technical debt issues:
* **Use of Swig-wrapped C++ objects as containers causes problems.** Currently, [`ForceField`](https://open-forcefield-toolkit.readthedocs.io/en/latest/api/generated/openforcefield.typing.engines.smirnoff.forcefield.ForceField.html#openforcefield.typing.engines.smirnoff.forcefield.ForceField) is mainly structured around constructing an OpenMM [`System`](http://docs.openmm.org/latest/api-python/generated/simtk.openmm.openmm.System.html#simtk.openmm.openmm.System) object, which is a Swig-wrapped C++ object that violates PEP8 naming principles with an highly un-pythonic API. The overhead of these conversions can also be significant, and they are time-consuming to serialize via XML.
* **Nonstandard unit handling.** OpenMM also uses `simtk.unit`, rather than a more standard library like [`pint`](https://pint.readthedocs.io), for handling units; this was discussed [here](https://github.com/openforcefield/openforcefield/issues/151). In addition, `simtk.unit` does not easily support ensuring dimensionality is correct.
* **Difficulty decoding parameter types for conversion.** While OpenMM provides flexible `CustomForce` classes which can _encode_ many functional forms for existing parameter types or new parameter types, it becomes very difficult to _decode_ these custom forces to convert to representations suitable for other simulation packages without a lot of specialized logic to try to detect what we encoded. OpenMM also generally mixes electrostatics and vdW interactions in a single [`NonbondedForce`](http://docs.openmm.org/latest/api-python/generated/simtk.openmm.openmm.NonbondedForce.html#simtk.openmm.openmm.NonbondedForce), which creates further complexity for decoding.
* **Inextensibility.** While OpenMM `System` objects are very flexible, they are inextensible without writing C++ plugins.
* **Tracking origin parameters in parameterized systems is difficult.** The parameterized OpenMM `System` object has all parameters unrolled into individual interactions, which makes it difficult to identify how origin parameters affect multiple interactions.
* **Our current scheme for inspecting applied parameters needs an object model.** The current scheme for inspecting which parameters are assigned to various force terms, using [`ForceField.label_molecules()`](https://open-forcefield-toolkit.readthedocs.io/en/latest/api/generated/openforcefield.typing.engines.smirnoff.forcefield.ForceField.html#openforcefield.typing.engines.smirnoff.forcefield.ForceField.label_molecules), returns a dict of lists of tuples of lists, which should very much be replaced with a sensible object model.
* **Dependence on ParmEd for parameter conversions limits us to Amber-centric forces and causes information loss.** Our current reliance on [ParmEd](https://github.com/ParmEd/ParmEd) for conversions limits us to an Amber-centric universe, causing us to struggle with both [information loss](https://github.com/openforcefield/openforcefield/issues/295) and will limit our ability to expand into new functional forms in the future.
* **The need to use OpenMM for recomputing energies and energy derivatives with respect to parameters is highly limiting.** While OpenMM is highly performant for running molecular simulations on GPUs, much of the capabilities we need for force field fitting is recomputation of specific energy components at various parameter values (which can incur significant overhead with OpenMM) or the computation of energy gradients with respect to parameters (which is much more complex with OpenMM, requiring extensive recoding as CustomForce objects). Instead, it would be convenient to easily render one or more forces to Python objects that can easily compute energies or parameter gradients using performant automatic differentiation methods, much like [timemachine](https://github.com/proteneer/timemachine) does.

Considering all of these limitations, I propose we create a new object model for parameterized systems that aims to solve as many of these problems as is conceivable via a good, extensible design, and amortize the cost of its implementation over various subprojects that need different capabilities.

Our design should:
* Be Pythonic, PEP8 compliant, and easily serializable with minimal overhead
* Use a standard units library like [`pint`](https://pint.readthedocs.io) that facilitates dimension-checking
* Has a logical representation of force components that more closely matches our concept of how forces are described in [SMIRNOFF](https://open-forcefield-toolkit.readthedocs.io/en/latest/smirnoff.html), where the modular nature of forces makes it easy to grow the spec and object model
* Allow for easy inspection of which parameters are assigned to different force terms
* Retain the connection between underlying parameter and corresponding terms in the parameterized system, perhaps allowing updates to the parameter set to generate updated parameters in the system
* Function as an object model that permits easy interconversion to multiple simulation packages without information loss like OpenMM, Amber, gromacs, and Desmond (if supported by the package), taking inspiration from [InterMol](https://github.com/shirtsgroup/InterMol), which closely adheres to this concept
* Also retain the `Topology` object from which the system was created to make it easier to identify interactions between desired atoms
* Provide convenience methods for computing energies as a function of configuration and parameters, as well as parameter gradients, possibly via multiple schemes (e.g. numpy, [jax](https://github.com/google/jax), [TensorFlow](https://www.tensorflow.org/)), a capability demonstrated by [timemachine](https://github.com/proteneer/timemachine)

I find it helpful to kick around ideas of what using the class would look like. Here's some very rough ideas for inspiration.

Suppose first that it would be useful to create something like a `ParameterSet` object that would replace the current awkward `ForceField` syntax of
```python
forcefield.get_handler('vdW').parameters['[#1:1]-[#7]']
```
with something like the more natural
```python
forcefield['vdW']['[#1:1]-[#7]']
```

This would enable manipulations like
```python
# Create an OFF force field object
from openforcefield.typing.engines.smirnoff import ForceField
forcefield = ForceField('openforcefield-1.0.offxml')
# Create an OFF parameterized system
system = forcefield.create_system(topology)
# Access the topology associated with that system
system.topology
# Access the forcefield parameter set associated with that system
system.forcefield['vdW']['[#1:1]-[#7]'].sigma
# Inspect the vdW sigma assigned to an particle 0
system.parameters['vdW'][0].sigma
# But we can still access the root parameter that controls all atoms of that type
system.parameters['vdW'][0].parameter.sigma
# and possibly change it
import pint
ureg = pint.UnitRegistry()
system.parameters['vdW'][0].parameter.sigma = 1.0 * ureg.angstrom
# NOTE: We have to figure out whether we should be allowed to change the parameters for individual particle or interactions in a way that would depart from the underlying parameter set, or just force that to change _all_ such instances in the set.
# Compute the potential energy for a given configuration
potential = system.potential_energy(positions)
# Create a method to compute the potential energy of just the vdW terms, using jax to achieve high performance, allowing parameter gradients
potential_energy_method = system.potential_factory(forces=['vdW'], parameter_gradients=['vdW'], method='jax')
```
None of this is intended to be a real suggestion for a final API, but just to give the flavor of what we could potentially implement in a manner that would let us easily solve many of the problems above.


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Container class for parameterized systems #310

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Container class for parameterized systems #310

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions