Skip to content

Free energy calculations

sboresch edited this page Oct 15, 2024 · 33 revisions

Free energy calculations

This section discusses the preparation for relative and absolute free energy calculations using neural network potentials (NNPs). We outline the treatment of dummy atoms, the implications of energy mixing and parameter mixing approaches, and strategies that avoid generating unphysical states, ensuring a smooth and efficient transformation between the two topologies of interest.

The free energy between two states is given by

$$ \Delta A_{i\mapsto j} = -k_BT \ln \frac{Q_j}{Q_i} $$

with:

  • $\Delta A_{i\mapsto j}$ as the Helmholtz free energy difference,
  • $k_B$ as the Boltzmann constant
  • $T$ as the temperature
  • $Q_{i}$ and $Q_{j}$ are the configurational partition functions of states $i$ and $j$

The partition function $Q_i$ is given by

$$ Q_i = \int e^{-\beta U(\vec{x})} d\vec{x} $$

with:

  • $\beta = \frac{1}{k_BT}$
  • $U_i(\vec{x})$ as the potential energy of the system in state $i$ as a function of the coordinates $\vec{x}$

Important

Given the high computational cost and unfavorable scaling of memory consumption with the number of atoms when using NNPs, methods like expanded ensemble sampling, Hamiltonian replica exchange, or $\lambda$-dynamics are appropriate to enhance sampling efficiency and manage computational resources effectively.

These advanced sampling techniques allow for efficient exploration of the configurational space and can mitigate the computational challenges associated with NNPs in free energy calculations.

High-level overview

If we want to perform free energy calculations using dummy atoms, scaling interactions between atoms is necessary.

This can be achieved in multiple ways.

  1. mixing of endstate energies: we can remove the atom's contribution to the total energy by (1) masking atom-specific contributions during the reduction step from per-atom to per-system features and (2) removing atoms from the pairlist to mask their environment (pairwise) contributions. This allows the introduction of dummy atoms (but only in the binary sense: interaction or not-interacting, no scaling is possible). This approach simulates physical endstates (at integer $\lambda$ values) and intermediate states by mixing endstate energies. It requires restraints to keep dummy atoms in position.
  2. Decoupling atoms: We can scale the interaction between atom groups at the level of the message update function (for message-passing networks) or, more generally, at the level of the effective pair distances (e.g., implemented as 4D pair distances).

Dummy Atoms in Free Energy Calculations

In alchemical free energy calculations, especially when transforming between molecules of different sizes or topologies, dummy atoms are introduced to maintain a constant number of particles. Dummy atoms are placeholders that do not interact with the environment or other atoms when fully decoupled. They facilitate transformations where a direct one-to-one mapping of atoms between initial and final states is impossible.

Challenges with dummy atoms in NNPs:

  • Lack of intramolecular interactions: When dummy atoms are fully decoupled (no bonded or nonbonded interactions), they lack the forces necessary to maintain meaningful positions relative to the rest of the system.
  • Sampling inefficiency: without proper restraints, dummy atoms can drift freely, leading to poor phase space overlap between adjacent alchemical states ($\lambda$-states). This results in increased statistical errors and inefficient sampling.
  • Unphysical configurations: The training domain of the NNP restricts the use of dummy atoms to physical reasonable configurations that are evaluated. In other words, there can't be a carbon atom with five hydrogens.

These considerations limit the possible free energy approaches (e.g., single topology is not an option).

Dual topology/single coordinate

The dual topology/single coordinate method combines atoms from both the initial and final states into a single system with shared coordinates. Substructure matching is the base requirement to map two molecular graphs on each other. This superset topology includes all atoms present in either state, and the interactions are modulated by a coupling parameter $\lambda$.

There are two obvious ways to compute the free energy difference for such a setup: energy mixing and interaction scaling.

Energy mixing

It involves expressing the total potential energy $U_{\text{total}}$ as a weighted sum of the energies of the initial ($L_1$) and final ($L_2$) states:

$$ U_{\text{total}}(\vec{x}; \lambda) = \lambda U_{L_1}(\vec{x}) + (1- \lambda) U_{L_2}(\vec{x}) $$

Considerations:

  • requires two separate evaluations (passes through the neural network), one for each topology, at each step.
  • ensures physical realism of the configuration passed through the network
  • trivial to implement

Interaction scaling

Message passing networks

Most NNPs model many-body interactions as a set of pairwise interactions. There are two ways to scale atomic interactions:

Modulate the pairwise messages: for each atom $i$ the embedding $s_i$ is updated based on the messages received from its neighbors within a given cutoff $r_c$:

$$ s_i^{l+1} = s_i^l + \sum_{j\in N(i)} \lambda_{i,j} * m_{i,j}(s_j, d_{i,j}) $$

with:

  • $s_i^l$ as the embedding of atom $i$ at layer $l$
  • $m$ is the message function
  • $\lambda$ is the scaling factor $\in [0,1]$
  • $d$ is the pairwise distance

$\lambda$ modulates the influence of message $m$ from atom $j$ to atom $i$. Scaling is applied at the level of atom pairs.

Modulate the pairwise distances: An alternative approach is to make the pairwise distance $d_{i,j}$ lambda dependent. In 3D space the distance between two atoms is:

$$ d_{ij} = ||r_i - r_j|| $$

introducing a 4th dimensions coordinate $s_{i,j}(\lambda)$ that scales from 0 to $r_c$

$$ s_ij(\lambda) = \lambda * r_c $$

modifies this to

$$ d_{i,j}(\lambda) = \sqrt{((r_i - r_j)^2 + s_{i,j}(\lambda)^2)} $$

resulting in the following message update function

$$ s_i^{l+1} = s_i^l + \sum_{j\in N(i)} m_{i,j}(s_j, d_{i,}(\lambda)) $$

The advantage of such a dependency is that it can increase the distance between atoms until the cutoff distance $r_c$ is approached. This avoids introducing scaling of the messages that, depending on the update function (which is ignored in the above formulation), might still be passed through a non-linear transformation, making the scaling effect difficult to interpret.

Absolute Free Energy calculations

See Computing hydration free energies of small molecules with first principles accuracy for an application of this principle.

To perform absolute free energy calculations, we can decouple a molecule from its environment by scaling the messages between solute and solvent atoms:

$$ s_{i,j}(\lambda) = \begin{cases} \lambda,~ \text{if}~i \in \text{solute,} j \in \text{solvent or vice versa} \\ 1, ~ \text{otherwise} \end{cases} $$