-
Notifications
You must be signed in to change notification settings - Fork 4
Protonation state sampling
Consider a system at equilibrium in a semigrand ensemble relevant for constant-pH simulation.
The thermodynamic parameters
Denote the instantaneous configuration of the system by
We can write the reduced potential
(NOTE: The inverse thermal energy
If we instead fix the number of protons at
The probability density of interest is given by
where the unnormalized probability density
and the semi-grand canonical partition function
and the isothermal-isobaric partition function
We have used the shorthand
in the equation above.
Note: We use the shorthand
Broadly, there are two major strategies that could be adopted to address discrepancies between experimental and calculated solution pKas for water (and other solvents), small molecules, and amino acid residues:
Broadly, this strategy is to compute the effective chemical potential for the ML potential, subtract it off, and add back the desired chemical potential.
First, we consider the case of correcting the water pKa to provide appropriate pH. Roughly, this strategy involves the development of a mapping from the desired pH to an effective chemical potential that corrects for the deviation between experimental and computed chemical potentials for bulk water.
Consider a water box in equilibrium with a chemical reservoir of protons at a given pH.
Presuming we implicitly specify a fixed number
If we knew the relative free energies
The definition of pH is in terms of the activity
where
(TODO: Revise this section for a better handling of the activity.)
This suggests that, should we need to calibrate
This approach was used, for example, in calibrating the chemical potential of neutral salt pairs in TIP3P water in SaltSwap.
However, it is likely we can exploit a more direct relationship between
Much the same way we expect to include loss terms in training or fine-tuning that penalize deviations from experimental binding free energies, we can penalize deviations between computed and experimental chemical potentials.
Note that there may be finite-size effects that require us to extrapolate unless we use sufficiently large solvent boxes, but this can be rapidly established by computing the chemical potential of proton insertion/deletion for neutral bulk water as a function of
Because experimental pKas may not reliably provide direct access to chemical potentials---see this paper for a more detailed explanation of experimental pKa measurements and their challenges---it may be desirable to fit directly to experimental electrochemical (or UV-metric) titration data that can be directly computed from knowledge of the proton-dependent free energies
Similar to binding free energies, the free energies of relevant protonation states could be computed at a reference parameter set
Because protons may migrate in solution, we might reasonably wonder whether the free energy differences we compute are for the species of interest.
We don't know!
We can use heuristics to assign molecularity and compute chemical species, but this is only approximate.
In reality, we can simply compute the ladder of free energies
We might therefore explore whether there are shortcuts, such as:
- Using gas-phase proton affinities or related energies instead
- Using restraints to constrain the chemical species being sampled (e.g. preventing proton dissociation or migration)
- Computing the solution free energy difference between protonation states of the molecule of interest only for neutral water, rather than with a significant
$\Delta N_H$ .
This section discusses several algorithms that may be suitable for sampling protonation states and tautomers in the context of Markov chain Monte Carlo molecular simulations.
For simplicity, we first consider algorithms for dealign with tautomerization. These algorithms can be easily adapted to handle protonation states in a straightforward manner.
Consider the tautomerization of imidazole
(TODO: Replace this with a better figure)
The simplest approach we will consider is an instantaneous Metropolis Monte Carlo proposal where we propose to "hop" the proton from one titratable heavy atom site to another.
- Step 1: Propose
$x_{new} \sim p(x_{new} | x_{old})$ where the proton at position 1 is moved to a position that is a Gaussian random variable about the N at position 3. - Step 2: Accept or reject the move with the Metropolis-Hastings criteria
where we can use the difference in reduced potentials
We expect the average acceptance rate of the instantaneous heavy atom centered Monte Carlo proposal
Instead, we can significantly improve the choice of proposal probability by learning the parameters of the Gaussian distribution in terms of the heavy atom and its connected neighbors.
Suppose we have an arrangement
where
We expect this should significantly increase the acceptance rate
Even with improved proton proposal distributions, in solvent, instantaneous placement of protons without allowing relaxation of the surrounding solvent will likely lead to low acceptance rates
A very simple way to improve this is to perform parallel proposals: If we propose
(TODO: Check that we have correctly included the proposal probabilities
In the limit of
This idea is succinctly described in Waste recycling Monte Carlo.
(TODO: Check if there are issues with this acceptance probability. Multiple-try Metropolis is an alternative.)
It is possible to achieve superlinear improvements in acceptance probability
There are numerous ways to use NCMC for this scenario.
First, we consider building on the previous algorithms that draw a new proton position using one of the hydrogen position proposal probabilities
First, partition the atoms into the proton under consideration for transport---$x_{H, old}$---and all other protons---$x_{core}.
We draw the position of a new additional proton
We then integrate
where
The acceptance probability for
If rejected, however, the entire trajectory
The velocity will also need to be reversed on acceptance or rejection to ensure the correct distribution is maintained---see NCMC for more details. If an MCMC scheme is used where protonation state moves and MD moves that begin with a new velocity draw from the Maxwell-Boltzmann distribution, this can be ignored since the velocity will be refreshed immediately afterwards anyway.
Note that, we can explore various alchemical potentials
As an alternative to transiently augmenting the number of particles, we can instead propose a displacement
where the alchemical energy function becomes
The acceptance probability replaces the proposal ratio
Excess protons are surprisingly mobile in bulk water. Water wires form spontaneously and enable the rapid nonlocal translocation of excess protons. In confined geometries, proton transport over large distances can occur on the sub-picosecond timescale.
As a result, we may not need to be overly concerned about where a proton is initially placed if we use NCMC to transport or insert/delete a proton---spontaneous water wires may effectively shuttle protons to/from relevant locations on their own. Whether this timescale is practical, of course, is a question for the Experiments section below.
Suppose we select a position for placing the proton that is uniform within the simulation box:
If we again integrate an NCMC switching protocol over
The acceptance probability will increase superlinearly in
We can build on this idea by simply attempting to insert/delete protons within the bulk.
Consider the incredibly simple algorithm:
Insertion:
- Propose proton N_H+1 position x_H uniformly within the box
- Perform NCMC insertion under NpT conditions where the protocol work for just the potential is integrated:
- Accept or reject with the probability
Deletion:
- Select a proton uniformly for deletion
- Perform NCMC deletion under NpT conditions where the protocol work for just the potential is integrated:
- Accept or reject with the probability
TODO: This elides a few steps that simplify the resulting algorithm---make the derivation from super-detailed balance with the reduced potential more explicit. Also check accept/reject proposals are correct.
It is possible to improve the acceptance rates further by biasing the selection of protons via effective pKa estimates and correcting for these proposal probabilities in the Metropolis-Hastings acceptance criteria.
To set a baseline, it is sensible to attempt to compute
The procedure is roughly:
- Sample equilibrium configurations
$x \sim \pi(x)$ for imidazole in vacuum or in water at 300 K and 1 atm to generate a collection of equilibrium snapshots$x_n$ . - Collect statistics on the log proposal ratio
$\log r$ and log acceptance ratio$\log s = - \Delta u$ by drawing an equilibrium snapshot$x_n$ , proposing a new proton position, and computing$\log r$ and$\log s$ - Analyze the statistics to estimate
$\log < P_{accept} >$ , bootstrapping to estimate its posterior uncertainty
We can then perform the same exercise using learned optimal equivariant functions
We can then implement various NCMC protocols to explore the distribution of the work
We can use a Wang-Landau-like scheme (e.g. SAMS, as we used in SaltSwap) to estimate the free energy
Note that the total charge of the system must be updated every time a proton is inserted or deleted.
Following the same experimental strategy above to compute the free energy
Following the same experimental strategy above to compute the free energy