-
Notifications
You must be signed in to change notification settings - Fork 100
Description
Purpose
Because it's a complicated topic, and to maintain the ability to restart this discussion in the future, I think it's important to record our design decisions with regard to stereochemistry in this repository.
Background
Stereochemical differences are important in the SMIRNOFF spec, as they can lead to different partial charges or partial bond orders on a molecule, and (if the project goes in that direction), different assignment of library parameters using stereochemistry-dependent SMARTS.
However, the definition of meaningful stereochemistry can change depending on experimental conditions and timescale. For example a tertiary amine will invert 10^3 to 10^5 times per second in solution. Therefore, on a wet-lab-experiment timescale, the amine does not have meaningful stereochemistry. However, if one is running a QM calculation on the tertiary amine, there may be meaningful differences in the results depending on its stereochemistry. Thus, whether or not stereochemistry is required in a tertiary amine depends on your use case.
Our definition of stereochemistry will ideally cover the fact that MD simulations can run for micro- and milli-seconds, therefore molecular features that are likely to change stereochemistry on those timescales should not be assigned stereochemistry-specific properties. It is unclear currently whether this decision should be made at the point of SMIRNOFF molecule definition (ie, we enforce that short-lived stereochemical features do not distinguish one OFF molecule from another, thereby preventing our engine from assigning them different simulation parameters), or if the SMARTS that define our parameter libraries should be policed moving forward, to ensure that no two parameters are differentiated only by low-barrier stereochemistry (eg, we will never create a SMARTS that relies on the stereochemistry of a tertiary amine).
Current implementation plans
Read below for details, but I think a good plan is to let the toolkits use their own (slightly different!) standards in choosing how to accept SMILES. While they mostly agree with what "fully specified stereochemistry" is, they differ on some borderline cases.
While this isn't ideal, ensuring the same stereochemistry standard across toolkits will require a lot of engineering, and will restrict us to the union of the most restrictive stereochemistry standards of all of our toolkits (eg. OE omega complains if bond stereo isn't defined for CC=NH, which is a bit much).
3D molecule input
We haven't run into too many problems with sdf and mol2 input. OE and RDK currently use their default stereochemistry detection in these cases and seem to rarely have problems. I haven't dug very deeply into these, so I will update this post as I look further into it.
Potentially relevant to 3D stereochemistry detection: #141, #137
SMILES input
"Common sense" atom and double bond stereochemistry is required, as is implemented by default in OE and RDK. There are, however, a few cases where disagreement occurs, both between OE and RDK, and between a toolkit and our intuition.
This table records the cases of obvious and border-case stereochemistry we've found so far, whether we think molecules should labeled as having undefined stereochemistry (on the MD timescale), and whether the toolkits define the SMILES as having undefined stereochemistry. If applicable, experimental barriers/interconversion rates from literature should be added.
| SMILES | Description | Humans say it has undefined stereochem? | OE says it has undefined stereochem? | RDK says it has undefined stereochem? | barrier | conversion rate |
|---|---|---|---|---|---|---|
C(Cl)(F)(Br)C |
Undefined tetrahedral center | Yes | Yes | Yes | ||
C(F)=C(Cl) |
Undefined C=C double bond | Yes | Yes | Yes | ||
C\C(F)=C(/F)CC(C)(Cl)Br |
Undefined stereocenter, defined stereobond | Yes | Yes | Yes | ||
CC(F)=C(F)C[C@@](C)(Cl)Br |
Defined stereocenter, undefined stereobond | Yes | Yes | Yes | ||
C\C(F)=C(/F)C[C@@](C)(Cl)Br |
defined stereocenter, defined stereobond | No | No | No | ||
CN=C(F)C |
Undefined imine | Yes | Yes | Yes | > 40 kcal/mol | |
N=C(F)C |
Undefined primary imine | No | Yes | No | ||
N(Cl)(F)C |
Undefined pyrimidal nitrogen | No | Optional [1] | No | 10^3 - 10^5 sec^-1 | |
CP(O)(=O)N |
tetrahedral phosphate with one protonated oxygen | Yes | Yes | Yes | ||
CNC(C)=O |
keto-form of amide bond | No | No | No | ||
CN=C(C)O |
enol-form of amide bond | [2] | Yes | Yes | ||
C(=[O+][H])C |
oximium | ? | Yes | No | ||
N(C)(CC)(CCC)=O |
? | Yes? | Yes | No | ||
C[C@H]1CN1C, may have tested using different method |
trivalent N in 3-membered ring | Yes, as of 2019.3 | Yes | Yes | ||
C[C@H]1CN1, may have tested using different method |
trivalent N in 3-membered ring (1 hydrogen) | ? | No? | Yes | ||
[H]c1c(c(c(c(c1[H])[H])P2C(=C3C(=C2c4c(c(c(c(n4)[H])[H])[H])[H])C(C(C(C3([H])[H])([H])[H])([H])[H])([H])[H])c5c(c(c(c(n5)[H])[H])[H])[H])[H])[H] |
trivalent phosphorous | ? | No | Yes | considered stereogenic starting in RDK 2020.3 release rdkit/rdkit#2949 | |
CCS(=O)C |
thione | ? | ? | ? | ? | ? |
COP(=O)([O-])OCC |
deprotonated NA phosphate backbone | N | ? | ? | ? | deprotonated at physiological pH (pKa=2ish) |
[1] OE says "Yes" when using method A, but when "No" when method B is given enum_nitrogens=False.
[2] I'm not certain what the rotation timescale for amide bonds is, but this might just be in molecular dynamics range. Letting the enol form of the amide get stereochemistry might give users the choice of whether they want it defined.
Code to reproduce the above
from rdkit import Chem
from rdkit.Chem import EnumerateStereoisomers
# Define the SMILESes,
smiles_ambiguous = [
('C(Cl)(F)(Br)C', True), # Undefined tetrahedral center
('C(F)=C(Cl)', True), # Undefined RC=CR
('C(F)(Br)=C(Cl)(I)', True), # Undefined RC=CR
("C\C(F)=C(/F)CC(C)(Cl)Br", True), # Undefined stereocenter, defined stereobond
("CC(F)=C(F)C[C@@](C)(Cl)Br", True), # Defined stereocenter, undefined stereobond
("C\C(F)=C(/F)C[C@@](C)(Cl)Br", False), # defined stereocenter, defined stereobond
('CN=C(F)C', True), # RC=NR
('N=C(F)C', False), # RC=NH
('N(Cl)(F)C', False), #Pyrimidal nitrogen center
('CP(O)(=O)N', True), #phosphate
('CNC(C)=O', False), # keto form of amide
('CN=C(C)O', False), #enol form of peptide
('C(=[O+][H])C', False), # charged oxygen
('N(C)(CC)(CCC)=O', True), # ?
]
def rdkit_unspec_stereo(smiles):
from rdkit import Chem
from rdkit.Chem import EnumerateStereoisomers
rdmol = Chem.MolFromSmiles(smiles)
# Adding H's can hide undefined bond stereochemistry, so we have to test for undefined stereo here
unspec_stereo = False
rdmol_copy = Chem.Mol(rdmol)
enumsi_opt = EnumerateStereoisomers.StereoEnumerationOptions(maxIsomers=2, onlyUnassigned=True)
stereoisomers = [isomer for isomer in Chem.EnumerateStereoisomers.EnumerateStereoisomers(rdmol_copy, enumsi_opt)]
if len(stereoisomers) != 1:
unspec_stereo = True
if unspec_stereo:
return True
return False
def oe_unspec_stereo_A(smiles):
molecule = oechem.OEMol()
oechem.OESmilesToMol(molecule, smiles)
atoms_to_highlight = list()
bonds_to_highlight = list()
for atom in molecule.GetAtoms():
if atom.IsChiral() and not atom.HasStereoSpecified(oechem.OEAtomStereo_Tetrahedral):
# Check if handness is specified
v = []
for nbr in atom.GetAtoms():
v.append(nbr)
stereo = atom.GetStereo(v, oechem.OEAtomStereo_Tetrahedral)
if stereo == oechem.OEAtomStereo_Undefined:
atoms_to_highlight.append(atom)
for bond in molecule.GetBonds():
if bond.IsChiral() and not bond.HasStereoSpecified(oechem.OEBondStereo_CisTrans):
v = []
for neigh in bond.GetBgn().GetAtoms():
if neigh != bond.GetEnd():
v.append(neigh)
break
for neigh in bond.GetEnd().GetAtoms():
if neigh != bond.GetBgn():
v.append(neigh)
break
stereo = bond.GetStereo(v, oechem.OEBondStereo_CisTrans)
if stereo == oechem.OEBondStereo_Undefined:
bonds_to_highlight.append(bond)
if len(atoms_to_highlight+bonds_to_highlight) == 0:
return False
return True
def oe_unspec_stereo_B(smiles):
from openeye import oechem
from openeye import oeomega
maxcenters = 5
force_flip = False
enum_nitrogens = False
mol = oechem.OEMol()
if not oechem.OESmilesToMol(mol, smiles):
print(s, "Not Parsed")
n_isomers = 0
for isomer in oeomega.OEFlipper(mol, maxcenters, force_flip, enum_nitrogens):
n_isomers += 1
if n_isomers == 1:
return False
elif n_isomers > 1:
return True
method_list = [rdkit_unspec_stereo, oe_unspec_stereo_A, oe_unspec_stereo_B]
for smiles, should_be_ambiguous in smiles_ambiguous:
method_results = [method(smiles) for method in method_list]
print(smiles, should_be_ambiguous, method_results[0], method_results[1], method_results[2])