Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OpenMM/OpenFF interfaces fail for multi-component molecules #191

Closed
andrewtarzia opened this issue Nov 21, 2024 · 2 comments
Closed

OpenMM/OpenFF interfaces fail for multi-component molecules #191

andrewtarzia opened this issue Nov 21, 2024 · 2 comments

Comments

@andrewtarzia
Copy link
Member

andrewtarzia commented Nov 21, 2024

Using stko.OpenMMEnergy with an stk.Molecule with multiple molecules (e.g. host guest system) fails with an error from the openff interchange.

/rds/general/user/ewolpert/home/anaconda3/envs/stk_open/lib/python3.12/site-packages/openff/toolkit/topology/molecule.py:4352: MultipleComponentsInMoleculeWarning: RDKit Molecule passed to from_rdkit consists of more than one molecule, consider running rdkit.Chem.AllChem.GetMolFrags(rdmol, asMols=True) or splitting input SMILES at '.' to get separate molecules and pass them to from_rdkit one at a time. While this is supported for legacy reasons, OpenFF Molecule objects are not supposed to contain disconnected chemical graphs and this may result in undefined behavior later on. The OpenFF ecosystem is built to handle multiple molecules, but they should be in a Topology object, ex: top = Topology.from_molecules([mol1, mol2])
  molecule = toolkit.from_rdkit(
Traceback (most recent call last):
  File "/rds/general/user/ewolpert/home/Students/Xin/G17_openMM_EW.py", line 75, in <module>
    main()
  File "/rds/general/user/ewolpert/home/Students/Xin/G17_openMM_EW.py", line 53, in main
    optimised_win_win = optimisation_sequence.optimize(win_win)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/rds/general/user/ewolpert/home/anaconda3/envs/stk_open/lib/python3.12/site-packages/stko/_internal/optimizers/optimizers.py", line 115, in optimize
    mol = optimizer.optimize(mol)
          ^^^^^^^^^^^^^^^^^^^^^^^
  File "/rds/general/user/ewolpert/home/anaconda3/envs/stk_open/lib/python3.12/site-packages/stko/_internal/optimizers/openmm.py", line 152, in optimize
    interchange = Interchange.from_smirnoff(
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/rds/general/user/ewolpert/home/anaconda3/envs/stk_open/lib/python3.12/site-packages/openff/interchange/components/interchange.py", line 150, in from_smirnoff
    return _create_interchange(
           ^^^^^^^^^^^^^^^^^^^^
  File "/rds/general/user/ewolpert/home/anaconda3/envs/stk_open/lib/python3.12/site-packages/openff/interchange/smirnoff/_create.py", line 137, in _create_interchange
    _electrostatics(
  File "/rds/general/user/ewolpert/home/anaconda3/envs/stk_open/lib/python3.12/site-packages/openff/interchange/smirnoff/_create.py", line 292, in _electrostatics
    "Electrostatics": SMIRNOFFElectrostaticsCollection.create(
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/rds/general/user/ewolpert/home/anaconda3/envs/stk_open/lib/python3.12/site-packages/openff/interchange/smirnoff/_nonbonded.py", line 455, in create
    handler.store_matches(
  File "/rds/general/user/ewolpert/home/anaconda3/envs/stk_open/lib/python3.12/site-packages/openff/interchange/smirnoff/_nonbonded.py", line 846, in store_matches
    flag, matches, potentials = self._assign_charges_from_molecules(
                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/rds/general/user/ewolpert/home/anaconda3/envs/stk_open/lib/python3.12/site-packages/openff/interchange/smirnoff/_nonbonded.py", line 803, in _assign_charges_from_molecules
    for index_in_molecule_with_charges, partial_charge in enumerate(
                                                          ^^^^^^^^^^
TypeError: 'NoneType' object is not iterable
@andrewtarzia
Copy link
Member Author

Based on trace back (above) I think the issue is in calculate method, specifically these lines:

        molecule = Molecule.from_rdkit(
            rdmol=rdkit_mol,
            allow_undefined_stereo=True,
            hydrogens_are_explicit=True,
        )

        if self._partial_charges_method == "mmff94":
            molecule.assign_partial_charges(
                self._partial_charges_method,
                toolkit_registry=RDKitToolkitWrapper(),
            )

        if self._partial_charges_method == "espaloma-am1bcc":
            molecule.assign_partial_charges(
                self._partial_charges_method,
                toolkit_registry=EspalomaChargeToolkitWrapper(),
            )

        topology = molecule.to_topology()

From:

MultipleComponentsInMoleculeWarning: RDKit Molecule passed to from_rdkit consists of more than one molecule,
consider running rdkit.Chem.AllChem.GetMolFrags(rdmol, asMols=True) or splitting input SMILES at '.' to get separate 
molecules and pass them to from_rdkit one at a time. While this is supported for legacy reasons, OpenFF Molecule 
objects are not supposed to contain disconnected chemical graphs and this may result in undefined behavior later 
on. The OpenFF ecosystem is built to handle multiple molecules, but they should be in a Topology object, ex: top = 
Topology.from_molecules([mol1, mol2])
  molecule = toolkit.from_rdkit(

I propose replacing the lines above in our code with something like (and get_disconnected_components from https://gist.github.com/andrewtarzia/6cd18699bf68a7698d1ab01247f31619):

        # Get disconnected molecules (maybe there is a better way to do this with rdkit).
        disconnected_molecules = list(get_disconnected_components(mol).values())
        
        # Make OpenFF Molecules.
        for disconnected in disconnected_molecules:
                molecule = Molecule.from_rdkit(
                    rdmol=disconnected.to_rdkit_mol(),
                    allow_undefined_stereo=True,
                    hydrogens_are_explicit=True,
                )

                if self._partial_charges_method == "mmff94":
                    molecule.assign_partial_charges(
                        self._partial_charges_method,
                        toolkit_registry=RDKitToolkitWrapper(),
                    )
        
                if self._partial_charges_method == "espaloma-am1bcc":
                    molecule.assign_partial_charges(
                        self._partial_charges_method,
                        toolkit_registry=EspalomaChargeToolkitWrapper(),
                    )

        # Parse to Topology.
        topology = .from_molecules(openff_molecules)

@andrewtarzia
Copy link
Member Author

Rereading the error, they offer rdkit.Chem.AllChem.GetMolFrags(rdmol, asMols=True), which I have not used, but I am sure you could do something like:

rdkit_mol = mol.to_rdkit_mol()
fragment_mols = rdkit.Chem.AllChem.GetMolFrags(rdkit_mol, asMols=True)

# Iterate and assign the charges....

# Parse to Topology.
topology = .from_molecules(openff_molecules)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant