-
Notifications
You must be signed in to change notification settings - Fork 3
[DNM] Protein mutation support in Neq Cycling Setup Unit #106
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: protein-mutation-protocol
Are you sure you want to change the base?
Conversation
for more information, see https://pre-commit.ci
for more information, see https://pre-commit.ci
for more information, see https://pre-commit.ci
for more information, see https://pre-commit.ci
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## protein-mutation-protocol #106 +/- ##
============================================================
Coverage ? 84.60%
============================================================
Files ? 16
Lines ? 1598
Branches ? 0
============================================================
Hits ? 1352
Misses ? 246
Partials ? 0 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
@IAlibay this one should be ready for review. Please note that the |
|
@IAlibay I have found issues with some specific mappings. I think we need to postpone this one while I tackle these issues. I'm getting |
IAlibay
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An initial quick review - I mostlly am not very convinced with the idea of doing the template registration / partial charge assignment in a separate place and then using offmols with possibly different partial charges everywhere else.
| solvent_comp_a = ( | ||
| solvent_comps.pop() if solvent_comps else None | ||
| ) # Get the first component if exists | ||
| protein_comps_a = get_typed_components(state_a, ProteinComponent) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Switch these to gufe.ChemicalSystem.get_components_of_type?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yes, this is a good point. At the time this API point didn't exist. Thanks for the reminder.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Solved in a64135f
| ligand_a = ligand_mapping.componentA | ||
| ligand_b = ligand_mapping.componentB | ||
|
|
||
| solvent_comps = get_typed_components( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally it would be better to just asset a single solvent component in the protocol validator than to have to work around these type of things.
i.e. currently there zero cases where having more than 1 SolventComponent could make sense, indeed even with mixed solvent systems, we're likely to just create a single solvencomponent that would contain a list of smiles rather than stacking them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think that's a reasonable assumption. I don't know if it actually makes the code simpler, since we are allowing for multiple components of the other types (protein and small mols), so we could use the same logic for all of them without having branches. That said, I do think it makes sense in the validator of the protocol to allow only one solvent component, since that's what we currently support.
| all_openff_mols = list( | ||
| chain(all_alchemical_mols.values(), common_small_mols.values()) | ||
| ) | ||
| self._assign_openff_partial_charges( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you don't assign partial charges at this point, you'll end up with bad things happening - i.e. variable partial charges everywhere being assigned with the wrong partial charge method.
| try: | ||
| # Minimize | ||
| openmm.LocalEnergyMinimizer.minimize(context) | ||
| # Optionally store minimized topology -- Mostly for debugging purposes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not always store it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Solved in 2896683
| # This can be populated however we want | ||
| return outputs | ||
|
|
||
| # TODO: Maybe this could be a utility function. Is this something protocol-specific? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's mostly specific to this Protocol - also this should go in the Protocol validate method.
| # print(f"Free energy = {fe_estimate} +/- {fe_error}") # DEBUG | ||
|
|
||
| @pytest.mark.skip( | ||
| reason="Ambertools failing to parameterize. Review when we have full nagl." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There should be no reason why ambertools fails to generate charges for toluene.. this seems to be a bug not something that should wait for replacement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was probably due to some hiccups in ambertools and openmmforcefields at the time, I have to check again if this is the case. I remember at the time this was independent of our code base, and more related to some things not getting solved correctly in the environment. Something to revisit now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The errors that I'm getting locally are partially as follows:
sh: symbol lookup error: sh: undefined symbol: rl_print_keybinding
/home/user/miniforge3/envs/feflow-dev/bin/wrapped_progs/antechamber: Fatal Error!
Cannot properly run "/home/user/miniforge3/envs/feflow-dev/bin/bondtype -j full -i ANTECHAMBER_BOND_TYPE.AC0 -o ANTECHAMBER_BOND_TYPE.AC -f ac".
and
ValueError: No registered toolkits can provide the capability "assign_partial_charges" for args "()" and kwargs "{'molecule': Molecule with name 'toluene' and SMILES '[H][c]1[c]([H])[c]([H])[c]([C]([H])([H])[H])[c]([H])[c]1[H]', 'partial_charge_method': 'am1bcc', 'use_conformers': [<Quantity([[ ...
It could point to something not getting resolved correctly in the environment (maybe due to badly specified dependencies elsewhere/upstream). But not entirely sure what's triggering this one, it just don't seem to be related to our code base as far as I can tell.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, looks like an issue with linking to the right readline library, we have some code paths that test this so I think it is more likely an issue with this env.
ValueError: No registered toolkits can provide the capability "assign_partial_charges"
openff will throw this error if there are errors when using ambertools -- do you get these errors locally and on CI?
|
|
||
| # Generate and register FF parameters in the system generator template | ||
| all_openff_mols = [comp.to_openff() for comp in all_small_mols] | ||
| register_ff_parameters_template( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doing this and then using a different set of offmols elsewhere (with different partial charges) is a bad hack imho - because if either the templating behaviour changes or anything else ends up needing partial charges that doesn't go through templates you'll end up with a bunch of offmols downstream that have the wrrong thing.
I would strongly suggest generating the offmols & charge them ahead of them and keep them around.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My hope here was that registering the parameters/template in the cache was enough when these were encountered again in the future. Instead of having to keep track of some new offmols around.
My concern was the same, if we keep the offmols around, there's a risk we change something in some of them and not the others (or change the offmol but not the component of origin) and we would also have undesired behavior.
I agree that either way can lead to this, but my idea was having to avoid having offmols around and always sourcing things from the original components from the input, as possible. Is there a reason to believe my implementation is not achieving that? I'm not using this all_openff_mols at all after this, maybe we could just use the list comprehension directly in the register_ff_parameters_template to avoid having them around?
feflow/utils/misc.py
Outdated
| If the component does not support the necessary conversion methods. | ||
| """ | ||
|
|
||
| try: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because technically protein components can try to go via openff, I wouldn't rely on an AttributeErrorr check here - I would isinstance check and enforce the correct route.
| return topology | ||
|
|
||
|
|
||
| def get_positions_from_component( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Medium term, maybe we should just add to_positions methods in the Components themselves?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That sounds reasonable, but maybe we need to think about how would this work for the solvent component.
| ValueError | ||
| If the atom index is not found in the topology. | ||
| """ | ||
| for residue in topology.residues(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather than looping over residues and then over atoms, why not just loop over atoms, and get the residue? It would avoid you doing an extra for loop.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed the logic on this one and now I'm not using this utility function. More info at: d8ede01
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm leaning towards just removing this function if we are not using it.
for more information, see https://pre-commit.ci
for more information, see https://pre-commit.ci
for more information, see https://pre-commit.ci
These changes aim to add support to be able to perform protein mutations in the current
SetupUnitthat theNonEquilibriumCyclingProtocoluses.This mostly implies making sure that the alchemical component is now not assumed to be a
SmallMoleculeComponentbut also aProteinComponent.Utility functions for achieving such purpose were written or modified accordingly. Someof these changes were needed to be made on the
openfeside of things, since we have not yet decided on migrating them tofeflow(this code base).