Skip to content

Conversation

@ijpulidos
Copy link
Contributor

These changes aim to add support to be able to perform protein mutations in the current SetupUnit that the NonEquilibriumCyclingProtocol uses.

This mostly implies making sure that the alchemical component is now not assumed to be a SmallMoleculeComponent but also a ProteinComponent.

Utility functions for achieving such purpose were written or modified accordingly. Someof these changes were needed to be made on the openfe side of things, since we have not yet decided on migrating them to feflow (this code base).

@ijpulidos ijpulidos marked this pull request as draft January 30, 2025 18:03
@ijpulidos ijpulidos linked an issue Jan 30, 2025 that may be closed by this pull request
@codecov
Copy link

codecov bot commented Sep 24, 2025

Codecov Report

❌ Patch coverage is 91.89189% with 15 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (protein-mutation-protocol@00c1f15). Learn more about missing BASE report.

Files with missing lines Patch % Lines
feflow/utils/vendored.py 89.06% 7 Missing ⚠️
feflow/utils/misc.py 81.81% 6 Missing ⚠️
feflow/protocols/nonequilibrium_cycling.py 97.40% 2 Missing ⚠️
Additional details and impacted files
@@                     Coverage Diff                      @@
##             protein-mutation-protocol     #106   +/-   ##
============================================================
  Coverage                             ?   84.60%           
============================================================
  Files                                ?       16           
  Lines                                ?     1598           
  Branches                             ?        0           
============================================================
  Hits                                 ?     1352           
  Misses                               ?      246           
  Partials                             ?        0           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@ijpulidos ijpulidos requested a review from IAlibay September 24, 2025 23:41
@ijpulidos
Copy link
Contributor Author

@IAlibay this one should be ready for review. Please note that the vendored.py adds a lot of lines of code to the changes, but these are basically the same that we had on openfe only that I had to copy a good part of the topology_helpers.py module (see OpenFreeEnergy/openfe#1539 for more information)

@ijpulidos
Copy link
Contributor Author

ijpulidos commented Oct 24, 2025

@IAlibay I have found issues with some specific mappings. I think we need to postpone this one while I tackle these issues. I'm getting KeyError.

@ijpulidos ijpulidos changed the title Protein mutation support in Neq Cycling Setup Unit [DNM] Protein mutation support in Neq Cycling Setup Unit Oct 24, 2025
Copy link
Member

@IAlibay IAlibay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An initial quick review - I mostlly am not very convinced with the idea of doing the template registration / partial charge assignment in a separate place and then using offmols with possibly different partial charges everywhere else.

solvent_comp_a = (
solvent_comps.pop() if solvent_comps else None
) # Get the first component if exists
protein_comps_a = get_typed_components(state_a, ProteinComponent)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Switch these to gufe.ChemicalSystem.get_components_of_type?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes, this is a good point. At the time this API point didn't exist. Thanks for the reminder.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Solved in a64135f

ligand_a = ligand_mapping.componentA
ligand_b = ligand_mapping.componentB

solvent_comps = get_typed_components(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally it would be better to just asset a single solvent component in the protocol validator than to have to work around these type of things.

i.e. currently there zero cases where having more than 1 SolventComponent could make sense, indeed even with mixed solvent systems, we're likely to just create a single solvencomponent that would contain a list of smiles rather than stacking them.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think that's a reasonable assumption. I don't know if it actually makes the code simpler, since we are allowing for multiple components of the other types (protein and small mols), so we could use the same logic for all of them without having branches. That said, I do think it makes sense in the validator of the protocol to allow only one solvent component, since that's what we currently support.

all_openff_mols = list(
chain(all_alchemical_mols.values(), common_small_mols.values())
)
self._assign_openff_partial_charges(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you don't assign partial charges at this point, you'll end up with bad things happening - i.e. variable partial charges everywhere being assigned with the wrong partial charge method.

try:
# Minimize
openmm.LocalEnergyMinimizer.minimize(context)
# Optionally store minimized topology -- Mostly for debugging purposes
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not always store it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Solved in 2896683

# This can be populated however we want
return outputs

# TODO: Maybe this could be a utility function. Is this something protocol-specific?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's mostly specific to this Protocol - also this should go in the Protocol validate method.

# print(f"Free energy = {fe_estimate} +/- {fe_error}") # DEBUG

@pytest.mark.skip(
reason="Ambertools failing to parameterize. Review when we have full nagl."
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There should be no reason why ambertools fails to generate charges for toluene.. this seems to be a bug not something that should wait for replacement.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was probably due to some hiccups in ambertools and openmmforcefields at the time, I have to check again if this is the case. I remember at the time this was independent of our code base, and more related to some things not getting solved correctly in the environment. Something to revisit now

Copy link
Contributor Author

@ijpulidos ijpulidos Nov 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The errors that I'm getting locally are partially as follows:

sh: symbol lookup error: sh: undefined symbol: rl_print_keybinding
/home/user/miniforge3/envs/feflow-dev/bin/wrapped_progs/antechamber: Fatal Error!
Cannot properly run "/home/user/miniforge3/envs/feflow-dev/bin/bondtype -j full -i ANTECHAMBER_BOND_TYPE.AC0 -o ANTECHAMBER_BOND_TYPE.AC -f ac".

and

ValueError: No registered toolkits can provide the capability "assign_partial_charges" for args "()" and kwargs "{'molecule': Molecule with name 'toluene' and SMILES '[H][c]1[c]([H])[c]([H])[c]([C]([H])([H])[H])[c]([H])[c]1[H]', 'partial_charge_method': 'am1bcc', 'use_conformers': [<Quantity([[ ...

It could point to something not getting resolved correctly in the environment (maybe due to badly specified dependencies elsewhere/upstream). But not entirely sure what's triggering this one, it just don't seem to be related to our code base as far as I can tell.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, looks like an issue with linking to the right readline library, we have some code paths that test this so I think it is more likely an issue with this env.

ValueError: No registered toolkits can provide the capability "assign_partial_charges"

openff will throw this error if there are errors when using ambertools -- do you get these errors locally and on CI?


# Generate and register FF parameters in the system generator template
all_openff_mols = [comp.to_openff() for comp in all_small_mols]
register_ff_parameters_template(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doing this and then using a different set of offmols elsewhere (with different partial charges) is a bad hack imho - because if either the templating behaviour changes or anything else ends up needing partial charges that doesn't go through templates you'll end up with a bunch of offmols downstream that have the wrrong thing.

I would strongly suggest generating the offmols & charge them ahead of them and keep them around.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My hope here was that registering the parameters/template in the cache was enough when these were encountered again in the future. Instead of having to keep track of some new offmols around.

My concern was the same, if we keep the offmols around, there's a risk we change something in some of them and not the others (or change the offmol but not the component of origin) and we would also have undesired behavior.

I agree that either way can lead to this, but my idea was having to avoid having offmols around and always sourcing things from the original components from the input, as possible. Is there a reason to believe my implementation is not achieving that? I'm not using this all_openff_mols at all after this, maybe we could just use the list comprehension directly in the register_ff_parameters_template to avoid having them around?

If the component does not support the necessary conversion methods.
"""

try:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because technically protein components can try to go via openff, I wouldn't rely on an AttributeErrorr check here - I would isinstance check and enforce the correct route.

return topology


def get_positions_from_component(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Medium term, maybe we should just add to_positions methods in the Components themselves?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds reasonable, but maybe we need to think about how would this work for the solvent component.

ValueError
If the atom index is not found in the topology.
"""
for residue in topology.residues():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than looping over residues and then over atoms, why not just loop over atoms, and get the residue? It would avoid you doing an extra for loop.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed the logic on this one and now I'm not using this utility function. More info at: d8ede01

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm leaning towards just removing this function if we are not using it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Refactor setup unit to work with protein mutation protocol

4 participants