[DNM] Protein mutation support in Neq Cycling Setup Unit #106

ijpulidos · 2025-01-30T17:34:06Z

These changes aim to add support to be able to perform protein mutations in the current SetupUnit that the NonEquilibriumCyclingProtocol uses.

This mostly implies making sure that the alchemical component is now not assumed to be a SmallMoleculeComponent but also a ProteinComponent.

Utility functions for this purpose were written or modified accordingly. Some of these changes had to be made on the openfe side, since we have not yet decided on migrating them to feflow (this code base).

codecov · 2025-09-24T23:41:12Z

Codecov Report

❌ Patch coverage is 91.89189% with 15 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (protein-mutation-protocol@00c1f15). Learn more about missing BASE report.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| feflow/utils/vendored.py | 89.06% | 7 Missing ⚠️ |
| feflow/utils/misc.py | 81.81% | 6 Missing ⚠️ |
| feflow/protocols/nonequilibrium_cycling.py | 97.40% | 2 Missing ⚠️ |

Additional details and impacted files

@@                     Coverage Diff                      @@
##             protein-mutation-protocol     #106   +/-   ##
============================================================
  Coverage                             ?   84.60%           
============================================================
  Files                                ?       16           
  Lines                                ?     1598           
  Branches                             ?        0           
============================================================
  Hits                                 ?     1352           
  Misses                               ?      246           
  Partials                             ?        0

ijpulidos · 2025-09-24T23:44:05Z

@IAlibay this one should be ready for review. Please note that vendored.py adds a lot of lines of code to the changes, but these are basically the same as what we had in openfe, except that I had to copy a good part of the topology_helpers.py module (see OpenFreeEnergy/openfe#1539 for more information).

ijpulidos · 2025-10-24T05:48:45Z

@IAlibay I have found issues with some specific mappings. I think we need to postpone this one while I tackle these issues. I'm getting KeyError.

IAlibay

An initial quick review - I'm mostly not convinced by the idea of doing the template registration / partial charge assignment in a separate place and then using offmols with possibly different partial charges everywhere else.

IAlibay · 2025-10-24T12:50:21Z

feflow/protocols/nonequilibrium_cycling.py

+        solvent_comp_a = (
+            solvent_comps.pop() if solvent_comps else None
+        )  # Get the first component if exists
+        protein_comps_a = get_typed_components(state_a, ProteinComponent)


Switch these to gufe.ChemicalSystem.get_components_of_type?

Ah yes, this is a good point. At the time this API point didn't exist. Thanks for the reminder.

Solved in a64135f

IAlibay · 2025-10-24T12:54:27Z

feflow/protocols/nonequilibrium_cycling.py

-        ligand_a = ligand_mapping.componentA
-        ligand_b = ligand_mapping.componentB
-
+        solvent_comps = get_typed_components(


Generally it would be better to just assert a single solvent component in the protocol validator than to have to work around this type of thing.

i.e. currently there are zero cases where having more than one SolventComponent could make sense. Indeed, even with mixed solvent systems, we're likely to just create a single SolventComponent that would contain a list of SMILES rather than stacking them.

Yes, I think that's a reasonable assumption. I don't know if it actually makes the code simpler, since we are allowing for multiple components of the other types (protein and small mols), so we could use the same logic for all of them without having branches. That said, I do think it makes sense in the validator of the protocol to allow only one solvent component, since that's what we currently support.
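To make the suggestion concrete, here is a minimal sketch of what such a check could look like. The classes below are plain stand-ins, not the real gufe components, and `components_of_type` and `validate_single_solvent` are hypothetical names used only for illustration; the actual protocol validator may be structured differently.

```python
from dataclasses import dataclass


# Stand-ins for the gufe component classes; these are NOT the real gufe API.
@dataclass(frozen=True)
class Component:
    name: str


class SolventComponent(Component):
    pass


class ProteinComponent(Component):
    pass


class SmallMoleculeComponent(Component):
    pass


def components_of_type(components, component_type):
    """Return every component that is an instance of ``component_type``."""
    return [comp for comp in components if isinstance(comp, component_type)]


def validate_single_solvent(components, state_label):
    """Raise ValueError if a state holds more than one solvent component.

    Returns the single solvent component, or None if the state has none.
    """
    solvents = components_of_type(components, SolventComponent)
    if len(solvents) > 1:
        raise ValueError(
            f"State {state_label} has {len(solvents)} solvent components; "
            "at most one is supported."
        )
    return solvents[0] if solvents else None


# Usage: proteins and small molecules may repeat, the solvent may not.
state_a = [
    SolventComponent("water"),
    ProteinComponent("tyk2"),
    SmallMoleculeComponent("ligand_1"),
    SmallMoleculeComponent("ligand_2"),
]
solvent_a = validate_single_solvent(state_a, "A")
```

With the check done up front, the setup code can take the solvent directly instead of popping from a list and handling the empty case inline.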

IAlibay · 2025-10-24T12:57:47Z

feflow/protocols/nonequilibrium_cycling.py

-        all_openff_mols = list(
-            chain(all_alchemical_mols.values(), common_small_mols.values())
-        )
-        self._assign_openff_partial_charges(


If you don't assign partial charges at this point, you'll end up with bad things happening - i.e. variable partial charges everywhere being assigned with the wrong partial charge method.

IAlibay · 2025-10-24T13:09:05Z

feflow/protocols/nonequilibrium_cycling.py

        try:
            # Minimize
            openmm.LocalEnergyMinimizer.minimize(context)
+            # Optionally store minimized topology -- Mostly for debugging purposes


why not always store it?

Solved in 2896683

IAlibay · 2025-10-24T13:10:42Z

feflow/protocols/nonequilibrium_cycling.py

        # This can be populated however we want
        return outputs
+
+    # TODO: Maybe this could be a utility function. Is this something protocol-specific?


It's mostly specific to this Protocol - also this should go in the Protocol validate method.

IAlibay · 2025-10-24T13:14:52Z

feflow/tests/test_nonequilibrium_cycling.py

        # print(f"Free energy = {fe_estimate} +/- {fe_error}") # DEBUG

+    @pytest.mark.skip(
+        reason="Ambertools failing to parameterize. Review when we have full nagl."


There should be no reason why ambertools fails to generate charges for toluene. This seems to be a bug, not something that should wait for a replacement.

This was probably due to some hiccups in ambertools and openmmforcefields at the time; I have to check again if this is still the case. I remember that it was independent of our code base, and more related to some things not getting resolved correctly in the environment. Something to revisit now.

The errors that I'm getting locally are partially as follows:

sh: symbol lookup error: sh: undefined symbol: rl_print_keybinding /home/user/miniforge3/envs/feflow-dev/bin/wrapped_progs/antechamber: Fatal Error! Cannot properly run "/home/user/miniforge3/envs/feflow-dev/bin/bondtype -j full -i ANTECHAMBER_BOND_TYPE.AC0 -o ANTECHAMBER_BOND_TYPE.AC -f ac".

and

ValueError: No registered toolkits can provide the capability "assign_partial_charges" for args "()" and kwargs "{'molecule': Molecule with name 'toluene' and SMILES '[H][c]1[c]([H])[c]([H])[c]([C]([H])([H])[H])[c]([H])[c]1[H]', 'partial_charge_method': 'am1bcc', 'use_conformers': [<Quantity([[ ...

It could point to something not getting resolved correctly in the environment (maybe due to badly specified dependencies elsewhere/upstream). But I'm not entirely sure what's triggering this one; it just doesn't seem to be related to our code base as far as I can tell.

Hmm, looks like an issue with linking to the right readline library. We have some code paths that test this, so I think it is more likely an issue with this env.

ValueError: No registered toolkits can provide the capability "assign_partial_charges"

openff will throw this error if there are errors when using ambertools -- do you get these errors locally and on CI?

IAlibay · 2025-10-24T13:19:01Z

feflow/protocols/nonequilibrium_cycling.py

+
+        # Generate and register FF parameters in the system generator template
+        all_openff_mols = [comp.to_openff() for comp in all_small_mols]
+        register_ff_parameters_template(


Doing this and then using a different set of offmols elsewhere (with different partial charges) is a bad hack imho - because if either the templating behaviour changes or anything else ends up needing partial charges that doesn't go through templates, you'll end up with a bunch of offmols downstream that have the wrong thing.

I would strongly suggest generating the offmols and charging them ahead of time, then keeping them around.

My hope here was that registering the parameters/template in the cache would be enough when these molecules were encountered again later, instead of having to keep track of new offmols.

My concern was the same, if we keep the offmols around, there's a risk we change something in some of them and not the others (or change the offmol but not the component of origin) and we would also have undesired behavior.

I agree that either way can lead to this, but my idea was to avoid having offmols around and to always source things from the original input components, as far as possible. Is there a reason to believe my implementation is not achieving that? I'm not using this all_openff_mols at all after this, so maybe we could just use the list comprehension directly in the register_ff_parameters_template call to avoid having them around?
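One way to picture the "charge once and keep them around" option discussed in this thread is a small registry that builds each molecule a single time, assigns charges if they are missing, and hands back the same object on every later lookup. This is only a sketch under stated assumptions: `ToyMolecule`, `toy_assign_charges` and `ChargedMoleculeRegistry` are hypothetical stand-ins, and the charge assignment is a placeholder rather than a call into any real toolkit.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class ToyMolecule:
    """Hypothetical stand-in for a molecule object (not openff.toolkit)."""

    name: str
    n_atoms: int
    partial_charges: Optional[List[float]] = None
    charge_method: Optional[str] = None


def toy_assign_charges(mol: ToyMolecule, method: str) -> None:
    """Placeholder: a real protocol would call its charge backend here."""
    mol.partial_charges = [0.0] * mol.n_atoms
    mol.charge_method = method


class ChargedMoleculeRegistry:
    """Build and charge each molecule once; reuse the same object afterwards."""

    def __init__(self, charge_method: str):
        self.charge_method = charge_method
        self._molecules: Dict[str, ToyMolecule] = {}

    def get(self, key: str, build: Callable[[], ToyMolecule]) -> ToyMolecule:
        if key not in self._molecules:
            mol = build()
            # Respect charges that came with the input; assign only if absent.
            if mol.partial_charges is None:
                toy_assign_charges(mol, self.charge_method)
            self._molecules[key] = mol
        return self._molecules[key]


# Usage: both lookups return the very same charged object.
registry = ChargedMoleculeRegistry(charge_method="am1bcc")
first = registry.get("toluene", lambda: ToyMolecule("toluene", n_atoms=15))
second = registry.get("toluene", lambda: ToyMolecule("toluene", n_atoms=15))
```

Keying the registry on the originating component keeps the input components as the source of truth while still guaranteeing that template registration and any downstream consumer see identical charges.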

IAlibay · 2025-10-24T13:20:28Z

feflow/utils/misc.py

+        If the component does not support the necessary conversion methods.
+    """
+
+    try:


Because protein components can technically try to go via openff, I wouldn't rely on an AttributeError check here - I would do an isinstance check and enforce the correct route.
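A minimal sketch of the isinstance-based routing, for illustration. `ToyProtein`, `ToySmallMolecule` and their method names are hypothetical stand-ins rather than the gufe classes; the point is only that when a protein also exposes the small-molecule route, a try/except on AttributeError would not reliably pick the intended path, whereas an explicit type check does.

```python
class ToySmallMolecule:
    """Hypothetical stand-in for a small-molecule component."""

    def to_openff_topology(self):
        return "topology-via-openff"


class ToyProtein:
    """Hypothetical stand-in for a protein component.

    It exposes BOTH routes, so duck typing alone cannot choose between them.
    """

    def to_openff_topology(self):
        return "topology-via-openff"

    def to_openmm_topology(self):
        return "topology-via-openmm"


def topology_from_component(component):
    """Pick the conversion route from the component type, not from attributes."""
    if isinstance(component, ToyProtein):
        return component.to_openmm_topology()
    if isinstance(component, ToySmallMolecule):
        return component.to_openff_topology()
    raise TypeError(
        f"Unsupported component type {type(component).__name__}; "
        "expected a protein or small-molecule component."
    )


protein_route = topology_from_component(ToyProtein())
ligand_route = topology_from_component(ToySmallMolecule())
```

Unsupported inputs fail early with a message that names the offending type, instead of surfacing as an AttributeError from deep inside a conversion call.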

IAlibay · 2025-10-24T13:22:34Z

feflow/utils/misc.py

+    return topology
+
+
+def get_positions_from_component(


Medium term, maybe we should just add to_positions methods in the Components themselves?

That sounds reasonable, but maybe we need to think about how this would work for the solvent component.

IAlibay · 2025-10-24T13:27:59Z

feflow/utils/misc.py

+    ValueError
+        If the atom index is not found in the topology.
+    """
+    for residue in topology.residues():


Rather than looping over residues and then over atoms, why not just loop over atoms, and get the residue? It would avoid you doing an extra for loop.

Fixed the logic on this one and now I'm not using this utility function. More info at: d8ede01

I'm leaning towards just removing this function if we are not using it.
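For reference, the single-loop form suggested in this thread could look roughly like the sketch below. The `Toy*` classes are stand-ins that only borrow the `atoms()`, `index` and `residue` names from OpenMM-style topologies, and `residue_of_atom` is a hypothetical name, not the utility function that was in the PR.

```python
from dataclasses import dataclass


@dataclass
class ToyResidue:
    index: int
    name: str


@dataclass
class ToyAtom:
    index: int
    residue: ToyResidue


class ToyTopology:
    """Stand-in topology exposing an ``atoms()`` iterator."""

    def __init__(self, atoms):
        self._atoms = list(atoms)

    def atoms(self):
        return iter(self._atoms)


def residue_of_atom(topology, atom_index):
    """Return the residue that owns ``atom_index`` using one flat loop.

    Raises ValueError if the atom index is not found in the topology.
    """
    for atom in topology.atoms():
        if atom.index == atom_index:
            return atom.residue
    raise ValueError(f"Atom index {atom_index} not found in topology.")


# Usage: two residues, three atoms.
gly = ToyResidue(index=0, name="GLY")
ala = ToyResidue(index=1, name="ALA")
topology = ToyTopology([ToyAtom(0, gly), ToyAtom(1, gly), ToyAtom(2, ala)])
found = residue_of_atom(topology, 2)
```

Each atom already carries a reference to its residue, so the outer loop over residues adds nothing.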

ijpulidos and others added 8 commits November 20, 2024 11:32

ADding TODO/Comments for things to refactor/review.

fb4b088

Removing unneeded todo

a6a1705

Fix custom exceptions in tests

24ccc77

Refined dependencies

086f246

Misc utility functions and tests

e997aea

Adding miscellaneous utility functions

8fd3877

Using new utility funcs from both feflow and openfe (branch)

ad4ef62

[pre-commit.ci] auto fixes from pre-commit.com hooks

7b8ab83

for more information, see https://pre-commit.ci

ijpulidos marked this pull request as draft January 30, 2025 18:03

ijpulidos linked an issue Jan 30, 2025 that may be closed by this pull request

Refactor setup unit to work with protein mutation protocol #77

Open

ijpulidos added 5 commits January 30, 2025 13:34

Using protein mutation support openfe branch for now

08aedfe

remove conda-forge openfe dependency for now

9ad227a

Adding temporary dependencies

5134755

WIP -- fixing tests and TODOs

c35439e

Doesn't make sense to detect phase. Better name for unit.

1139eb2

jameseastwood assigned ijpulidos Mar 5, 2025

ijpulidos and others added 14 commits March 25, 2025 17:16

Add utility function to get residue indices in chain

9b2e508

Using mapped atoms instead of unique, since unique can be empty.

5784daa

Using updated arg name

1669449

Test gly to ala. Useful for removing atoms instead of adding.

e32bb67

[pre-commit.ci] auto fixes from pre-commit.com hooks

aa3ffc2

for more information, see https://pre-commit.ci

Adding threadpoolctl missing dep

f10415e

Using gufe from git. Temporarily.

9a8b99c

Adding needed dependency for gufe

a1106b2

Pinning lower lomap version

34b3a4d

directly getting solvent component

e360385

[pre-commit.ci] auto fixes from pre-commit.com hooks

330a3c3

for more information, see https://pre-commit.ci

Handle topologies with multiple chains by checking all atoms

cbd49f3

[pre-commit.ci] auto fixes from pre-commit.com hooks

afa6b95

for more information, see https://pre-commit.ci

Adding way to store minmized topology pdb. Minor fixes comments/docs.

994c2ca

ijpulidos added 3 commits September 24, 2025 18:39

making pre-commit happy. {} instead of dict()

6676349

Remving restriction on resids!

b95d0b0

Fixing missing pymbar change on boostrapping

f12c407

ijpulidos requested a review from IAlibay September 24, 2025 23:41

ijpulidos added 3 commits October 3, 2025 15:52

Serializing HTF right after its creation.

f98a0a5

Revert "Vendoring get_system_mappings from openfe"

89c1cfe

This reverts commit 644486e

We require openfe 1.6.1 (or newer).

04c0d1c

ijpulidos changed the title ~~Protein mutation support in Neq Cycling Setup Unit~~ [DNM] Protein mutation support in Neq Cycling Setup Unit Oct 24, 2025

IAlibay reviewed Oct 24, 2025

ijpulidos and others added 18 commits November 13, 2025 12:46

Always store minimized topology by default

2896683

Using gufe get typed components API

a64135f

We now depend on gufe and openfe 1.7.x

419db95

Using the validate method for checking consistency

e4f0c29

using gufe quantities instead of deprecated models -- cherry picked f…

70d1f81

…rom main

Type annotation for lambda function field

770e90a

adapting tests for new way of getting typed components

ae21582

Supporting only up to one solvent component. Using validate function.

3ed752c

[pre-commit.ci] auto fixes from pre-commit.com hooks

b9366a9

for more information, see https://pre-commit.ci

Setup tests for protein-ligand tyk2 transformation

c9344fc

Fixing logic using resids for chain instead of atom. Fixes problems w…

d8ede01

…ith mappings.

[pre-commit.ci] auto fixes from pre-commit.com hooks

3a864f1

for more information, see https://pre-commit.ci

Use CPU platform in tests

1b85d0a

sorting parameters for unambiguity with pytest-xdist

fbfa750

Removing unnecessary utility function

1ce2be0

Warning about partial charges difference between template and small m…

5d2d062

…ols comps

Checking instances types instead. Helpful error msg.

a55e2f4

[pre-commit.ci] auto fixes from pre-commit.com hooks

e1cd58b

for more information, see https://pre-commit.ci
