@thanhnamitit

Summary

Add a CIF-direct template mode to OpenFold3 that lets users supply template structures as CIF files without pre-computed alignments. The system aligns the template chains to each query sequence automatically and selects the best-matching chain by score, where the score is sequence identity × coverage.

Changes

Core Implementation

Template Processing

  • Added a template_cif_paths field to the Chain model, with validation that it is mutually exclusive with template_alignment_file_path
  • Added a CifDirectParser class (openfold3/core/data/io/sequence/template.py) to parse CIF files directly
  • Auto-selects the best-matching chain from multi-chain CIF files using a configurable score threshold (default: 0.1)
  • Updated TemplatePreprocessorInputInference (openfold3/core/data/pipelines/preprocessing/template.py) to support both alignment-based and CIF-direct modes
  • Implemented _parse_templates_from_cif_files() method for CIF-direct processing
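
The mutual-exclusivity rule between the two template fields can be sketched as follows. `ChainSketch` is a hypothetical stand-in written as a plain dataclass so that it runs without third-party packages; it is not the PR's Chain model, and the error message is illustrative only.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass
class ChainSketch:
    """Hypothetical stand-in for the Chain model; only the two template fields."""

    template_alignment_file_path: Optional[Path] = None
    template_cif_paths: Optional[list[Path]] = None

    def __post_init__(self) -> None:
        # A chain may use alignment-based templates or CIF-direct templates, not both.
        if self.template_alignment_file_path is not None and self.template_cif_paths:
            raise ValueError(
                "template_cif_paths and template_alignment_file_path "
                "are mutually exclusive; set only one of them"
            )


# Either field on its own is accepted.
cif_chain = ChainSketch(template_cif_paths=[Path("1dgc.cif")])
aln_chain = ChainSketch(template_alignment_file_path=Path("hits.m8"))

# Setting both is rejected at construction time.
try:
    ChainSketch(
        template_alignment_file_path=Path("hits.m8"),
        template_cif_paths=[Path("1dgc.cif")],
    )
except ValueError as err:
    print(f"rejected: {err}")
```

Failing at model construction means a conflicting query is reported before any MSA or template preprocessing starts.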

Documentation

User Guides (docs/source/Inference.md, docs/source/template_how_to.md)

  • Added comprehensive CIF Direct Template Mode sections with usage examples, configuration options, and limitations
  • Clarified distinction between alignment-based and CIF-direct modes
  • Added JSON configuration examples for both homomer and multimer cases

Example Files

Query JSONs

  • query_homomer_with_direct_cif_templates.json - Homomer example
  • query_multimer_with_direct_cif_templates.json - Multimer example
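
As a rough illustration of what a CIF-direct query might contain, the sketch below builds one in Python and prints it as JSON. The chain fields (molecule_type, chain_ids, sequence, template_cif_paths) and the two paths are copied from the homomer test log further down, including the directory spelling used there; the top-level "queries" and "chains" keys and the "protein" string value are assumptions about the query schema, so the example files themselves remain the reference.

```python
import json

# Assumed layout: {"queries": {<query name>: {"chains": [<chain>, ...]}}}
query = {
    "queries": {
        "leucine_zipper": {
            "chains": [
                {
                    "molecule_type": "protein",  # assumed string form of MoleculeType.PROTEIN
                    "chain_ids": ["A", "B"],
                    "sequence": "XRMKQLEDKVEELLSKNYHLENEVARLKKLVGER",
                    # CIF-direct mode: list template files, no alignment file.
                    "template_cif_paths": [
                        "examples/example_inference_inputs/templates/honomer/chain_a_b/1dgc.cif",
                        "examples/example_inference_inputs/templates/honomer/chain_a_b/1ysa.cif",
                    ],
                }
            ]
        }
    }
}

print(json.dumps(query, indent=2))
```

Note that the chain sets template_cif_paths and leaves template_alignment_file_path out, in line with the mutual-exclusivity validation described above.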

Template CIFs (15 files total)

  • 5 homomer templates: 1dgc.cif, 1ysa.cif, 1zta.cif, 4dmd.cif, 4dme.cif
  • 10 multimer templates: 6l06.cif, 6l07.cif, 7cnw.cif, 7cnx.cif, 7cnz.cif in each of 2 chain groups (5 files × 2)

Related Issues

N/A

Testing

I've created a script to test the CIF direct template feature across three template modes: no templates, ColabFold MSA server templates, and CIF direct templates (user-provided). The script runs 6 end-to-end inference tests to compare prediction quality across these modes for both homomer and multimer queries.

Test Script

#!/bin/bash

set -e

echo "=========================================="
echo "Template E2E Tests"
echo "=========================================="
echo ""

echo "Test 1/6: Homomer WITHOUT templates..."
docker run --rm --gpus all --shm-size=16g \
  -v /localhome/local-ntdo/openfold-3:/opt/openfold3 \
  -w /opt/openfold3 \
  openfold3:latest \
  python3 -m openfold3.run_openfold predict \
    --query_json examples/example_inference_inputs/query_homomer.json \
    --inference_ckpt_path model/v19_78k_ft3_converted.pt \
    --runner_yaml examples/example_runner_yamls/low_mem.yml \
    --output_dir e2e_output/homomer_no_templates \
    --use_msa_server true \
    --use_templates false \
    --num_diffusion_samples 1 \
    --num_model_seeds 1

echo ""
echo "Test 2/6: Homomer WITH ColabFold templates..."
docker run --rm --gpus all --shm-size=16g \
  -v /localhome/local-ntdo/openfold-3:/opt/openfold3 \
  -w /opt/openfold3 \
  openfold3:latest \
  python3 -m openfold3.run_openfold predict \
    --query_json examples/example_inference_inputs/query_homomer.json \
    --inference_ckpt_path model/v19_78k_ft3_converted.pt \
    --runner_yaml examples/example_runner_yamls/low_mem.yml \
    --output_dir e2e_output/homomer_colabfold_templates \
    --use_msa_server true \
    --use_templates true \
    --num_diffusion_samples 1 \
    --num_model_seeds 1

echo ""
echo "Test 3/6: Homomer WITH CIF direct templates..."
docker run --rm --gpus all --shm-size=16g \
  -v /localhome/local-ntdo/openfold-3:/opt/openfold3 \
  -w /opt/openfold3 \
  openfold3:latest \
  python3 -m openfold3.run_openfold predict \
    --query_json examples/example_inference_inputs/query_homomer_with_direct_cif_templates.json \
    --inference_ckpt_path model/v19_78k_ft3_converted.pt \
    --runner_yaml examples/example_runner_yamls/low_mem.yml \
    --output_dir e2e_output/homomer_cif_direct_templates \
    --use_msa_server true \
    --use_templates true \
    --num_diffusion_samples 1 \
    --num_model_seeds 1

echo ""
echo "Test 4/6: Multimer WITHOUT templates..."
docker run --rm --gpus all --shm-size=16g \
  -v /localhome/local-ntdo/openfold-3:/opt/openfold3 \
  -w /opt/openfold3 \
  openfold3:latest \
  python3 -m openfold3.run_openfold predict \
    --query_json examples/example_inference_inputs/query_multimer.json \
    --inference_ckpt_path model/v19_78k_ft3_converted.pt \
    --runner_yaml examples/example_runner_yamls/low_mem.yml \
    --output_dir e2e_output/multimer_no_templates \
    --use_msa_server true \
    --use_templates false \
    --num_diffusion_samples 1 \
    --num_model_seeds 1

echo ""
echo "Test 5/6: Multimer WITH ColabFold templates..."
docker run --rm --gpus all --shm-size=16g \
  -v /localhome/local-ntdo/openfold-3:/opt/openfold3 \
  -w /opt/openfold3 \
  openfold3:latest \
  python3 -m openfold3.run_openfold predict \
    --query_json examples/example_inference_inputs/query_multimer.json \
    --inference_ckpt_path model/v19_78k_ft3_converted.pt \
    --runner_yaml examples/example_runner_yamls/low_mem.yml \
    --output_dir e2e_output/multimer_colabfold_templates \
    --use_msa_server true \
    --use_templates true \
    --num_diffusion_samples 1 \
    --num_model_seeds 1

echo ""
echo "Test 6/6: Multimer WITH CIF direct templates..."
docker run --rm --gpus all --shm-size=16g \
  -v /localhome/local-ntdo/openfold-3:/opt/openfold3 \
  -w /opt/openfold3 \
  openfold3:latest \
  python3 -m openfold3.run_openfold predict \
    --query_json examples/example_inference_inputs/query_multimer_with_direct_cif_templates.json \
    --inference_ckpt_path model/v19_78k_ft3_converted.pt \
    --runner_yaml examples/example_runner_yamls/low_mem.yml \
    --output_dir e2e_output/multimer_cif_direct_templates \
    --use_msa_server true \
    --use_templates true \
    --num_diffusion_samples 1 \
    --num_model_seeds 1
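
The six runs above form a 2 × 3 matrix (query kind × template mode). A compact way to see that matrix is the sketch below, which rebuilds the same `predict` arguments in Python and prints them. The `MODES` table and the `build_commands` helper are hypothetical names, and the Docker wrapper is left out because it depends on a local mount path.

```python
import shlex

INPUTS = "examples/example_inference_inputs"

# mode name -> (query-file suffix, value passed to --use_templates)
MODES = {
    "no_templates": ("", "false"),
    "colabfold_templates": ("", "true"),
    "cif_direct_templates": ("_with_direct_cif_templates", "true"),
}


def build_commands() -> dict[str, list[str]]:
    """Return one argv list per (query kind, template mode) pair."""
    commands = {}
    for kind in ("homomer", "multimer"):
        for mode, (suffix, use_templates) in MODES.items():
            commands[f"{kind}_{mode}"] = [
                "python3", "-m", "openfold3.run_openfold", "predict",
                "--query_json", f"{INPUTS}/query_{kind}{suffix}.json",
                "--inference_ckpt_path", "model/v19_78k_ft3_converted.pt",
                "--runner_yaml", "examples/example_runner_yamls/low_mem.yml",
                "--output_dir", f"e2e_output/{kind}_{mode}",
                "--use_msa_server", "true",
                "--use_templates", use_templates,
                "--num_diffusion_samples", "1",
                "--num_model_seeds", "1",
            ]
    return commands


if __name__ == "__main__":
    for name, argv in build_commands().items():
        print(f"# {name}\n{shlex.join(argv)}\n")
```

The ColabFold and CIF-direct runs differ only in the query JSON: both pass `--use_templates true`, and the presence of template_cif_paths in the query is what selects CIF-direct mode.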

Test Output

➜  openfold-3 git:(main) ✗ ./run_e2e_cif_direct.sh
==========================================
Template E2E Tests
==========================================

Test 1/6: Homomer WITHOUT templates...

==========
== CUDA ==
==========

CUDA Version 12.1.1

Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.

df: /root/.triton/autotune: No such file or directory
/opt/openfold3/openfold3/core/utils/checkpoint_loading_utils.py:49: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  return torch.load(ckpt_path)
💡 Tip: For seamless cloud uploads and versioning, try installing [litmodels](https://pypi.org/project/litmodels/) to enable LitModelCheckpoint, which syncs automatically with the Lightning model registry.
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
WARNING:openfold3.core.data.tools.colabfold_msa_server:Using output directory: /tmp/of3_colabfold_msas for ColabFold MSAs.
Submitting 1 sequences to the Colabfold MSA server for main MSAs...
COMPLETE: 100%|██████████| 150/150 [elapsed: 00:00 remaining: 00:00]
/opt/openfold3/openfold3/core/data/tools/colabfold_msa_server.py:331: DeprecationWarning: Python 3.14 will, by default, filter extracted tar archives and reject files or modify their metadata. Use the filter argument to control this behavior.
  tar_gz.extractall(path)
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
No complexes found for paired MSA generation. Skipping...
/opt/conda/lib/python3.12/multiprocessing/popen_fork.py:66: DeprecationWarning: This process (pid=1) is multi-threaded, use of fork() may lead to deadlocks in the child.
  self.pid = os.fork()
Predicting DataLoader 0:   0%|          | 0/1 [00:00<?, ?it/s]Seed set to 2746317213
Predicting DataLoader 0: 100%|██████████| 1/1 [00:12<00:00,  0.08it/s]
==================================================
    PREDICTION SUMMARY (COMPLETE)    
==================================================
Total Queries Processed: 1
  - Successful Queries:  1
  - Failed Queries:      0
==================================================

Predicting DataLoader 0: 100%|██████████| 1/1 [00:12<00:00,  0.08it/s]/opt/conda/lib/python3.12/tempfile.py:940: ResourceWarning: Implicitly cleaning up <TemporaryDirectory '/tmp/tmp5kmwvjc_'>
  _warnings.warn(warn_message, ResourceWarning)

Removing empty log directory...
Cleaning up MSA directories...

Test 2/6: Homomer WITH ColabFold templates...

==========
== CUDA ==
==========

CUDA Version 12.1.1

Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.

df: /root/.triton/autotune: No such file or directory
/opt/openfold3/openfold3/core/utils/checkpoint_loading_utils.py:49: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  return torch.load(ckpt_path)
💡 Tip: For seamless cloud uploads and versioning, try installing [litmodels](https://pypi.org/project/litmodels/) to enable LitModelCheckpoint, which syncs automatically with the Lightning model registry.
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
WARNING:openfold3.core.data.tools.colabfold_msa_server:Using output directory: /tmp/of3_colabfold_msas for ColabFold MSAs.
Submitting 1 sequences to the Colabfold MSA server for main MSAs...
COMPLETE: 100%|██████████| 150/150 [elapsed: 00:00 remaining: 00:00]
/opt/openfold3/openfold3/core/data/tools/colabfold_msa_server.py:331: DeprecationWarning: Python 3.14 will, by default, filter extracted tar archives and reject files or modify their metadata. Use the filter argument to control this behavior.
  tar_gz.extractall(path)
/opt/openfold3/openfold3/core/data/pipelines/preprocessing/template.py:1755: ResourceWarning: unclosed file <_io.TextIOWrapper name='/tmp/of3_template_data/template_logs/1.log' mode='a' encoding='utf-8'>
  self._preprocess_templates_for_query(self.inputs[0])
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
No complexes found for paired MSA generation. Skipping...
Preprocessing templates...
/opt/conda/lib/python3.12/multiprocessing/popen_fork.py:66: DeprecationWarning: This process (pid=1) is multi-threaded, use of fork() may lead to deadlocks in the child.
  self.pid = os.fork()
Predicting DataLoader 0:   0%|          | 0/1 [00:00<?, ?it/s]Seed set to 2746317213
Predicting DataLoader 0: 100%|██████████| 1/1 [00:12<00:00,  0.08it/s]
==================================================
    PREDICTION SUMMARY (COMPLETE)    
==================================================
Total Queries Processed: 1
  - Successful Queries:  1
  - Failed Queries:      0
==================================================

Predicting DataLoader 0: 100%|██████████| 1/1 [00:12<00:00,  0.08it/s]/opt/conda/lib/python3.12/tempfile.py:940: ResourceWarning: Implicitly cleaning up <TemporaryDirectory '/tmp/tmp31o_3mhs'>
  _warnings.warn(warn_message, ResourceWarning)

Removing empty log directory...
Cleaning up MSA directories...

Test 3/6: Homomer WITH CIF direct templates...

==========
== CUDA ==
==========

CUDA Version 12.1.1

Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.

df: /root/.triton/autotune: No such file or directory
/opt/openfold3/openfold3/core/utils/checkpoint_loading_utils.py:49: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  return torch.load(ckpt_path)
💡 Tip: For seamless cloud uploads and versioning, try installing [litmodels](https://pypi.org/project/litmodels/) to enable LitModelCheckpoint, which syncs automatically with the Lightning model registry.
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
WARNING:openfold3.core.data.tools.colabfold_msa_server:Using output directory: /tmp/of3_colabfold_msas for ColabFold MSAs.
Submitting 1 sequences to the Colabfold MSA server for main MSAs...
COMPLETE: 100%|██████████| 150/150 [elapsed: 00:00 remaining: 00:00]
/opt/openfold3/openfold3/core/data/tools/colabfold_msa_server.py:331: DeprecationWarning: Python 3.14 will, by default, filter extracted tar archives and reject files or modify their metadata. Use the filter argument to control this behavior.
  tar_gz.extractall(path)
/opt/openfold3/openfold3/core/data/tools/colabfold_msa_server.py:1048: UserWarning: Query leucine_zipper chain molecule_type=<MoleculeType.PROTEIN: 0> chain_ids=['A', 'B'] sequence='XRMKQLEDKVEELLSKNYHLENEVARLKKLVGER' non_canonical_residues=None smiles=None ccd_codes=None paired_msa_file_paths=None main_msa_file_paths=[PosixPath('/tmp/of3_colabfold_msas/main/805842a343863679f8df42bc2a1b2f2465b78e764dd75121d74a75d38f6a6c2c.npz')] template_alignment_file_path=None template_entry_chain_ids=None template_cif_paths=[PosixPath('examples/example_inference_inputs/templates/honomer/chain_a_b/1dgc.cif'), PosixPath('examples/example_inference_inputs/templates/honomer/chain_a_b/1ysa.cif'), PosixPath('examples/example_inference_inputs/templates/honomer/chain_a_b/1zta.cif'), PosixPath('examples/example_inference_inputs/templates/honomer/chain_a_b/4dmd.cif'), PosixPath('examples/example_inference_inputs/templates/honomer/chain_a_b/4dme.cif')] sdf_file_path=None already has template_cif_paths set. These are not overwritten with path(s) to the template CIF files from the ColabFold MSA server.
  inference_query_set = add_msa_paths_to_iqs(
/opt/openfold3/openfold3/core/data/pipelines/preprocessing/template.py:1755: ResourceWarning: unclosed file <_io.TextIOWrapper name='/tmp/of3_template_data/template_logs/1.log' mode='a' encoding='utf-8'>
  self._preprocess_templates_for_query(self.inputs[0])
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
No complexes found for paired MSA generation. Skipping...
Preprocessing templates...
/opt/conda/lib/python3.12/multiprocessing/popen_fork.py:66: DeprecationWarning: This process (pid=1) is multi-threaded, use of fork() may lead to deadlocks in the child.
  self.pid = os.fork()
Predicting DataLoader 0:   0%|          | 0/1 [00:00<?, ?it/s]Seed set to 2746317213
Predicting DataLoader 0: 100%|██████████| 1/1 [00:12<00:00,  0.08it/s]
==================================================
    PREDICTION SUMMARY (COMPLETE)    
==================================================
Total Queries Processed: 1
  - Successful Queries:  1
  - Failed Queries:      0
==================================================

Predicting DataLoader 0: 100%|██████████| 1/1 [00:12<00:00,  0.08it/s]/opt/conda/lib/python3.12/tempfile.py:940: ResourceWarning: Implicitly cleaning up <TemporaryDirectory '/tmp/tmpf3oeb4mq'>
  _warnings.warn(warn_message, ResourceWarning)

Removing empty log directory...
Cleaning up MSA directories...

Test 4/6: Multimer WITHOUT templates...

==========
== CUDA ==
==========

CUDA Version 12.1.1

Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.

df: /root/.triton/autotune: No such file or directory
/opt/openfold3/openfold3/core/utils/checkpoint_loading_utils.py:49: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  return torch.load(ckpt_path)
💡 Tip: For seamless cloud uploads and versioning, try installing [litmodels](https://pypi.org/project/litmodels/) to enable LitModelCheckpoint, which syncs automatically with the Lightning model registry.
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
WARNING:openfold3.core.data.tools.colabfold_msa_server:Using output directory: /tmp/of3_colabfold_msas for ColabFold MSAs.
Submitting 2 sequences to the Colabfold MSA server for main MSAs...
COMPLETE: 100%|██████████| 300/300 [elapsed: 00:13 remaining: 00:00]
/opt/openfold3/openfold3/core/data/tools/colabfold_msa_server.py:331: DeprecationWarning: Python 3.14 will, by default, filter extracted tar archives and reject files or modify their metadata. Use the filter argument to control this behavior.
  tar_gz.extractall(path)
Submitting 1 paired MSA queries to the Colabfold MSA server...
COMPLETE: 100%|██████████| 300/300 [elapsed: 00:00 remaining: 00:00]
/opt/openfold3/openfold3/core/data/tools/colabfold_msa_server.py:331: DeprecationWarning: Python 3.14 will, by default, filter extracted tar archives and reject files or modify their metadata. Use the filter argument to control this behavior.
  tar_gz.extractall(path)
Computing paired MSAs: 100%|██████████| 1/1 [00:00<00:00,  2.27it/s]
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
/opt/conda/lib/python3.12/multiprocessing/popen_fork.py:66: DeprecationWarning: This process (pid=1) is multi-threaded, use of fork() may lead to deadlocks in the child.
  self.pid = os.fork()
Predicting DataLoader 0:   0%|          | 0/1 [00:00<?, ?it/s]Seed set to 2746317213
Predicting DataLoader 0: 100%|██████████| 1/1 [00:29<00:00,  0.03it/s]
==================================================
    PREDICTION SUMMARY (COMPLETE)    
==================================================
Total Queries Processed: 1
  - Successful Queries:  1
  - Failed Queries:      0
==================================================

Predicting DataLoader 0: 100%|██████████| 1/1 [00:29<00:00,  0.03it/s]/opt/conda/lib/python3.12/tempfile.py:940: ResourceWarning: Implicitly cleaning up <TemporaryDirectory '/tmp/tmpu12v_0dp'>
  _warnings.warn(warn_message, ResourceWarning)

Removing empty log directory...
Cleaning up MSA directories...

Test 5/6: Multimer WITH ColabFold templates...

==========
== CUDA ==
==========

CUDA Version 12.1.1

Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.

df: /root/.triton/autotune: No such file or directory
/opt/openfold3/openfold3/core/utils/checkpoint_loading_utils.py:49: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  return torch.load(ckpt_path)
💡 Tip: For seamless cloud uploads and versioning, try installing [litmodels](https://pypi.org/project/litmodels/) to enable LitModelCheckpoint, which syncs automatically with the Lightning model registry.
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
WARNING:openfold3.core.data.tools.colabfold_msa_server:Using output directory: /tmp/of3_colabfold_msas for ColabFold MSAs.
Submitting 2 sequences to the Colabfold MSA server for main MSAs...
COMPLETE: 100%|██████████| 300/300 [elapsed: 00:06 remaining: 00:00]
/opt/openfold3/openfold3/core/data/tools/colabfold_msa_server.py:331: DeprecationWarning: Python 3.14 will, by default, filter extracted tar archives and reject files or modify their metadata. Use the filter argument to control this behavior.
  tar_gz.extractall(path)
Computing paired MSAs:   0%|          | 0/1 [00:00<?, ?it/s]Submitting 1 paired MSA queries to the Colabfold MSA server...
COMPLETE: 100%|██████████| 300/300 [elapsed: 00:00 remaining: 00:00]
/opt/openfold3/openfold3/core/data/tools/colabfold_msa_server.py:331: DeprecationWarning: Python 3.14 will, by default, filter extracted tar archives and reject files or modify their metadata. Use the filter argument to control this behavior.
  tar_gz.extractall(path)
Computing paired MSAs: 100%|██████████| 1/1 [00:00<00:00,  2.17it/s]
/opt/conda/lib/python3.12/multiprocessing/popen_fork.py:66: DeprecationWarning: This process (pid=1) is multi-threaded, use of fork() may lead to deadlocks in the child.
  self.pid = os.fork()
Preprocessing templates:   0%|          | 0/2 [00:00<?, ?it/s]/opt/conda/lib/python3.12/multiprocessing/pool.py:125: ResourceWarning: unclosed file <_io.TextIOWrapper name='/tmp/of3_template_data/template_logs/158.log' mode='a' encoding='utf-8'>
  result = (True, func(*args, **kwds))
Preprocessing templates: 100%|██████████| 2/2 [00:00<00:00,  4.81it/s]
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
/opt/conda/lib/python3.12/multiprocessing/popen_fork.py:66: DeprecationWarning: This process (pid=1) is multi-threaded, use of fork() may lead to deadlocks in the child.
  self.pid = os.fork()
Predicting DataLoader 0:   0%|          | 0/1 [00:00<?, ?it/s]Seed set to 2746317213
Predicting DataLoader 0: 100%|██████████| 1/1 [00:29<00:00,  0.03it/s]
==================================================
    PREDICTION SUMMARY (COMPLETE)    
==================================================
Total Queries Processed: 1
  - Successful Queries:  1
  - Failed Queries:      0
==================================================

Predicting DataLoader 0: 100%|██████████| 1/1 [00:29<00:00,  0.03it/s]/opt/conda/lib/python3.12/tempfile.py:940: ResourceWarning: Implicitly cleaning up <TemporaryDirectory '/tmp/tmpdmrlpvs8'>
  _warnings.warn(warn_message, ResourceWarning)

Removing empty log directory...
Cleaning up MSA directories...

Test 6/6: Multimer WITH CIF direct templates...

==========
== CUDA ==
==========

CUDA Version 12.1.1

Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.

df: /root/.triton/autotune: No such file or directory
/opt/openfold3/openfold3/core/utils/checkpoint_loading_utils.py:49: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  return torch.load(ckpt_path)
💡 Tip: For seamless cloud uploads and versioning, try installing [litmodels](https://pypi.org/project/litmodels/) to enable LitModelCheckpoint, which syncs automatically with the Lightning model registry.
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
WARNING:openfold3.core.data.tools.colabfold_msa_server:Using output directory: /tmp/of3_colabfold_msas for ColabFold MSAs.
Submitting 2 sequences to the Colabfold MSA server for main MSAs...
COMPLETE: 100%|██████████| 300/300 [elapsed: 00:07 remaining: 00:00]
/opt/openfold3/openfold3/core/data/tools/colabfold_msa_server.py:331: DeprecationWarning: Python 3.14 will, by default, filter extracted tar archives and reject files or modify their metadata. Use the filter argument to control this behavior.
  tar_gz.extractall(path)
Submitting 1 paired MSA queries to the Colabfold MSA server...
COMPLETE: 100%|██████████| 300/300 [elapsed: 00:00 remaining: 00:00]
/opt/openfold3/openfold3/core/data/tools/colabfold_msa_server.py:331: DeprecationWarning: Python 3.14 will, by default, filter extracted tar archives and reject files or modify their metadata. Use the filter argument to control this behavior.
  tar_gz.extractall(path)
Computing paired MSAs: 100%|██████████| 1/1 [00:00<00:00,  2.23it/s]
/opt/openfold3/openfold3/core/data/tools/colabfold_msa_server.py:1048: UserWarning: Query 7cnx chain molecule_type=<MoleculeType.PROTEIN: 0> chain_ids=['A', 'C'] sequence='MLNSFKLSLQYILPKLWLTRLAGWGASKRAGWLTKLVIDLFVKYYKVDMKEAQKPDTASYRTFNEFFVRPLRDEVRPIDTDPNVLVMPADGVISQLGKIEEDKILQAKGHNYSLEALLAGNYLMADLFRNGTFVTTYLSPRDYHRVHMPCNGILREMIYVPGDLFSVNHLTAQNVPNLFARNERVICLFDTEFGPMAQILVGATIVGSIETVWAGTITPPREGIIKRWTWPAGENDGSVALLKGQEMGRFKLG' non_canonical_residues=None smiles=None ccd_codes=None paired_msa_file_paths=[PosixPath('/tmp/of3_colabfold_msas/paired/0138d0b3fd92c9620cf73919594e0c25929395a60753ad614b373c48b9719a8d/08f507e5b2b7a3497db17dc6aa267e170d931b82702ed76a88ad857034fa1a04.npz')] main_msa_file_paths=[PosixPath('/tmp/of3_colabfold_msas/main/08f507e5b2b7a3497db17dc6aa267e170d931b82702ed76a88ad857034fa1a04.npz')] template_alignment_file_path=None template_entry_chain_ids=None template_cif_paths=[PosixPath('examples/example_inference_inputs/templates/multimer/chain_a_c/7cnz.cif')] sdf_file_path=None already has template_cif_paths set. These are not overwritten with path(s) to the template CIF files from the ColabFold MSA server.
  inference_query_set = add_msa_paths_to_iqs(
/opt/openfold3/openfold3/core/data/tools/colabfold_msa_server.py:1048: UserWarning: Query 7cnx chain molecule_type=<MoleculeType.PROTEIN: 0> chain_ids=['B', 'D'] sequence='XTVINLFAPGKVNLVEQLESLSVTKIGQPLAVSTGHHHHHHG' non_canonical_residues=None smiles=None ccd_codes=None paired_msa_file_paths=[PosixPath('/tmp/of3_colabfold_msas/paired/0138d0b3fd92c9620cf73919594e0c25929395a60753ad614b373c48b9719a8d/1e0d5eeb7c5a6b94cf0d0c905b28909c6e7bc1e0f0445c07b762705f5be0ab25.npz')] main_msa_file_paths=[PosixPath('/tmp/of3_colabfold_msas/main/1e0d5eeb7c5a6b94cf0d0c905b28909c6e7bc1e0f0445c07b762705f5be0ab25.npz')] template_alignment_file_path=None template_entry_chain_ids=None template_cif_paths=[PosixPath('examples/example_inference_inputs/templates/multimer/chain_b_d/6l06.cif'), PosixPath('examples/example_inference_inputs/templates/multimer/chain_b_d/6l07.cif'), PosixPath('examples/example_inference_inputs/templates/multimer/chain_b_d/7cnw.cif'), PosixPath('examples/example_inference_inputs/templates/multimer/chain_b_d/7cnx.cif'), PosixPath('examples/example_inference_inputs/templates/multimer/chain_b_d/7cnz.cif')] sdf_file_path=None already has template_cif_paths set. These are not overwritten with path(s) to the template CIF files from the ColabFold MSA server.
  inference_query_set = add_msa_paths_to_iqs(
/opt/conda/lib/python3.12/multiprocessing/popen_fork.py:66: DeprecationWarning: This process (pid=1) is multi-threaded, use of fork() may lead to deadlocks in the child.
  self.pid = os.fork()
Preprocessing templates:   0%|          | 0/2 [00:00<?, ?it/s]/opt/conda/lib/python3.12/multiprocessing/pool.py:125: ResourceWarning: unclosed file <_io.TextIOWrapper name='/tmp/of3_template_data/template_logs/158.log' mode='a' encoding='utf-8'>
  result = (True, func(*args, **kwds))
Preprocessing templates: 100%|██████████| 2/2 [00:00<00:00,  4.52it/s]
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
/opt/conda/lib/python3.12/multiprocessing/popen_fork.py:66: DeprecationWarning: This process (pid=1) is multi-threaded, use of fork() may lead to deadlocks in the child.
  self.pid = os.fork()
Predicting DataLoader 0:   0%|          | 0/1 [00:00<?, ?it/s]Seed set to 2746317213
Predicting DataLoader 0: 100%|██████████| 1/1 [00:29<00:00,  0.03it/s]
==================================================
    PREDICTION SUMMARY (COMPLETE)    
==================================================
Total Queries Processed: 1
  - Successful Queries:  1
  - Failed Queries:      0
==================================================

Predicting DataLoader 0: 100%|██████████| 1/1 [00:29<00:00,  0.03it/s]/opt/conda/lib/python3.12/tempfile.py:940: ResourceWarning: Implicitly cleaning up <TemporaryDirectory '/tmp/tmpiffxot7f'>
  _warnings.warn(warn_message, ResourceWarning)

Removing empty log directory...
Cleaning up MSA directories...

==========================================
All tests completed!
==========================================

Results Summary:

Homomer:
  - No templates:         e2e_output/homomer_no_templates/
  - ColabFold templates:  e2e_output/homomer_colabfold_templates/
  - CIF direct templates: e2e_output/homomer_cif_direct_templates/

Multimer:
  - No templates:         e2e_output/multimer_no_templates/
  - ColabFold templates:  e2e_output/multimer_colabfold_templates/
  - CIF direct templates: e2e_output/multimer_cif_direct_templates/

Compare confidence scores:

=== Homomer (Leucine Zipper) ===
No templates:         avg_plddt = 87.1446533203125
ColabFold templates:  avg_plddt = 87.14595794677734
CIF direct templates: avg_plddt = 87.38876342773438

=== Multimer (7cnx) ===
No templates:         avg_plddt = 79.51964569091797
ColabFold templates:  avg_plddt = 79.8408203125
CIF direct templates: avg_plddt = 79.5925521850586

Summary

Test Configuration:

  • Queries: Homomer (leucine zipper, 2 chains, 34 residues each) and Multimer (7cnx, 4 chains, ~300 total residues)
  • Settings: Low memory mode, 1 diffusion sample, 1 model seed
  • Template Modes:
    1. No templates (--use_templates false)
    2. ColabFold MSA server templates (--use_templates true with automatic template discovery)
    3. CIF direct templates (user-provided CIF files with automatic alignment)
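For mode 3, a query might be assembled roughly as in the sketch below. The chain field names (molecule_type, chain_ids, sequence, template_cif_paths, template_alignment_file_path) are taken from the Chain fields printed in the log above. The "queries"/"chains" nesting and the helper check_template_fields are assumptions made for illustration, and only one of the two 7cnx chain groups is shown.

```python
import json

# Field names follow the Chain fields visible in the log output; the
# "queries"/"chains" nesting is an assumption about the query JSON layout.
chain_b_d = {
    "molecule_type": "protein",
    "chain_ids": ["B", "D"],
    "sequence": "XTVINLFAPGKVNLVEQLESLSVTKIGQPLAVSTGHHHHHHG",
    "template_cif_paths": [
        "examples/example_inference_inputs/templates/multimer/chain_b_d/6l06.cif",
        "examples/example_inference_inputs/templates/multimer/chain_b_d/6l07.cif",
    ],
}


def check_template_fields(chain: dict) -> None:
    """Hypothetical check: the two template sources are mutually exclusive."""
    if chain.get("template_cif_paths") and chain.get("template_alignment_file_path"):
        raise ValueError(
            "set template_cif_paths or template_alignment_file_path, not both"
        )


check_template_fields(chain_b_d)
query_set = {"queries": {"7cnx": {"chains": [chain_b_d]}}}
query_json = json.dumps(query_set, indent=2)
print(query_json)
```

The check mirrors the mutual-exclusivity rule described in the PR summary; the real validation lives on the Chain model and may behave differently.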

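Results (avg_plddt; change relative to no templates in parentheses):

| Query Type | No Templates | ColabFold Templates | CIF Direct Templates | Best Method |
|---|---|---|---|---|
| Homomer | 87.14 | 87.15 (+0.01) | 87.39 (+0.25) | CIF Direct |
| Multimer | 79.52 | 79.84 (+0.32) | 79.59 (+0.07) | ColabFold |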
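
The deltas can be recomputed from the logged avg_plddt values, as in the short sketch below. The dictionary keys are illustrative labels, not names from the codebase. Because the sketch subtracts the unrounded values, the homomer deltas come out about 0.01 lower than the reported ones, which subtract the rounded scores.

```python
# avg_plddt values copied from the "Compare confidence scores" log output.
avg_plddt = {
    "Homomer": {
        "none": 87.1446533203125,
        "colabfold": 87.14595794677734,
        "cif_direct": 87.38876342773438,
    },
    "Multimer": {
        "none": 79.51964569091797,
        "colabfold": 79.8408203125,
        "cif_direct": 79.5925521850586,
    },
}

# Change of each template mode relative to the no-template baseline.
deltas = {
    query: {
        mode: score - scores["none"]
        for mode, score in scores.items()
        if mode != "none"
    }
    for query, scores in avg_plddt.items()
}

# Template mode with the largest gain per query type.
best = {query: max(d, key=d.get) for query, d in deltas.items()}

for query, d in deltas.items():
    for mode, delta in d.items():
        print(f"{query:8s} {mode:10s} {delta:+.2f}")
    print(f"{query:8s} best: {best[query]}")
```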

Key Findings:

  1. All 6 tests completed successfully - the CIF direct template feature runs end to end
  2. Templates give small gains: all template modes scored above the no-template baseline (+0.01 to +0.32 avg_plddt, measured with 1 model seed and 1 diffusion sample)
  3. CIF direct templates scored highest for the homomer: +0.25 avg_plddt over the baseline
  4. ColabFold templates scored highest for the multimer: +0.32 avg_plddt over the baseline
  5. Both template methods are useful: users can choose based on whether they have specific template structures (CIF direct) or prefer automatic discovery (ColabFold)

Technical Validation:

  • CIF direct mode successfully processes raw CIF files without pre-computed alignments
  • Automatic template alignment and selection works as designed
  • User-provided templates integrate seamlessly with the existing pipeline
  • Template preprocessing logs are available by setting template_preprocessor_settings.create_logs: true
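
The chain selection rule described in the PR summary (sequence identity × coverage, with a default score threshold of 0.1) can be illustrated with the sketch below. It is not the CifDirectParser implementation: the function names are hypothetical, the input is assumed to be already aligned (equal-length strings with '-' for gaps), and the exact definitions of identity and coverage used here are assumptions.

```python
from typing import Dict, Optional, Tuple


def score_alignment(query_aln: str, template_aln: str) -> float:
    """Return identity x coverage for two equal-length gapped strings."""
    if len(query_aln) != len(template_aln):
        raise ValueError("aligned strings must have equal length")
    query_len = sum(c != "-" for c in query_aln)
    # Columns where both query and template have a residue.
    aligned = [
        (q, t) for q, t in zip(query_aln, template_aln) if q != "-" and t != "-"
    ]
    if query_len == 0 or not aligned:
        return 0.0
    identity = sum(q == t for q, t in aligned) / len(aligned)
    coverage = len(aligned) / query_len
    return identity * coverage


def pick_best_chain(
    alignments: Dict[str, Tuple[str, str]], min_score: float = 0.1
) -> Optional[str]:
    """Pick the template chain with the highest score, or None if below threshold."""
    scores = {chain: score_alignment(q, t) for chain, (q, t) in alignments.items()}
    best = max(scores, key=scores.get, default=None)
    if best is None or scores[best] < min_score:
        return None
    return best


candidates = {
    "A": ("MKVLAA", "MKVLAA"),  # identical over the full query
    "B": ("MKVLAA", "MR----"),  # half identical over a third of the query
}
print(pick_best_chain(candidates))
```

Multiplying the two terms penalizes both a short but perfect match and a long but dissimilar one, which is presumably why a single combined score is used for ranking chains in a multi-chain CIF file.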

Other Notes

N/A

@jnwei
Contributor

jnwei commented Nov 13, 2025

Hi @thanhnamitit, this looks really great! Thank you especially for providing such detailed examples, complete with CIFs, example queries, documentation, and initial results. It is greatly appreciated and will be super helpful for future users.

I will review this in greater depth over the next few days, but a few quick comments:

  • Do you mind testing the CIF direct template structure parsing without the MSA server? We have an issue where template alignments are performed even when --use_templates=False is passed on the command line. If we set both --use_msa_server false and --use_templates false, we can ensure that no template alignments are generated from ColabFold. You can merge past "Support dummy sequence MSA" (#28) to get the latest support for MSA-free predictions.

  • Do you mind submitting the examples as a PR to HuggingFace instead? We'd like to keep the openfold-3 repo as lightweight as possible, so we place larger example directories there.

  • Could you please also update the documentation for Chain in InferenceQuerySet with the new fields for template_cif_paths? We haven't configured the reference documentation to update automatically yet.

Thank you!

thanhnamitit force-pushed the main branch 4 times, most recently from 5de99e9 to 253e5c8, on December 17, 2025 09:51
@thanhnamitit
Author

Hi @jnwei,

Thank you so much for reviewing the MR and for your kind words!

I've addressed all your comments:

1. MSA-Free Testing with CIF Direct Templates

I created a comprehensive E2E test script (run_e2e_cif_direct.sh) that runs 8 test cases covering all combinations:

| Test | MSA Server | Templates | Target |
|---|---|---|---|
| 1-2 | ❌ No MSA | ❌ None / ✅ CIF-Direct | Homomer |
| 3-4 | ❌ No MSA | ❌ None / ✅ CIF-Direct | Multimer |
| 5-6 | ✅ With MSA | ❌ None / ✅ ColabFold | Homomer |
| 7-8 | ✅ With MSA | ✅ CIF-Direct | Homomer/Multimer |

Bug Found & Fixed: During testing, I discovered an issue when running multimers in MSA-free mode. In msa.py, the extract_alignments_to_pair function was calling MsaArray.multi_concatenate with an empty list when no MSAs were available for pairing. I added a check to skip concatenation when there are no MSAs to pair:

if not msa_arrays_to_pair_i:
    continue
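
A self-contained illustration of this kind of guard is sketched below. It does not use the real MsaArray or extract_alignments_to_pair code: concat_columns and pair_msas are hypothetical stand-ins, and an MSA is modeled as a plain list of aligned strings. The point is only that a concatenation helper typically rejects an empty input, so the caller skips groups with nothing to pair.

```python
from typing import Dict, List

Msa = List[str]  # stand-in for an MSA: one aligned sequence string per row


def concat_columns(msas: List[Msa]) -> Msa:
    """Join MSAs side by side; like many concatenate helpers, reject an empty list."""
    if not msas:
        raise ValueError("need at least one MSA to concatenate")
    return ["".join(rows) for rows in zip(*msas)]


def pair_msas(groups: Dict[str, List[Msa]]) -> Dict[str, Msa]:
    """Concatenate the MSAs in each group, skipping groups that have none."""
    paired = {}
    for name, msa_arrays_to_pair_i in groups.items():
        # Same guard as the fix: in MSA-free mode there is nothing to pair.
        if not msa_arrays_to_pair_i:
            continue
        paired[name] = concat_columns(msa_arrays_to_pair_i)
    return paired


result = pair_msas(
    {
        "with_msas": [["AC", "A-"], ["GT", "G-"]],
        "msa_free": [],
    }
)
print(result)  # {'with_msas': ['ACGT', 'A-G-']}
```

Without the guard, the "msa_free" group would reach concat_columns with an empty list and raise.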

All 8 tests now pass. You can take a look at the attached log file!

2. HuggingFace Examples PR

I've submitted the examples to HuggingFace: (https://huggingface.co/OpenFold/OpenFold3/discussions/12#694272355826ce8d56b0aaac)

3. Documentation Updates

Updated the following documentation with the new template_cif_paths and template_cif_chain_ids fields:

  • docs/source/input_format.md - Chain schema and field descriptions
  • docs/source/Inference.md - CIF Direct Template Mode section and Inference Query Set fields

Please review when you have a chance. Let me know if you need any changes!

e2e_test.log
run_e2e_cif_direct.sh

@jnwei
Contributor

jnwei commented Dec 23, 2025

Thank you so much, @thanhnamitit! Overall the changes look great. Thank you also for adding documentation and examples to the HuggingFace repository.

We'll review this further in the new year, and we should be able to add this in soon after.
