Cifs direct template structure #37

thanhnamitit · 2025-11-12T11:52:49Z

Summary

Add CIF direct template mode to OpenFold3, allowing users to provide template structures as CIF files without pre-computed alignments. The system automatically aligns template chains to query sequences and selects the best match based on sequence identity × coverage.

Changes

Core Implementation

Template Processing

Added template_cif_paths field to Chain model with validation to ensure mutual exclusivity with template_alignment_file_path
Added CifDirectParser class (openfold3/core/data/io/sequence/template.py) to parse CIF files directly
Auto-selects best matching chain from multi-chain CIF files using configurable score threshold (default: 0.1)
Updated TemplatePreprocessorInputInference (openfold3/core/data/pipelines/preprocessing/template.py) to support both alignment-based and CIF-direct modes
Implemented _parse_templates_from_cif_files() method for CIF-direct processing

Documentation

User Guides (docs/source/Inference.md, docs/source/template_how_to.md)

Added comprehensive CIF Direct Template Mode sections with usage examples, configuration options, and limitations
Clarified distinction between alignment-based and CIF-direct modes
Added JSON configuration examples for both homomer and multimer cases

Example Files

Query JSONs

query_homomer_with_direct_cif_templates.json - Homomer example
query_multimer_with_direct_cif_templates.json - Multimer example

Template CIFs (15 files total)

5 homomer templates: 1dgc.cif, 1ysa.cif, 1zta.cif, 4dmd.cif, 4dme.cif
10 multimer templates: 6l06.cif, 6l07.cif, 7cnw.cif, 7cnx.cif, 7cnz.cif (2 chain groups)

Related Issues

N/A

Testing

I've created a script to test the CIF direct template feature across three template modes: no templates, ColabFold MSA server templates, and CIF direct templates (user-provided). The script runs 6 end-to-end inference tests to compare prediction quality across these modes for both homomer and multimer queries.

Test Script

#!/bin/bash

set -e

echo "=========================================="
echo "Template E2E Tests"
echo "=========================================="
echo ""

echo "Test 1/6: Homomer WITHOUT templates..."
docker run --rm --gpus all --shm-size=16g \
  -v /localhome/local-ntdo/openfold-3:/opt/openfold3 \
  -w /opt/openfold3 \
  openfold3:latest \
  python3 -m openfold3.run_openfold predict \
    --query_json examples/example_inference_inputs/query_homomer.json \
    --inference_ckpt_path model/v19_78k_ft3_converted.pt \
    --runner_yaml examples/example_runner_yamls/low_mem.yml \
    --output_dir e2e_output/homomer_no_templates \
    --use_msa_server true \
    --use_templates false \
    --num_diffusion_samples 1 \
    --num_model_seeds 1

echo ""
echo "Test 2/6: Homomer WITH ColabFold templates..."
docker run --rm --gpus all --shm-size=16g \
  -v /localhome/local-ntdo/openfold-3:/opt/openfold3 \
  -w /opt/openfold3 \
  openfold3:latest \
  python3 -m openfold3.run_openfold predict \
    --query_json examples/example_inference_inputs/query_homomer.json \
    --inference_ckpt_path model/v19_78k_ft3_converted.pt \
    --runner_yaml examples/example_runner_yamls/low_mem.yml \
    --output_dir e2e_output/homomer_colabfold_templates \
    --use_msa_server true \
    --use_templates true \
    --num_diffusion_samples 1 \
    --num_model_seeds 1

echo ""
echo "Test 3/6: Homomer WITH CIF direct templates..."
docker run --rm --gpus all --shm-size=16g \
  -v /localhome/local-ntdo/openfold-3:/opt/openfold3 \
  -w /opt/openfold3 \
  openfold3:latest \
  python3 -m openfold3.run_openfold predict \
    --query_json examples/example_inference_inputs/query_homomer_with_direct_cif_templates.json \
    --inference_ckpt_path model/v19_78k_ft3_converted.pt \
    --runner_yaml examples/example_runner_yamls/low_mem.yml \
    --output_dir e2e_output/homomer_cif_direct_templates \
    --use_msa_server true \
    --use_templates true \
    --num_diffusion_samples 1 \
    --num_model_seeds 1

echo ""
echo "Test 4/6: Multimer WITHOUT templates..."
docker run --rm --gpus all --shm-size=16g \
  -v /localhome/local-ntdo/openfold-3:/opt/openfold3 \
  -w /opt/openfold3 \
  openfold3:latest \
  python3 -m openfold3.run_openfold predict \
    --query_json examples/example_inference_inputs/query_multimer.json \
    --inference_ckpt_path model/v19_78k_ft3_converted.pt \
    --runner_yaml examples/example_runner_yamls/low_mem.yml \
    --output_dir e2e_output/multimer_no_templates \
    --use_msa_server true \
    --use_templates false \
    --num_diffusion_samples 1 \
    --num_model_seeds 1

echo ""
echo "Test 5/6: Multimer WITH ColabFold templates..."
docker run --rm --gpus all --shm-size=16g \
  -v /localhome/local-ntdo/openfold-3:/opt/openfold3 \
  -w /opt/openfold3 \
  openfold3:latest \
  python3 -m openfold3.run_openfold predict \
    --query_json examples/example_inference_inputs/query_multimer.json \
    --inference_ckpt_path model/v19_78k_ft3_converted.pt \
    --runner_yaml examples/example_runner_yamls/low_mem.yml \
    --output_dir e2e_output/multimer_colabfold_templates \
    --use_msa_server true \
    --use_templates true \
    --num_diffusion_samples 1 \
    --num_model_seeds 1

echo ""
echo "Test 6/6: Multimer WITH CIF direct templates..."
docker run --rm --gpus all --shm-size=16g \
  -v /localhome/local-ntdo/openfold-3:/opt/openfold3 \
  -w /opt/openfold3 \
  openfold3:latest \
  python3 -m openfold3.run_openfold predict \
    --query_json examples/example_inference_inputs/query_multimer_with_direct_cif_templates.json \
    --inference_ckpt_path model/v19_78k_ft3_converted.pt \
    --runner_yaml examples/example_runner_yamls/low_mem.yml \
    --output_dir e2e_output/multimer_cif_direct_templates \
    --use_msa_server true \
    --use_templates true \
    --num_diffusion_samples 1 \
    --num_model_seeds 1

Test Output

➜  openfold-3 git:(main) ✗ ./run_e2e_cif_direct.sh
==========================================
Template E2E Tests
==========================================

Test 1/6: Homomer WITHOUT templates...

==========
== CUDA ==
==========

CUDA Version 12.1.1

Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.

df: /root/.triton/autotune: No such file or directory
/opt/openfold3/openfold3/core/utils/checkpoint_loading_utils.py:49: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  return torch.load(ckpt_path)
💡 Tip: For seamless cloud uploads and versioning, try installing [litmodels](https://pypi.org/project/litmodels/) to enable LitModelCheckpoint, which syncs automatically with the Lightning model registry.
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
WARNING:openfold3.core.data.tools.colabfold_msa_server:Using output directory: /tmp/of3_colabfold_msas for ColabFold MSAs.
Submitting 1 sequences to the Colabfold MSA server for main MSAs...
COMPLETE: 100%|██████████| 150/150 [elapsed: 00:00 remaining: 00:00]
/opt/openfold3/openfold3/core/data/tools/colabfold_msa_server.py:331: DeprecationWarning: Python 3.14 will, by default, filter extracted tar archives and reject files or modify their metadata. Use the filter argument to control this behavior.
  tar_gz.extractall(path)
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
No complexes found for paired MSA generation. Skipping...
/opt/conda/lib/python3.12/multiprocessing/popen_fork.py:66: DeprecationWarning: This process (pid=1) is multi-threaded, use of fork() may lead to deadlocks in the child.
  self.pid = os.fork()
Predicting DataLoader 0:   0%|          | 0/1 [00:00<?, ?it/s]Seed set to 2746317213
Predicting DataLoader 0: 100%|██████████| 1/1 [00:12<00:00,  0.08it/s]
==================================================
    PREDICTION SUMMARY (COMPLETE)    
==================================================
Total Queries Processed: 1
  - Successful Queries:  1
  - Failed Queries:      0
==================================================

Predicting DataLoader 0: 100%|██████████| 1/1 [00:12<00:00,  0.08it/s]/opt/conda/lib/python3.12/tempfile.py:940: ResourceWarning: Implicitly cleaning up <TemporaryDirectory '/tmp/tmp5kmwvjc_'>
  _warnings.warn(warn_message, ResourceWarning)

Removing empty log directory...
Cleaning up MSA directories...

Test 2/6: Homomer WITH ColabFold templates...

==========
== CUDA ==
==========

CUDA Version 12.1.1

Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.

df: /root/.triton/autotune: No such file or directory
/opt/openfold3/openfold3/core/utils/checkpoint_loading_utils.py:49: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  return torch.load(ckpt_path)
💡 Tip: For seamless cloud uploads and versioning, try installing [litmodels](https://pypi.org/project/litmodels/) to enable LitModelCheckpoint, which syncs automatically with the Lightning model registry.
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
WARNING:openfold3.core.data.tools.colabfold_msa_server:Using output directory: /tmp/of3_colabfold_msas for ColabFold MSAs.
Submitting 1 sequences to the Colabfold MSA server for main MSAs...
COMPLETE: 100%|██████████| 150/150 [elapsed: 00:00 remaining: 00:00]
/opt/openfold3/openfold3/core/data/tools/colabfold_msa_server.py:331: DeprecationWarning: Python 3.14 will, by default, filter extracted tar archives and reject files or modify their metadata. Use the filter argument to control this behavior.
  tar_gz.extractall(path)
/opt/openfold3/openfold3/core/data/pipelines/preprocessing/template.py:1755: ResourceWarning: unclosed file <_io.TextIOWrapper name='/tmp/of3_template_data/template_logs/1.log' mode='a' encoding='utf-8'>
  self._preprocess_templates_for_query(self.inputs[0])
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
No complexes found for paired MSA generation. Skipping...
Preprocessing templates...
/opt/conda/lib/python3.12/multiprocessing/popen_fork.py:66: DeprecationWarning: This process (pid=1) is multi-threaded, use of fork() may lead to deadlocks in the child.
  self.pid = os.fork()
Predicting DataLoader 0:   0%|          | 0/1 [00:00<?, ?it/s]Seed set to 2746317213
Predicting DataLoader 0: 100%|██████████| 1/1 [00:12<00:00,  0.08it/s]
==================================================
    PREDICTION SUMMARY (COMPLETE)    
==================================================
Total Queries Processed: 1
  - Successful Queries:  1
  - Failed Queries:      0
==================================================

Predicting DataLoader 0: 100%|██████████| 1/1 [00:12<00:00,  0.08it/s]/opt/conda/lib/python3.12/tempfile.py:940: ResourceWarning: Implicitly cleaning up <TemporaryDirectory '/tmp/tmp31o_3mhs'>
  _warnings.warn(warn_message, ResourceWarning)

Removing empty log directory...
Cleaning up MSA directories...

Test 3/6: Homomer WITH CIF direct templates...

==========
== CUDA ==
==========

CUDA Version 12.1.1

Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.

df: /root/.triton/autotune: No such file or directory
/opt/openfold3/openfold3/core/utils/checkpoint_loading_utils.py:49: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  return torch.load(ckpt_path)
💡 Tip: For seamless cloud uploads and versioning, try installing [litmodels](https://pypi.org/project/litmodels/) to enable LitModelCheckpoint, which syncs automatically with the Lightning model registry.
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
WARNING:openfold3.core.data.tools.colabfold_msa_server:Using output directory: /tmp/of3_colabfold_msas for ColabFold MSAs.
Submitting 1 sequences to the Colabfold MSA server for main MSAs...
COMPLETE: 100%|██████████| 150/150 [elapsed: 00:00 remaining: 00:00]
/opt/openfold3/openfold3/core/data/tools/colabfold_msa_server.py:331: DeprecationWarning: Python 3.14 will, by default, filter extracted tar archives and reject files or modify their metadata. Use the filter argument to control this behavior.
  tar_gz.extractall(path)
/opt/openfold3/openfold3/core/data/tools/colabfold_msa_server.py:1048: UserWarning: Query leucine_zipper chain molecule_type=<MoleculeType.PROTEIN: 0> chain_ids=['A', 'B'] sequence='XRMKQLEDKVEELLSKNYHLENEVARLKKLVGER' non_canonical_residues=None smiles=None ccd_codes=None paired_msa_file_paths=None main_msa_file_paths=[PosixPath('/tmp/of3_colabfold_msas/main/805842a343863679f8df42bc2a1b2f2465b78e764dd75121d74a75d38f6a6c2c.npz')] template_alignment_file_path=None template_entry_chain_ids=None template_cif_paths=[PosixPath('examples/example_inference_inputs/templates/honomer/chain_a_b/1dgc.cif'), PosixPath('examples/example_inference_inputs/templates/honomer/chain_a_b/1ysa.cif'), PosixPath('examples/example_inference_inputs/templates/honomer/chain_a_b/1zta.cif'), PosixPath('examples/example_inference_inputs/templates/honomer/chain_a_b/4dmd.cif'), PosixPath('examples/example_inference_inputs/templates/honomer/chain_a_b/4dme.cif')] sdf_file_path=None already has template_cif_paths set. These are not overwritten with path(s) to the template CIF files from the ColabFold MSA server.
  inference_query_set = add_msa_paths_to_iqs(
/opt/openfold3/openfold3/core/data/pipelines/preprocessing/template.py:1755: ResourceWarning: unclosed file <_io.TextIOWrapper name='/tmp/of3_template_data/template_logs/1.log' mode='a' encoding='utf-8'>
  self._preprocess_templates_for_query(self.inputs[0])
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
No complexes found for paired MSA generation. Skipping...
Preprocessing templates...
/opt/conda/lib/python3.12/multiprocessing/popen_fork.py:66: DeprecationWarning: This process (pid=1) is multi-threaded, use of fork() may lead to deadlocks in the child.
  self.pid = os.fork()
Predicting DataLoader 0:   0%|          | 0/1 [00:00<?, ?it/s]Seed set to 2746317213
Predicting DataLoader 0: 100%|██████████| 1/1 [00:12<00:00,  0.08it/s]
==================================================
    PREDICTION SUMMARY (COMPLETE)    
==================================================
Total Queries Processed: 1
  - Successful Queries:  1
  - Failed Queries:      0
==================================================

Predicting DataLoader 0: 100%|██████████| 1/1 [00:12<00:00,  0.08it/s]/opt/conda/lib/python3.12/tempfile.py:940: ResourceWarning: Implicitly cleaning up <TemporaryDirectory '/tmp/tmpf3oeb4mq'>
  _warnings.warn(warn_message, ResourceWarning)

Removing empty log directory...
Cleaning up MSA directories...

Test 4/6: Multimer WITHOUT templates...

==========
== CUDA ==
==========

CUDA Version 12.1.1

Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.

df: /root/.triton/autotune: No such file or directory
/opt/openfold3/openfold3/core/utils/checkpoint_loading_utils.py:49: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  return torch.load(ckpt_path)
💡 Tip: For seamless cloud uploads and versioning, try installing [litmodels](https://pypi.org/project/litmodels/) to enable LitModelCheckpoint, which syncs automatically with the Lightning model registry.
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
WARNING:openfold3.core.data.tools.colabfold_msa_server:Using output directory: /tmp/of3_colabfold_msas for ColabFold MSAs.
Submitting 2 sequences to the Colabfold MSA server for main MSAs...
COMPLETE: 100%|██████████| 300/300 [elapsed: 00:13 remaining: 00:00]
/opt/openfold3/openfold3/core/data/tools/colabfold_msa_server.py:331: DeprecationWarning: Python 3.14 will, by default, filter extracted tar archives and reject files or modify their metadata. Use the filter argument to control this behavior.
  tar_gz.extractall(path)
Submitting 1 paired MSA queries to the Colabfold MSA server...
COMPLETE: 100%|██████████| 300/300 [elapsed: 00:00 remaining: 00:00]
/opt/openfold3/openfold3/core/data/tools/colabfold_msa_server.py:331: DeprecationWarning: Python 3.14 will, by default, filter extracted tar archives and reject files or modify their metadata. Use the filter argument to control this behavior.
  tar_gz.extractall(path)
Computing paired MSAs: 100%|██████████| 1/1 [00:00<00:00,  2.27it/s]
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
/opt/conda/lib/python3.12/multiprocessing/popen_fork.py:66: DeprecationWarning: This process (pid=1) is multi-threaded, use of fork() may lead to deadlocks in the child.
  self.pid = os.fork()
Predicting DataLoader 0:   0%|          | 0/1 [00:00<?, ?it/s]Seed set to 2746317213
Predicting DataLoader 0: 100%|██████████| 1/1 [00:29<00:00,  0.03it/s]
==================================================
    PREDICTION SUMMARY (COMPLETE)    
==================================================
Total Queries Processed: 1
  - Successful Queries:  1
  - Failed Queries:      0
==================================================

Predicting DataLoader 0: 100%|██████████| 1/1 [00:29<00:00,  0.03it/s]/opt/conda/lib/python3.12/tempfile.py:940: ResourceWarning: Implicitly cleaning up <TemporaryDirectory '/tmp/tmpu12v_0dp'>
  _warnings.warn(warn_message, ResourceWarning)

Removing empty log directory...
Cleaning up MSA directories...

Test 5/6: Multimer WITH ColabFold templates...

==========
== CUDA ==
==========

CUDA Version 12.1.1

Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.

df: /root/.triton/autotune: No such file or directory
/opt/openfold3/openfold3/core/utils/checkpoint_loading_utils.py:49: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  return torch.load(ckpt_path)
💡 Tip: For seamless cloud uploads and versioning, try installing [litmodels](https://pypi.org/project/litmodels/) to enable LitModelCheckpoint, which syncs automatically with the Lightning model registry.
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
WARNING:openfold3.core.data.tools.colabfold_msa_server:Using output directory: /tmp/of3_colabfold_msas for ColabFold MSAs.
Submitting 2 sequences to the Colabfold MSA server for main MSAs...
COMPLETE: 100%|██████████| 300/300 [elapsed: 00:06 remaining: 00:00]
/opt/openfold3/openfold3/core/data/tools/colabfold_msa_server.py:331: DeprecationWarning: Python 3.14 will, by default, filter extracted tar archives and reject files or modify their metadata. Use the filter argument to control this behavior.
  tar_gz.extractall(path)
Computing paired MSAs:   0%|          | 0/1 [00:00<?, ?it/s]Submitting 1 paired MSA queries to the Colabfold MSA server...
COMPLETE: 100%|██████████| 300/300 [elapsed: 00:00 remaining: 00:00]
/opt/openfold3/openfold3/core/data/tools/colabfold_msa_server.py:331: DeprecationWarning: Python 3.14 will, by default, filter extracted tar archives and reject files or modify their metadata. Use the filter argument to control this behavior.
  tar_gz.extractall(path)
Computing paired MSAs: 100%|██████████| 1/1 [00:00<00:00,  2.17it/s]
/opt/conda/lib/python3.12/multiprocessing/popen_fork.py:66: DeprecationWarning: This process (pid=1) is multi-threaded, use of fork() may lead to deadlocks in the child.
  self.pid = os.fork()
Preprocessing templates:   0%|          | 0/2 [00:00<?, ?it/s]/opt/conda/lib/python3.12/multiprocessing/pool.py:125: ResourceWarning: unclosed file <_io.TextIOWrapper name='/tmp/of3_template_data/template_logs/158.log' mode='a' encoding='utf-8'>
  result = (True, func(*args, **kwds))
Preprocessing templates: 100%|██████████| 2/2 [00:00<00:00,  4.81it/s]
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
/opt/conda/lib/python3.12/multiprocessing/popen_fork.py:66: DeprecationWarning: This process (pid=1) is multi-threaded, use of fork() may lead to deadlocks in the child.
  self.pid = os.fork()
Predicting DataLoader 0:   0%|          | 0/1 [00:00<?, ?it/s]Seed set to 2746317213
Predicting DataLoader 0: 100%|██████████| 1/1 [00:29<00:00,  0.03it/s]
==================================================
    PREDICTION SUMMARY (COMPLETE)    
==================================================
Total Queries Processed: 1
  - Successful Queries:  1
  - Failed Queries:      0
==================================================

Predicting DataLoader 0: 100%|██████████| 1/1 [00:29<00:00,  0.03it/s]/opt/conda/lib/python3.12/tempfile.py:940: ResourceWarning: Implicitly cleaning up <TemporaryDirectory '/tmp/tmpdmrlpvs8'>
  _warnings.warn(warn_message, ResourceWarning)

Removing empty log directory...
Cleaning up MSA directories...

Test 6/6: Multimer WITH CIF direct templates...

==========
== CUDA ==
==========

CUDA Version 12.1.1

Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.

df: /root/.triton/autotune: No such file or directory
/opt/openfold3/openfold3/core/utils/checkpoint_loading_utils.py:49: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  return torch.load(ckpt_path)
💡 Tip: For seamless cloud uploads and versioning, try installing [litmodels](https://pypi.org/project/litmodels/) to enable LitModelCheckpoint, which syncs automatically with the Lightning model registry.
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
WARNING:openfold3.core.data.tools.colabfold_msa_server:Using output directory: /tmp/of3_colabfold_msas for ColabFold MSAs.
Submitting 2 sequences to the Colabfold MSA server for main MSAs...
COMPLETE: 100%|██████████| 300/300 [elapsed: 00:07 remaining: 00:00]
/opt/openfold3/openfold3/core/data/tools/colabfold_msa_server.py:331: DeprecationWarning: Python 3.14 will, by default, filter extracted tar archives and reject files or modify their metadata. Use the filter argument to control this behavior.
  tar_gz.extractall(path)
Submitting 1 paired MSA queries to the Colabfold MSA server...
COMPLETE: 100%|██████████| 300/300 [elapsed: 00:00 remaining: 00:00]
/opt/openfold3/openfold3/core/data/tools/colabfold_msa_server.py:331: DeprecationWarning: Python 3.14 will, by default, filter extracted tar archives and reject files or modify their metadata. Use the filter argument to control this behavior.
  tar_gz.extractall(path)
Computing paired MSAs: 100%|██████████| 1/1 [00:00<00:00,  2.23it/s]
/opt/openfold3/openfold3/core/data/tools/colabfold_msa_server.py:1048: UserWarning: Query 7cnx chain molecule_type=<MoleculeType.PROTEIN: 0> chain_ids=['A', 'C'] sequence='MLNSFKLSLQYILPKLWLTRLAGWGASKRAGWLTKLVIDLFVKYYKVDMKEAQKPDTASYRTFNEFFVRPLRDEVRPIDTDPNVLVMPADGVISQLGKIEEDKILQAKGHNYSLEALLAGNYLMADLFRNGTFVTTYLSPRDYHRVHMPCNGILREMIYVPGDLFSVNHLTAQNVPNLFARNERVICLFDTEFGPMAQILVGATIVGSIETVWAGTITPPREGIIKRWTWPAGENDGSVALLKGQEMGRFKLG' non_canonical_residues=None smiles=None ccd_codes=None paired_msa_file_paths=[PosixPath('/tmp/of3_colabfold_msas/paired/0138d0b3fd92c9620cf73919594e0c25929395a60753ad614b373c48b9719a8d/08f507e5b2b7a3497db17dc6aa267e170d931b82702ed76a88ad857034fa1a04.npz')] main_msa_file_paths=[PosixPath('/tmp/of3_colabfold_msas/main/08f507e5b2b7a3497db17dc6aa267e170d931b82702ed76a88ad857034fa1a04.npz')] template_alignment_file_path=None template_entry_chain_ids=None template_cif_paths=[PosixPath('examples/example_inference_inputs/templates/multimer/chain_a_c/7cnz.cif')] sdf_file_path=None already has template_cif_paths set. These are not overwritten with path(s) to the template CIF files from the ColabFold MSA server.
  inference_query_set = add_msa_paths_to_iqs(
/opt/openfold3/openfold3/core/data/tools/colabfold_msa_server.py:1048: UserWarning: Query 7cnx chain molecule_type=<MoleculeType.PROTEIN: 0> chain_ids=['B', 'D'] sequence='XTVINLFAPGKVNLVEQLESLSVTKIGQPLAVSTGHHHHHHG' non_canonical_residues=None smiles=None ccd_codes=None paired_msa_file_paths=[PosixPath('/tmp/of3_colabfold_msas/paired/0138d0b3fd92c9620cf73919594e0c25929395a60753ad614b373c48b9719a8d/1e0d5eeb7c5a6b94cf0d0c905b28909c6e7bc1e0f0445c07b762705f5be0ab25.npz')] main_msa_file_paths=[PosixPath('/tmp/of3_colabfold_msas/main/1e0d5eeb7c5a6b94cf0d0c905b28909c6e7bc1e0f0445c07b762705f5be0ab25.npz')] template_alignment_file_path=None template_entry_chain_ids=None template_cif_paths=[PosixPath('examples/example_inference_inputs/templates/multimer/chain_b_d/6l06.cif'), PosixPath('examples/example_inference_inputs/templates/multimer/chain_b_d/6l07.cif'), PosixPath('examples/example_inference_inputs/templates/multimer/chain_b_d/7cnw.cif'), PosixPath('examples/example_inference_inputs/templates/multimer/chain_b_d/7cnx.cif'), PosixPath('examples/example_inference_inputs/templates/multimer/chain_b_d/7cnz.cif')] sdf_file_path=None already has template_cif_paths set. These are not overwritten with path(s) to the template CIF files from the ColabFold MSA server.
  inference_query_set = add_msa_paths_to_iqs(
/opt/conda/lib/python3.12/multiprocessing/popen_fork.py:66: DeprecationWarning: This process (pid=1) is multi-threaded, use of fork() may lead to deadlocks in the child.
  self.pid = os.fork()
Preprocessing templates:   0%|          | 0/2 [00:00<?, ?it/s]/opt/conda/lib/python3.12/multiprocessing/pool.py:125: ResourceWarning: unclosed file <_io.TextIOWrapper name='/tmp/of3_template_data/template_logs/158.log' mode='a' encoding='utf-8'>
  result = (True, func(*args, **kwds))
Preprocessing templates: 100%|██████████| 2/2 [00:00<00:00,  4.52it/s]
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
/opt/conda/lib/python3.12/multiprocessing/popen_fork.py:66: DeprecationWarning: This process (pid=1) is multi-threaded, use of fork() may lead to deadlocks in the child.
  self.pid = os.fork()
Predicting DataLoader 0:   0%|          | 0/1 [00:00<?, ?it/s]Seed set to 2746317213
Predicting DataLoader 0: 100%|██████████| 1/1 [00:29<00:00,  0.03it/s]
==================================================
    PREDICTION SUMMARY (COMPLETE)    
==================================================
Total Queries Processed: 1
  - Successful Queries:  1
  - Failed Queries:      0
==================================================

Predicting DataLoader 0: 100%|██████████| 1/1 [00:29<00:00,  0.03it/s]/opt/conda/lib/python3.12/tempfile.py:940: ResourceWarning: Implicitly cleaning up <TemporaryDirectory '/tmp/tmpiffxot7f'>
  _warnings.warn(warn_message, ResourceWarning)

Removing empty log directory...
Cleaning up MSA directories...

==========================================
All tests completed!
==========================================

Results Summary:

Homomer:
  - No templates:         e2e_output/homomer_no_templates/
  - ColabFold templates:  e2e_output/homomer_colabfold_templates/
  - CIF direct templates: e2e_output/homomer_cif_direct_templates/

Multimer:
  - No templates:         e2e_output/multimer_no_templates/
  - ColabFold templates:  e2e_output/multimer_colabfold_templates/
  - CIF direct templates: e2e_output/multimer_cif_direct_templates/

Compare confidence scores:

=== Homomer (Leucine Zipper) ===
No templates:         avg_plddt = 87.1446533203125
ColabFold templates:  avg_plddt = 87.14595794677734
CIF direct templates: avg_plddt = 87.38876342773438

=== Multimer (7cnx) ===
No templates:         avg_plddt = 79.51964569091797
ColabFold templates:  avg_plddt = 79.8408203125
CIF direct templates: avg_plddt = 79.5925521850586

Summary

Test Configuration:

Queries: Homomer (leucine zipper, 2 chains, 34 residues each) and Multimer (7cnx, 4 chains, ~300 total residues)
Settings: Low memory mode, 1 diffusion sample, 1 model seed
Template Modes:
1. No templates (--use_templates false)
2. ColabFold MSA server templates (--use_templates true with automatic template discovery)
3. CIF direct templates (user-provided CIF files with automatic alignment)

Results:

Query Type	No Templates	ColabFold Templates	CIF Direct Templates	Best Method
Homomer	87.14	87.15 (+0.01)	87.39 (+0.25)	CIF Direct
Multimer	79.52	79.84 (+0.32)	79.59 (+0.07)	ColabFold

Key Findings:

✅ All 6 tests completed successfully - CIF direct template feature works correctly
Templates improve predictions: All template modes showed improvements over no-template baseline
CIF direct templates excel for homomer: +0.25 avg_plddt improvement (best among all methods)
ColabFold templates excel for multimer: +0.32 avg_plddt improvement (best among all methods)
Both template methods are valuable: Users can choose based on whether they have specific template structures (CIF direct) or prefer automatic discovery (ColabFold)

Technical Validation:

CIF direct mode successfully processes raw CIF files without pre-computed alignments
Automatic template alignment and selection works as designed
User-provided templates integrate seamlessly with the existing pipeline
Template preprocessing logs available via template_preprocessor_settings.create_logs: true

Other Notes

N/A

jnwei · 2025-11-13T10:39:03Z

Hi @thanhnamitit , this looks really great! Thank you especially for providing such detailed examples complete with cifs, example queries, and documentation and initial results. It is greatly appreciated and will be super helpful for future users.

I will review this in greater depth over the next few days, but a few quick comments:

Do you mind testing the cif direct template structure parsing without the msa server? We have an issue where template alignments are being performed even when --use_templates=False is being passed from the command line. If we set both --use_msa_server false and --use_templates false we can ensure that there are no template alignments being generated from colabfold. You can merge past Support dummy sequence MSA #28 to get the latest support for MSA free predictions.
Do you mind submitting the examples a PR to HuggingFace instead? We'd like to keep the openfold-3 repo as lightweight as possible, so we place larger example directories
Could you please update also the documentation for Chain in InferenceQuerySet with the new fields for the template_cif_paths? We haven't configured the reference documentation to update automatically yet.

Thank you!

thanhnamitit · 2025-12-21T04:48:27Z

Hi @jnwei,

Thank you so much for reviewing the MR and for your kind words!

I've addressed all your comments:

1. MSA-Free Testing with CIF Direct Templates

I created a comprehensive E2E test script (run_e2e_cif_direct.sh) that runs 8 test cases covering all combinations:

Test	MSA Server	Templates	Target
1-2	❌ No MSA	❌ None / ✅ CIF-Direct	Homomer
3-4	❌ No MSA	❌ None / ✅ CIF-Direct	Multimer
5-6	✅ With MSA	❌ None / ✅ ColabFold	Homomer
7-8	✅ With MSA	✅ CIF-Direct	Homomer/Multimer

Bug Found & Fixed: During testing, I discovered an issue when running multimers in MSA-free mode. In msa.py, the extract_alignments_to_pair function was calling MsaArray.multi_concatenate with an empty list when no MSAs were available for pairing. I added a check to skip concatenation when there are no MSAs to pair:

if not msa_arrays_to_pair_i:
    continue

All 8 tests now pass successfully, You can take a look at the attached log file!

2. HuggingFace Examples PR

I've submitted the examples to HuggingFace: (https://huggingface.co/OpenFold/OpenFold3/discussions/12#694272355826ce8d56b0aaac)

3. Documentation Updates

Updated the following documentation with the new template_cif_paths and template_cif_chain_ids fields:

docs/source/input_format.md - Chain schema and field descriptions
docs/source/Inference.md - CIF Direct Template Mode section and Inference Query Set fields

Please review when you have a chance. Let me know if you need any changes!

e2e_test.log
run_e2e_cif_direct.sh

jnwei · 2025-12-23T09:20:45Z

Thank you so much @thanhnamitit ! Overall the changes look great. Thank you also for adding documentation and examples to the HuggingFace repository.

We'll review this further in the new year, and we should be able to add this in soon after.

thanhnamitit force-pushed the main branch 4 times, most recently from 5de99e9 to 253e5c8 Compare December 17, 2025 09:51

Cifs direct template structure

253e5c8

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Cifs direct template structure #37

Cifs direct template structure #37

Uh oh!

thanhnamitit commented Nov 12, 2025

Uh oh!

jnwei commented Nov 13, 2025

Uh oh!

thanhnamitit commented Dec 21, 2025

Uh oh!

jnwei commented Dec 23, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Cifs direct template structure #37

Are you sure you want to change the base?

Cifs direct template structure #37

Uh oh!

Conversation

thanhnamitit commented Nov 12, 2025

Core Implementation

Documentation

Example Files

Test Script

Test Output

Summary

Uh oh!

jnwei commented Nov 13, 2025

Uh oh!

thanhnamitit commented Dec 21, 2025

1. MSA-Free Testing with CIF Direct Templates

2. HuggingFace Examples PR

3. Documentation Updates

Uh oh!

jnwei commented Dec 23, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants