Error when running easy-search with --gpu 1 --cluster-search 1 on AFDB/UniProt (Singularity) #558

@chagas98

Description

First of all, congratulations to the Foldseek team for this excellent tool!

I am trying to run an exhaustive Foldseek search against AlphaFold DB / UniProt from a Singularity container, with GPU acceleration enabled and all target members searched (--gpu 1 --cluster-search 1).

The database creation completes successfully, and searches work correctly without --cluster-search 1.
However, when I combine --gpu 1 and --cluster-search 1 for AFDB/UniProt, Foldseek exits with an error.


Environment

  • Foldseek version: 24020d2 (built-in Foldseek in foldseek.sif)
  • Execution environment: HPC cluster
  • Container runtime: Singularity
  • GPU: NVIDIA (CUDA available, singularity exec --nv works)
  • OS: Linux (HPC environment)

Database preparation steps

mkdir alphafold_uniprot_3 

foldseek databases Alphafold/UniProt alphafold_uniprot_3/afdb_up tmpFolder

Then I create the padded database for cluster search:

singularity exec -B /alphafold_uniprot_3:/afdb_up foldseek.sif \
  foldseek makepaddedseqdb \
    /afdb_up/afdb_up \
    /afdb_up/afdb_up_pad \
    --threads 32 \
    --cluster-search 1

Both the original AFDB database and the padded database are stored in the same directory:

alphafold_uniprot_3/
├── afdb_up
├── afdb_up_ca
├── afdb_up_ca.dbtype
├── afdb_up_ca.index
├── afdb_up.dbtype
├── afdb_up_h
├── afdb_up_h.dbtype
├── afdb_up_h.index
├── afdb_up.index
├── afdb_up.lookup
├── afdb_up_mapping
├── afdb_up_pad -> /afdb_up/afdb_up
├── afdb_up_pad_ca -> /afdb_up/afdb_up_ca
├── afdb_up_pad_ca.dbtype
├── afdb_up_pad_ca.index
├── afdb_up_pad.dbtype
├── afdb_up_pad_h -> /afdb_up/afdb_up_h
├── afdb_up_pad_h.dbtype
├── afdb_up_pad_h.index
├── afdb_up_pad.index
├── afdb_up_pad.lookup
├── afdb_up_pad_mapping
├── afdb_up_pad.sh
├── afdb_up_pad_ss
├── afdb_up_pad_ss.dbtype
├── afdb_up_pad_ss.gpu_mapping1
├── afdb_up_pad_ss.gpu_mapping2
├── afdb_up_pad_ss_h
├── afdb_up_pad_ss_h.dbtype
├── afdb_up_pad_ss_h.index
├── afdb_up_pad_ss.index
├── afdb_up_pad_ss.lookup
├── afdb_up_pad_taxonomy -> /afdb_up/afdb_up_taxonomy
├── afdb_up_ss
├── afdb_up_ss.dbtype
├── afdb_up_ss.index
├── afdb_up_taxonomy
├── afdb_up.version
├── get_database.sh
├── make_padded.log
├── make_padded.sh
└── tmpFolder
    └── latest -> 18167349577761145409
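The `->` entries in the listing are symlinks whose targets are container paths (`/afdb_up/...`), so they resolve only where that bind mount is present. A minimal sketch for listing links that do not resolve in the current environment follows; the function name `list_dangling_links` is hypothetical, and the default directory is simply the one used in this report.

```shell
#!/bin/bash
# Hypothetical helper (not part of Foldseek): print every symlink directly
# inside a directory whose target does not exist in the current environment.
list_dangling_links() {
  local dir="$1" entry
  for entry in "$dir"/*; do
    # -L: entry is a symlink; ! -e: its target cannot be reached from here
    if [ -L "$entry" ] && [ ! -e "$entry" ]; then
      printf '%s -> %s\n' "$entry" "$(readlink "$entry")"
    fi
  done
}

# Default directory is the one from this report; pass another as the first argument.
list_dangling_links "${1:-alphafold_uniprot_3}"
```

Running it once on the host and once inside `singularity exec -B ...` shows whether the links are valid in both places or only under the bind mount.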

Working command (GPU, no cluster search)

This command runs successfully:

singularity exec --nv \
  -B /alphafold_uniprot_3:/afdb_up,/input:/input_query,/output_dir:/out_dir \
  foldseek.sif \
  foldseek easy-search \
    /input_query/input_groove_aligned.pdb \
    /afdb_up/afdb_up_pad \
    /out_dir/aln_results \
    /out_dir/tmpFolder \
    --gpu 1 \
    --cluster-search 0 \
    --threads 64 \
    --format-output "query,target,fident,alnlen,mismatch,gapopen,qstart,qend,tstart,tend,evalue,bits,alntmscore,qtmscore,ttmscore,lddt,lddtfull,prob"

Failing command (GPU + cluster search)

When enabling both GPU and cluster search, Foldseek fails:

#!/bin/bash
#SBATCH --job-name=foldseek12
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=64
#SBATCH --partition=short-gpu-small
#SBATCH --mem-per-cpu=8GB
#SBATCH --gres=gpu:1g.5gb:1  # a MIG of 5GB from A100 40GB

singularity exec --nv \
  -B /alphafold_uniprot_3:/afdb_up,/input:/input_query,/output_dir:/out_dir \
  foldseek.sif \
  foldseek easy-search \
    /input_query/input_groove_aligned.pdb \
    /afdb_up/afdb_up_pad \
    /out_dir/aln_results \
    /out_dir/tmpFolder \
    --gpu 1 \
    --cluster-search 1 \
    --exhaustive-search 1 \
    --threads 64 \
    --format-output "query,target,fident,alnlen,mismatch,gapopen,qstart,qend,tstart,tend,evalue,bits,alntmscore,qtmscore,ttmscore,lddt,lddtfull,prob"

Output before crash

MMseqs Version:                    	24020d257933c362dd1c22fd64cf478f89d5efc6
TMscore threshold                  	0
TMscore threshold mode             	0
LDDT threshold                     	0
Sort by structure bit score        	1
Alignment type                     	2
Exact TMscore                      	0
Substitution matrix                	aa:3di.out,nucl:3di.out
Add backtrace                      	false
Alignment mode                     	3
Alignment mode                     	0
E-value threshold                  	10
Seq. id. threshold                 	0
Min alignment length               	0
Seq. id. mode                      	0
Alternative alignments             	0
Coverage threshold                 	0
Coverage mode                      	0
Max sequence length                	65535
Compositional bias                 	1
Compositional bias scale           	1
Max reject                         	2147483647
Max accept                         	2147483647
Preload mode                       	0
Gap open cost                      	aa:10,nucl:10
Gap extension cost                 	aa:1,nucl:1
Threads                            	64
Compressed                         	0
Verbosity                          	3
Seed substitution matrix           	aa:3di.out,nucl:3di.out
Sensitivity                        	9.5
k-mer length                       	6
Target search mode                 	0
k-score                            	seq:2147483647,prof:2147483647
Max results per query              	1000
Split database                     	0
Split mode                         	2
Split memory limit                 	0
Diagonal scoring                   	true
Exact k-mer matching               	0
Mask residues                      	0
Mask residues probability          	0.999995
Mask lower case residues           	1
Mask lower letter repeating N times	6
Minimum diagonal score             	30
Selected taxa                      	
Spaced k-mers                      	1
Spaced k-mer pattern               	
Local temporary path               	
Use GPU                            	1
Use GPU server                     	0
Wait for GPU server                	600
Prefilter mode                     	0
TMalign hit order                  	0
TMalign fast                       	1
MultiDomain Mode                   	1
Mask profile                       	1
Profile E-value threshold          	0.1
Global sequence weighting          	false
Allow deletions                    	false
Filter MSA                         	1
Use filter only at N seqs          	0
Maximum seq. id. threshold         	0.9
Minimum seq. id.                   	0.0
Minimum score per column           	-20
Minimum coverage                   	0
Select N most diverse seqs         	1000
Pseudo count mode                  	0
Profile output mode                	0
Cluster search                     	1
Exhaustive search mode             	true
Search iterations                  	1
Remove temporary files             	true
Force restart with latest tmp      	false
MPI runner                         	
Path to ProstT5                    	
Chain name mode                    	0
Model name mode                    	0
Createdb extraction mode           	0
Interface distance threshold       	10
Write mapping file                 	0
Write Foldcomp                     	0
Mask b-factor threshold            	0
Coord store mode                   	2
Write lookup file                  	1
Input format                       	0
File Inclusion Regex               	.*
File Exclusion Regex               	^$
Alignment format                   	0
Format alignment output            	query,target,fident,alnlen,mismatch,gapopen,qstart,qend,tstart,tend,evalue,bits,alntmscore,qtmscore,ttmscore,lddt,lddtfull,prob
Database output                    	false
Report mode                        	2
Greedy best hits                   	false

Alignment backtraces will be computed, since they were requested by output format.
createdb /input_query/input_groove_aligned.pdb /out_dir/tmpFolder/7544014676619848086/query --gpu 1 --chain-name-mode 0 --model-name-mode 0 --db-extraction-mode 0 --distance-threshold 10 --write-mapping 0 --write-foldcomp 0 --mask-bfactor-threshold 0 --coord-store-mode 2 --write-lookup 1 --input-format 0 --file-include '.*' --file-exclude '^$' --threads 64 -v 3 

Output file: /out_dir/tmpFolder/7544014676619848086/query
[=================================================================] 100.00% 1 eta -
Time for merging to query_ss: 0h 0m 0s 133ms
Time for merging to query_h: 0h 0m 0s 136ms
Time for merging to query_ca: 0h 0m 0s 139ms
Time for merging to query: 0h 0m 0s 138ms
Ignore 0 out of 1.
Too short: 0, incorrect: 0, not proteins: 0.
Time for processing: 0h 0m 0s 944ms
Create directory /out_dir/tmpFolder/7544014676619848086/search_tmp
search /out_dir/tmpFolder/7544014676619848086/query /afdb_up/afdb_up_pad /out_dir/tmpFolder/7544014676619848086/result /out_dir/tmpFolder/7544014676619848086/search_tmp -a 1 --alignment-mode 3 --threads 64 -s 9.5 -k 6 --gpu 1 --cluster-search 1 --exhaustive-search 1 --remove-tmp-files 1 


Error message

Require /afdb_up/afdb_up_pad_seq.dbtype database for cluster search.
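Because the message names a specific file, one quick check before submitting the job is whether that file exists next to the padded database. This is a minimal sketch that assumes only the `_seq.dbtype` suffix taken from the message above; the helper name `has_cluster_seq_db` and the default prefix are illustrative, and the check says nothing about how the file is meant to be generated.

```shell
#!/bin/bash
# Hypothetical pre-flight check (not part of Foldseek): succeed only if
# "<prefix>_seq.dbtype", the file named in the error message, exists.
has_cluster_seq_db() {
  local db_prefix="$1"
  [ -e "${db_prefix}_seq.dbtype" ]
}

# Default prefix is the target database path used inside the container in this report.
db="${1:-/afdb_up/afdb_up_pad}"
if has_cluster_seq_db "$db"; then
  echo "found ${db}_seq.dbtype"
else
  echo "missing ${db}_seq.dbtype"
fi
```

The path must be evaluated inside the container (for example through `singularity exec` with the same `-B` mounts), since `/afdb_up` exists only there.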

Expected behavior

Foldseek should be able to run an exhaustive search (without clustered search) on AFDB/UniProt with GPU acceleration enabled.


Questions

  1. Is --gpu 1 --cluster-search 1 officially supported for AFDB/UniProt, or is --cluster-search 1 intended only for clustered databases (CATH50, AFDB50, PDB100, etc.)?
  2. Are there additional steps required when preparing AFDB databases for GPU + cluster search?
  3. Is storing the padded and non-padded databases in the same directory expected to work?
