Description
First of all, congratulations to the Foldseek team for this excellent tool!
I am trying to run an exhaustive Foldseek search against AlphaFold DB / UniProt using a Singularity container, with GPU enabled and searching all target members (--gpu 1 --cluster-search 1).
The database creation completes successfully, and searches work correctly without --cluster-search 1.
However, when I combine --gpu 1 and --cluster-search 1 for AFDB/Uniprot, Foldseek crashes with an error.
Environment
- Foldseek version: 24020d2 (built-in Foldseek in foldseek.sif)
- Execution environment: HPC cluster
- Container runtime: Singularity
- GPU: NVIDIA (CUDA available, singularity exec --nv works)
- OS: Linux (HPC environment)
Database preparation steps
mkdir alphafold_uniprot_3
foldseek databases Alphafold/UniProt alphafold_uniprot_3/afdb_up tmpFolder
Then I create the padded database for cluster search:
singularity exec -B /alphafold_uniprot_3:/afdb_up foldseek.sif \
foldseek makepaddedseqdb \
/afdb_up/afdb_up \
/afdb_up/afdb_up_pad \
--threads 32 \
--cluster-search 1
Both the original AFDB database and the padded database are stored in the same directory:
alphafold_uniprot_3/
├── afdb_up
├── afdb_up_ca
├── afdb_up_ca.dbtype
├── afdb_up_ca.index
├── afdb_up.dbtype
├── afdb_up_h
├── afdb_up_h.dbtype
├── afdb_up_h.index
├── afdb_up.index
├── afdb_up.lookup
├── afdb_up_mapping
├── afdb_up_pad -> /afdb_up/afdb_up
├── afdb_up_pad_ca -> /afdb_up/afdb_up_ca
├── afdb_up_pad_ca.dbtype
├── afdb_up_pad_ca.index
├── afdb_up_pad.dbtype
├── afdb_up_pad_h -> /afdb_up/afdb_up_h
├── afdb_up_pad_h.dbtype
├── afdb_up_pad_h.index
├── afdb_up_pad.index
├── afdb_up_pad.lookup
├── afdb_up_pad_mapping
├── afdb_up_pad.sh
├── afdb_up_pad_ss
├── afdb_up_pad_ss.dbtype
├── afdb_up_pad_ss.gpu_mapping1
├── afdb_up_pad_ss.gpu_mapping2
├── afdb_up_pad_ss_h
├── afdb_up_pad_ss_h.dbtype
├── afdb_up_pad_ss_h.index
├── afdb_up_pad_ss.index
├── afdb_up_pad_ss.lookup
├── afdb_up_pad_taxonomy -> /afdb_up/afdb_up_taxonomy
├── afdb_up_ss
├── afdb_up_ss.dbtype
├── afdb_up_ss.index
├── afdb_up_taxonomy
├── afdb_up.version
├── get_database.sh
├── make_padded.log
├── make_padded.sh
└── tmpFolder
    └── latest -> 18167349577761145409
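Several entries in the listing above are symlinks whose targets use the container-side path /afdb_up/..., so they resolve only when the directory is bind-mounted at /afdb_up. The following is a minimal sketch for listing symlinks that do not resolve from the current filesystem view; the function name list_dangling_links is hypothetical and is not part of Foldseek.

```shell
# Hypothetical helper (not part of Foldseek): print every symlink directly
# inside a directory whose target cannot be resolved from where it is run.
list_dangling_links() {
    dir="$1"
    for path in "$dir"/*; do
        # -L is true for a symlink; -e follows the link and fails if the target is missing
        if [ -L "$path" ] && [ ! -e "$path" ]; then
            printf '%s -> %s\n' "$path" "$(readlink "$path")"
        fi
    done
}
```

Running list_dangling_links alphafold_uniprot_3 on the host and list_dangling_links /afdb_up inside the container would show whether the links are valid in each view.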
Working command (GPU, no cluster search)
This command runs successfully:
singularity exec --nv \
-B /alphafold_uniprot_3:/afdb_up,/input:/input_query,/output_dir:/out_dir \
foldseek.sif \
foldseek easy-search \
/input_query/input_groove_aligned.pdb \
/afdb_up/afdb_up_pad \
/out_dir/aln_results \
/out_dir/tmpFolder \
--gpu 1 \
--cluster-search 0 \
--threads 64 \
--format-output "query,target,fident,alnlen,mismatch,gapopen,qstart,qend,tstart,tend,evalue,bits,alntmscore,qtmscore,ttmscore,lddt,lddtfull,prob"
Failing command (GPU + cluster search)
When enabling both GPU and cluster search, Foldseek fails:
#!/bin/bash
#SBATCH --job-name=foldseek12
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=64
#SBATCH --partition=short-gpu-small
#SBATCH --mem-per-cpu=8GB
#SBATCH --gres=gpu:1g.5gb:1 # a MIG of 5GB from A100 40GB
singularity exec --nv \
-B /alphafold_uniprot_3:/afdb_up,/input:/input_query,/output_dir:/out_dir \
foldseek.sif \
foldseek easy-search \
/input_query/input_groove_aligned.pdb \
/afdb_up/afdb_up_pad \
/out_dir/aln_results \
/out_dir/tmpFolder \
--gpu 1 \
--cluster-search 1 \
--exhaustive-search 1 \
--threads 64 \
--format-output "query,target,fident,alnlen,mismatch,gapopen,qstart,qend,tstart,tend,evalue,bits,alntmscore,qtmscore,ttmscore,lddt,lddtfull,prob"
Output before crash
MMseqs Version: 24020d257933c362dd1c22fd64cf478f89d5efc6
TMscore threshold 0
TMscore threshold mode 0
LDDT threshold 0
Sort by structure bit score 1
Alignment type 2
Exact TMscore 0
Substitution matrix aa:3di.out,nucl:3di.out
Add backtrace false
Alignment mode 3
Alignment mode 0
E-value threshold 10
Seq. id. threshold 0
Min alignment length 0
Seq. id. mode 0
Alternative alignments 0
Coverage threshold 0
Coverage mode 0
Max sequence length 65535
Compositional bias 1
Compositional bias scale 1
Max reject 2147483647
Max accept 2147483647
Preload mode 0
Gap open cost aa:10,nucl:10
Gap extension cost aa:1,nucl:1
Threads 64
Compressed 0
Verbosity 3
Seed substitution matrix aa:3di.out,nucl:3di.out
Sensitivity 9.5
k-mer length 6
Target search mode 0
k-score seq:2147483647,prof:2147483647
Max results per query 1000
Split database 0
Split mode 2
Split memory limit 0
Diagonal scoring true
Exact k-mer matching 0
Mask residues 0
Mask residues probability 0.999995
Mask lower case residues 1
Mask lower letter repeating N times 6
Minimum diagonal score 30
Selected taxa
Spaced k-mers 1
Spaced k-mer pattern
Local temporary path
Use GPU 1
Use GPU server 0
Wait for GPU server 600
Prefilter mode 0
TMalign hit order 0
TMalign fast 1
MultiDomain Mode 1
Mask profile 1
Profile E-value threshold 0.1
Global sequence weighting false
Allow deletions false
Filter MSA 1
Use filter only at N seqs 0
Maximum seq. id. threshold 0.9
Minimum seq. id. 0.0
Minimum score per column -20
Minimum coverage 0
Select N most diverse seqs 1000
Pseudo count mode 0
Profile output mode 0
Cluster search 1
Exhaustive search mode true
Search iterations 1
Remove temporary files true
Force restart with latest tmp false
MPI runner
Path to ProstT5
Chain name mode 0
Model name mode 0
Createdb extraction mode 0
Interface distance threshold 10
Write mapping file 0
Write Foldcomp 0
Mask b-factor threshold 0
Coord store mode 2
Write lookup file 1
Input format 0
File Inclusion Regex .*
File Exclusion Regex ^$
Alignment format 0
Format alignment output query,target,fident,alnlen,mismatch,gapopen,qstart,qend,tstart,tend,evalue,bits,alntmscore,qtmscore,ttmscore,lddt,lddtfull,prob
Database output false
Report mode 2
Greedy best hits false
Alignment backtraces will be computed, since they were requested by output format.
createdb /input_query/input_groove_aligned.pdb /out_dir/tmpFolder/7544014676619848086/query --gpu 1 --chain-name-mode 0 --model-name-mode 0 --db-extraction-mode 0 --distance-threshold 10 --write-mapping 0 --write-foldcomp 0 --mask-bfactor-threshold 0 --coord-store-mode 2 --write-lookup 1 --input-format 0 --file-include '.*' --file-exclude '^$' --threads 64 -v 3
Output file: /out_dir/tmpFolder/7544014676619848086/query
[=================================================================] 100.00% 1 eta -
Time for merging to query_ss: 0h 0m 0s 133ms
Time for merging to query_h: 0h 0m 0s 136ms
Time for merging to query_ca: 0h 0m 0s 139ms
Time for merging to query: 0h 0m 0s 138ms
Ignore 0 out of 1.
Too short: 0, incorrect: 0, not proteins: 0.
Time for processing: 0h 0m 0s 944ms
Create directory /out_dir/tmpFolder/7544014676619848086/search_tmp
search /out_dir/tmpFolder/7544014676619848086/query /afdb_up/afdb_up_pad /out_dir/tmpFolder/7544014676619848086/result /out_dir/tmpFolder/7544014676619848086/search_tmp -a 1 --alignment-mode 3 --threads 64 -s 9.5 -k 6 --gpu 1 --cluster-search 1 --exhaustive-search 1 --remove-tmp-files 1
Error message
Require /afdb_up/afdb_up_pad_seq.dbtype database for cluster search.
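The error names a companion file, afdb_up_pad_seq.dbtype, that does not appear in the directory listing. As a rough diagnostic, the sketch below reports which .dbtype files are absent for a given database prefix. The suffix list is an assumption taken only from the listing and the error message in this report, and the function name report_missing_dbtypes is hypothetical.

```shell
# Hypothetical helper (not part of Foldseek): report absent .dbtype files
# for a database prefix. The suffix list is an assumption based on this
# report's directory listing plus the _seq suffix named in the error.
report_missing_dbtypes() {
    prefix="$1"
    for suffix in "" _ss _ca _h _seq; do
        file="${prefix}${suffix}.dbtype"
        [ -e "$file" ] || printf 'missing: %s\n' "$file"
    done
}
```

Inside the container, this could be called as report_missing_dbtypes /afdb_up/afdb_up_pad. Given the listing above, only the _seq entry would be expected to show up as missing.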
Expected behavior
Foldseek should be able to run an exhaustive search over all target members (not only cluster representatives) on AFDB/UniProt with GPU acceleration enabled.
Questions
- Is --gpu 1 --cluster-search 1 officially supported for AFDB/UniProt? Or is --cluster-search 1 an alternative for clustered databases (CATH50, AFDB50, PDB100, etc.)?
- Are there additional steps required when preparing AFDB databases for GPU + cluster search?
- Is storing the padded and non-padded databases in the same directory expected to work?