Error when using fastmultigather against rocksdb (Error: No such file or directory (os error 2) - Tested with multiple cases) #381
Closed
Description
Hi @ctb,
Currently I'm using these commands:
Prepare data
cd /mnt/data/tnmquann/benchmarking/12_experiment
# Step 1: sourmash manysketch
sourmash scripts manysketch manysketch.csv -o manysketch.zip -c 20 -p k=31,scaled=1000,abund
# Step 2: unzip manysketch.zip (Note: I used this folder for all the commands below)
unzip manysketch.zip -d manysketch
# Additional: index gtdb-rs207.genomic-reps.dna.k31.zip
cd /mnt/data/tnmquann/database/sourmash/GTDB_R07-RS207
sourmash scripts index gtdb-rs207.genomic-reps.dna.k31.zip -o gtdb-rs207.genomic-reps.dna.k31.rocksdb -c 30
# Check indexed database
sourmash scripts check gtdb-rs207.genomic-reps.dna.k31.rocksdb
# Output
== This is sourmash version 4.8.10. ==
== Please cite Irber et. al (2024), doi:10.21105/joss.06830. ==
checking index 'gtdb-rs207.genomic-reps.dna.k31.rocksdb'
Opening DB
Starting check
Finished check
...index is ok!
I tried several approaches and got the following results:
Solutions 1 & 2: Work perfectly
cd /mnt/data/tnmquann/benchmarking/12_experiment
# Solution 1 - OK
sourmash scripts fastmultigather --cores 20 /mnt/data/tnmquann/benchmarking/12_experiment/manysketch/SOURMASH-MANIFEST.csv /mnt/data/tnmquann/database/sourmash/GTDB_R07-RS207/gtdb-rs207.genomic-reps.dna.k31.zip
# Solution 2 - OK (use loop + parallel package to run this script for each sample)
# Recreate *.sig.zip for each sample, then use fastgather
sourmash scripts fastgather /mnt/data/tnmquann/benchmarking/12_experiment/zip/trimmed-SRR17380114.sig.zip /mnt/data/tnmquann/database/sourmash/GTDB_R07-RS207/gtdb-rs207.genomic-reps.dna.k31.zip -c 20 -o trimmed-SRR17380114.csv
Both methods above work perfectly. The one that fails is Solution 3 below (fastmultigather with rocksdb).
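The per-sample loop mentioned in Solution 2 can be sketched as follows. This is a minimal sketch, not the exact script used: the function name `gather_each` is hypothetical, and the loop runs the samples one after another rather than through the parallel package. The `fastgather` arguments are the same ones shown above.

```shell
# Sketch of the per-sample loop from Solution 2.
# gather_each is a hypothetical helper name, not part of sourmash.
# It runs fastgather once per *.sig.zip in a directory and writes
# <sample>.csv into the current directory.
gather_each() {
    sig_dir=$1
    db=$2
    for sig in "$sig_dir"/*.sig.zip; do
        [ -e "$sig" ] || continue            # glob matched nothing
        sample=$(basename "$sig" .sig.zip)   # trimmed-SRR17380114.sig.zip -> trimmed-SRR17380114
        sourmash scripts fastgather "$sig" "$db" -c 20 -o "$sample.csv"
    done
}
```

With the paths from this issue, the call would be `gather_each /mnt/data/tnmquann/benchmarking/12_experiment/zip /mnt/data/tnmquann/database/sourmash/GTDB_R07-RS207/gtdb-rs207.genomic-reps.dna.k31.zip`.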
Solution 3: fastmultigather with rocksdb
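Currently, fastmultigather against a rocksdb index only works when the database is indexed directly in the processing folder (Solution 3.4). Using an index that was built elsewhere fails (Solutions 3.1 to 3.3).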
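One way to confirm that the index depends on the working directory is to run the same `check` command on one index from several directories. The sketch below is a diagnostic aid only. The helper name `check_from_dirs` is hypothetical, and it assumes that `sourmash scripts check` exits with a non-zero status when it prints an error, which this issue does not confirm.

```shell
# Run `sourmash scripts check` on one index from several working directories.
# check_from_dirs is a hypothetical helper name, not part of sourmash.
# Assumption: `check` returns a non-zero exit status on error.
# Pass the index as an absolute path so that only the working directory varies.
check_from_dirs() {
    index=$1
    shift
    for dir in "$@"; do
        # Subshell keeps the cd from leaking into the calling shell.
        if (cd "$dir" && sourmash scripts check "$index" >/dev/null 2>&1); then
            echo "OK   $dir"
        else
            echo "FAIL $dir"
        fi
    done
}
```

For example: `check_from_dirs /mnt/data/tnmquann/database/sourmash/GTDB_R07-RS207/gtdb-rs207.genomic-reps.dna.k31.rocksdb /mnt/data/tnmquann/database/sourmash/GTDB_R07-RS207 /mnt/data/tnmquann/benchmarking/12_experiment/manysketch`.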
Solution 3.1: Use the path to the indexed database
cd /mnt/data/tnmquann/benchmarking/12_experiment
sourmash scripts fastmultigather SOURMASH-MANIFEST.csv gtdb-rs207.genomic-reps.dna.k31.rocksdb -c 20 -o gather.csv
# Output
== This is sourmash version 4.8.10. ==
== Please cite Irber et. al (2024), doi:10.21105/joss.06830. ==
=> sourmash_plugin_branchwater 0.9.5; cite Irber et al., doi: 10.1101/2022.11.02.514947
ksize: 31 / scaled: 1000 / moltype: DNA / threshold bp: 50000
gathering all sketches in 'SOURMASH-MANIFEST.csv' against '/mnt/data/tnmquann/database/sourmash/GTDB_R07-RS207/gtdb-rs207.genomic-reps.dna.k31.rocksdb' using 20 threads
Error: No such file or directory (os error 2)
Solution 3.2: Copy indexed database into the processing folder and then run the commands
cd /mnt/data/tnmquann/benchmarking/12_experiment
cp -r /mnt/data/tnmquann/database/sourmash/GTDB_R07-RS207/gtdb-rs207.genomic-reps.dna.k31.rocksdb /mnt/data/tnmquann/benchmarking/12_experiment/manysketch
sourmash scripts fastmultigather SOURMASH-MANIFEST.csv gtdb-rs207.genomic-reps.dna.k31.rocksdb -c 20 -o gather.csv
## Output
== This is sourmash version 4.8.10. ==
== Please cite Irber et. al (2024), doi:10.21105/joss.06830. ==
=> sourmash_plugin_branchwater 0.9.5; cite Irber et al., doi: 10.1101/2022.11.02.514947
ksize: 31 / scaled: 1000 / moltype: DNA / threshold bp: 50000
gathering all sketches in 'SOURMASH-MANIFEST.csv' against '/mnt/data/tnmquann/database/sourmash/GTDB_R07-RS207/gtdb-rs207.genomic-reps.dna.k31.rocksdb' using 20 threads
Error: No such file or directory (os error 2)
# Try to re-check the copied rocksdb
sourmash scripts check gtdb-rs207.genomic-reps.dna.k31.rocksdb
## Output
== This is sourmash version 4.8.10. ==
== Please cite Irber et. al (2024), doi:10.21105/joss.06830. ==
checking index 'gtdb-rs207.genomic-reps.dna.k31.rocksdb'
Opening DB
Error: No such file or directory (os error 2)
Solution 3.3: Based on @ctb's suggestion
cd /mnt/data/tnmquann/benchmarking/12_experiment/manysketch
# Symlink
ln -s /mnt/data/tnmquann/database/sourmash/GTDB_R07-RS207/gtdb-rs207.genomic-reps.dna.k31.rocksdb .
sourmash scripts fastmultigather SOURMASH-MANIFEST.csv gtdb-rs207.genomic-reps.dna.k31.rocksdb -c 20 -o gather.csv
## Output
== This is sourmash version 4.8.10. ==
== Please cite Irber et. al (2024), doi:10.21105/joss.06830. ==
=> sourmash_plugin_branchwater 0.9.5; cite Irber et al., doi: 10.1101/2022.11.02.514947
ksize: 31 / scaled: 1000 / moltype: DNA / threshold bp: 50000
gathering all sketches in 'SOURMASH-MANIFEST.csv' against 'gtdb-rs207.genomic-reps.dna.k31.rocksdb' using 20 threads
Error: No such file or directory (os error 2)
# Try to re-check the symlinked rocksdb
sourmash scripts check gtdb-rs207.genomic-reps.dna.k31.rocksdb
## Output
== This is sourmash version 4.8.10. ==
== Please cite Irber et. al (2024), doi:10.21105/joss.06830. ==
checking index 'gtdb-rs207.genomic-reps.dna.k31.rocksdb'
Opening DB
Error: No such file or directory (os error 2)
Solution 3.4: Based on @bluegenes's suggestion
cd /mnt/data/tnmquann/database/sourmash/GTDB_R07-RS207
cp gtdb-rs207.genomic-reps.dna.k31.zip /mnt/data/tnmquann/benchmarking/12_experiment/manysketch
# Index database
sourmash scripts index gtdb-rs207.genomic-reps.dna.k31.zip -o gtdb-rs207.genomic-reps.dna.k31.rocksdb -c 30
# Check indexed database
sourmash scripts check gtdb-rs207.genomic-reps.dna.k31.rocksdb
## Output
== This is sourmash version 4.8.10. ==
== Please cite Irber et. al (2024), doi:10.21105/joss.06830. ==
checking index 'gtdb-rs207.genomic-reps.dna.k31.rocksdb'
Opening DB
Starting check
Finished check
...index is ok!
# Re-run fastmultigather
sourmash scripts fastmultigather SOURMASH-MANIFEST.csv gtdb-rs207.genomic-reps.dna.k31.rocksdb -c 20 -o gather.csv
## Output is OK
I think there's a problem with how the index command sets up the RocksDB folder: the resulting index seems to work only from the folder where it was created.
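A possible experiment to narrow this down is to rebuild the index while passing absolute paths for both the input zip and the output folder, then use it from another directory. This is only a sketch under an unverified assumption: that the failure comes from a relative path being recorded at index time. Nothing in this issue confirms that. The helper names `abs_path` and `index_abs` are hypothetical, and the `index` flags are the same ones used above.

```shell
# abs_path and index_abs are hypothetical helper names, not part of sourmash.

# Resolve a path to an absolute one. The parent directory must exist,
# but the final component (e.g. the not-yet-created index) need not.
abs_path() {
    dir=$(cd "$(dirname "$1")" && pwd -P) || return 1
    echo "$dir/$(basename "$1")"
}

# Build the index with absolute paths for the input zip and the output folder.
# Assumption (unverified): a relative path stored at index time is what makes
# the index unusable from other directories.
index_abs() {
    zip=$(abs_path "$1") || return 1
    out=$(abs_path "$2") || return 1
    sourmash scripts index "$zip" -o "$out" -c 30
}
```

After running `index_abs gtdb-rs207.genomic-reps.dna.k31.zip gtdb-rs207.genomic-reps.dna.k31.rocksdb` in the database folder, re-running `sourmash scripts check` and `fastmultigather` from the processing folder would show whether the path form makes a difference.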