Error when using fastmultigather against rocksdb (Error: No such file or directory (os error 2), tested with multiple cases) #381

Closed
@tnmquann

Description

Hi @ctb,
I'm currently using these commands:

Prepare data

cd /mnt/data/tnmquann/benchmarking/12_experiment
# Step 1: sourmash manysketch
sourmash scripts manysketch manysketch.csv -o manysketch.zip -c 20 -p k=31,scaled=1000,abund

# Step 2: unzip manysketch.zip (Note: I used this folder for all the commands below)
unzip manysketch.zip -d manysketch

# Additional: index gtdb-rs207.genomic-reps.dna.k31.zip
cd /mnt/data/tnmquann/database/sourmash/GTDB_R07-RS207
sourmash scripts index gtdb-rs207.genomic-reps.dna.k31.zip -o gtdb-rs207.genomic-reps.dna.k31.rocksdb -c 30

# Check indexed database
sourmash scripts check gtdb-rs207.genomic-reps.dna.k31.rocksdb

# Output
== This is sourmash version 4.8.10. ==
== Please cite Irber et. al (2024), doi:10.21105/joss.06830. ==
checking index 'gtdb-rs207.genomic-reps.dna.k31.rocksdb'
Opening DB
Starting check
Finished check
...index is ok!

I tried several different approaches and got the following results:

Solutions 1 & 2: Work perfectly

cd /mnt/data/tnmquann/benchmarking/12_experiment
# Solution 1 - OK
sourmash scripts fastmultigather --cores 20 /mnt/data/tnmquann/benchmarking/12_experiment/manysketch/SOURMASH-MANIFEST.csv /mnt/data/tnmquann/database/sourmash/GTDB_R07-RS207/gtdb-rs207.genomic-reps.dna.k31.zip
# Solution 2 - OK (use a loop + the parallel package to run this command for each sample)
# Recreate *.sig.zip for each sample, then use fastgather
sourmash scripts fastgather /mnt/data/tnmquann/benchmarking/12_experiment/zip/trimmed-SRR17380114.sig.zip /mnt/data/tnmquann/database/sourmash/GTDB_R07-RS207/gtdb-rs207.genomic-reps.dna.k31.zip -c 20 -o trimmed-SRR17380114.csv
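Solution 2 runs fastgather once per sample in a loop. The loop itself is not shown above, so here is a minimal dry-run sketch of what it might look like. `print_fastgather_cmds` is a hypothetical helper (not part of sourmash); it only prints one command per `*.sig.zip`, reusing the `fastgather` arguments from the command above, and assumes the output CSV is named after the sample.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of sourmash): print one fastgather command
# per *.sig.zip found in a directory. Nothing is executed here.
print_fastgather_cmds() {
    local sig_dir="$1"      # directory holding the per-sample *.sig.zip files
    local db="$2"           # database to gather against (zip or rocksdb)
    local cores="${3:-20}"  # value passed to -c; defaults to 20 as in the issue
    local sig sample
    for sig in "$sig_dir"/*.sig.zip; do
        [ -e "$sig" ] || continue            # skip when the glob matches nothing
        sample="$(basename "$sig" .sig.zip)"
        # %q shell-quotes each argument so the printed line can be re-executed
        printf 'sourmash scripts fastgather %q %q -c %q -o %q.csv\n' \
            "$sig" "$db" "$cores" "$sample"
    done
}
```

The printed lines can be reviewed first and then piped to `bash` to run them one after another, or to GNU `parallel` if that is the "parallel package" meant here (an assumption).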

Both methods above work perfectly; only solution 3 below (fastmultigather with rocksdb) fails.

Solution 3: fastmultigather with rocksdb

Currently, this only works when the database is indexed directly in the processing folder.

Solution 3.1: Use the path to the indexed database

cd /mnt/data/tnmquann/benchmarking/12_experiment
sourmash scripts fastmultigather SOURMASH-MANIFEST.csv gtdb-rs207.genomic-reps.dna.k31.rocksdb -c 20 -o gather.csv

# Output
== This is sourmash version 4.8.10. ==
== Please cite Irber et. al (2024), doi:10.21105/joss.06830. ==
=> sourmash_plugin_branchwater 0.9.5; cite Irber et al., doi: 10.1101/2022.11.02.514947
ksize: 31 / scaled: 1000 / moltype: DNA / threshold bp: 50000
gathering all sketches in 'SOURMASH-MANIFEST.csv' against '/mnt/data/tnmquann/database/sourmash/GTDB_R07-RS207/gtdb-rs207.genomic-reps.dna.k31.rocksdb' using 20 threads
Error: No such file or directory (os error 2)

Solution 3.2: Copy indexed database into the processing folder and then run the commands

cd /mnt/data/tnmquann/benchmarking/12_experiment
cp -r /mnt/data/tnmquann/database/sourmash/GTDB_R07-RS207/gtdb-rs207.genomic-reps.dna.k31.rocksdb /mnt/data/tnmquann/benchmarking/12_experiment/manysketch
sourmash scripts fastmultigather SOURMASH-MANIFEST.csv gtdb-rs207.genomic-reps.dna.k31.rocksdb -c 20 -o gather.csv

## Output
== This is sourmash version 4.8.10. ==
== Please cite Irber et. al (2024), doi:10.21105/joss.06830. ==
=> sourmash_plugin_branchwater 0.9.5; cite Irber et al., doi: 10.1101/2022.11.02.514947
ksize: 31 / scaled: 1000 / moltype: DNA / threshold bp: 50000
gathering all sketches in 'SOURMASH-MANIFEST.csv' against '/mnt/data/tnmquann/database/sourmash/GTDB_R07-RS207/gtdb-rs207.genomic-reps.dna.k31.rocksdb' using 20 threads
Error: No such file or directory (os error 2)

# Try to re-check the copied rocksdb
sourmash scripts check gtdb-rs207.genomic-reps.dna.k31.rocksdb
## Output
== This is sourmash version 4.8.10. ==
== Please cite Irber et. al (2024), doi:10.21105/joss.06830. ==
checking index 'gtdb-rs207.genomic-reps.dna.k31.rocksdb'
Opening DB
Error: No such file or directory (os error 2)

Solution 3.3: Based on @ctb's suggestion

cd /mnt/data/tnmquann/benchmarking/12_experiment/manysketch
# Symlink
ln -s /mnt/data/tnmquann/database/sourmash/GTDB_R07-RS207/gtdb-rs207.genomic-reps.dna.k31.rocksdb .

sourmash scripts fastmultigather SOURMASH-MANIFEST.csv gtdb-rs207.genomic-reps.dna.k31.rocksdb -c 20 -o gather.csv

## Output
== This is sourmash version 4.8.10. ==
== Please cite Irber et. al (2024), doi:10.21105/joss.06830. ==
=> sourmash_plugin_branchwater 0.9.5; cite Irber et al., doi: 10.1101/2022.11.02.514947
ksize: 31 / scaled: 1000 / moltype: DNA / threshold bp: 50000
gathering all sketches in 'SOURMASH-MANIFEST.csv' against 'gtdb-rs207.genomic-reps.dna.k31.rocksdb' using 20 threads
Error: No such file or directory (os error 2)

# Try to re-check the symlinked rocksdb
sourmash scripts check gtdb-rs207.genomic-reps.dna.k31.rocksdb
## Output
== This is sourmash version 4.8.10. ==
== Please cite Irber et. al (2024), doi:10.21105/joss.06830. ==
checking index 'gtdb-rs207.genomic-reps.dna.k31.rocksdb'
Opening DB
Error: No such file or directory (os error 2)

Solution 3.4: Based on @bluegenes's suggestion

cd /mnt/data/tnmquann/database/sourmash/GTDB_R07-RS207
cp gtdb-rs207.genomic-reps.dna.k31.zip /mnt/data/tnmquann/benchmarking/12_experiment/manysketch

# Index database
sourmash scripts index gtdb-rs207.genomic-reps.dna.k31.zip -o gtdb-rs207.genomic-reps.dna.k31.rocksdb -c 30

# Check indexed database
sourmash scripts check gtdb-rs207.genomic-reps.dna.k31.rocksdb
## Output
== This is sourmash version 4.8.10. ==
== Please cite Irber et. al (2024), doi:10.21105/joss.06830. ==
checking index 'gtdb-rs207.genomic-reps.dna.k31.rocksdb'
Opening DB
Starting check
Finished check
...index is ok!

# Re-run fastmultigather
sourmash scripts fastmultigather SOURMASH-MANIFEST.csv gtdb-rs207.genomic-reps.dna.k31.rocksdb -c 20 -o gather.csv
## Output is OK

I think there's a problem with the RocksDB folder configuration when running the index command.
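To rule out a plain path problem before blaming the index itself, a small check like the one below could be run from the same working directory as the failing command. This is only a sketch: `describe_db_path` is a hypothetical helper, it uses standard shell tests only, and it says nothing about what the RocksDB index stores internally.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether a database path exists as seen from
# the current working directory, and where it physically resolves to.
describe_db_path() {
    local db="$1"
    if [ ! -e "$db" ]; then
        # Also reached for a dangling symlink, since -e follows links
        echo "missing: $db (from $(pwd))"
        return 1
    fi
    if [ -L "$db" ]; then
        echo "symlink: $db -> $(readlink "$db")"
    fi
    if [ ! -d "$db" ]; then
        echo "not a directory: $db"
        return 1
    fi
    # pwd -P prints the physical path with symlinks resolved
    echo "resolved: $(cd "$db" && pwd -P)"
}
```

For example, `describe_db_path gtdb-rs207.genomic-reps.dna.k31.rocksdb` run in the folder used for solution 3.2 or 3.3 shows whether the copy or symlink is visible there. If the path resolves correctly and `check` still reports `os error 2`, the missing file is presumably something the index refers to rather than the index directory itself; that is an assumption, not something confirmed in this report.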
