It looks like, on the GI Slurm cluster, files can vanish from the cache.
I ran:
```
sbatch -c2 --mem 8G --partition long --time 7-00:00:00 --wrap "toil-wdl-runner https://raw.githubusercontent.com/vgteam/vg_wdl/9b3e4016b16d657a0a7c73e01e1b4c4410f5593e/workflows/giraffe.wdl ./inputs-training-HG002.m84005_220827_014912_s1.json --wdlOutputDirectory ./output/training/HG002.m84005_220827_014912_s1 --wdlOutputFile ./output/training/HG002.m84005_220827_014912_s1.json --logFile ./output/training/HG002.m84005_220827_014912_s1.log --writeLogs ./output/training/log-HG002.m84005_220827_014912_s1 --jobStore ./output/training/tree-HG002.m84005_220827_014912_s1 --batchSystem slurm --slurmTime 11:59:59 --disableProgress --caching=True"
```
Using the inputs file:
{ "Giraffe.INPUT_READ_FILE_1": "https://storage.googleapis.com/brain-genomics/awcarroll/share/ucsc/pacbio_fastq/HG002.m84005_220827_014912_s1.fastq.gz", "Giraffe.SAMPLE_NAME": "HG002.m84005_220827_014912_s1", "Giraffe.PAIRED_READS": false, "Giraffe.HAPLOTYPE_SAMPLING": false, "Giraffe.GBZ_FILE": "/private/groups/patenlab/anovak/projects/hprc/lr-giraffe/graphs/hprc-v1.1-mc-grch38.d9.gbz", "Giraffe.MIN_FILE": "/private/groups/patenlab/anovak/projects/hprc/lr-giraffe/graphs/hprc-v1.1-mc-grch38.d9.k31.w50.W.withzip.min", "Giraffe.ZIPCODES_FILE": "/private/groups/patenlab/anovak/projects/hprc/lr-giraffe/graphs/hprc-v1.1-mc-grch38.d9.k31.w50.W.zipcodes", "Giraffe.DIST_FILE": "/private/groups/patenlab/anovak/projects/hprc/lr-giraffe/graphs/hprc-v1.1-mc-grch38.d9.dist", "Giraffe.VG_DOCKER": "quay.io/adamnovak/vg:beec239", "Giraffe.READS_PER_CHUNK": 150000, "Giraffe.GIRAFFE_PRESET": "hifi", "Giraffe.PRUNE_LOW_COMPLEXITY": true, "Giraffe.LEFTALIGN_BAM": true, "Giraffe.REALIGN_INDELS": false, "Giraffe.OUTPUT_SINGLE_BAM": true, "Giraffe.REFERENCE_PREFIX": "GRCh38#0#", "Giraffe.REFERENCE_FILE": "/private/groups/patenlab/anovak/projects/hprc/lr-giraffe/references/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna", "Giraffe.CONTIGS": ["GRCh38#0#chr1", "GRCh38#0#chr2", "GRCh38#0#chr3", "GRCh38#0#chr4", "GRCh38#0#chr5", "GRCh38#0#chr6", "GRCh38#0#chr7", "GRCh38#0#chr8", "GRCh38#0#chr9", "GRCh38#0#chr10", "GRCh38#0#chr11", "GRCh38#0#chr12", "GRCh38#0#chr13", "GRCh38#0#chr14", "GRCh38#0#chr15", "GRCh38#0#chr16", "GRCh38#0#chr17", "GRCh38#0#chr18", "GRCh38#0#chr19", "GRCh38#0#chr20", "GRCh38#0#chr21", "GRCh38#0#chr22", "GRCh38#0#chrX", "GRCh38#0#chrY"] }
On Toil commit c8ba20fa7e95714966cbbfd002e46c26fcafcc05.
I got errors like this in the log from some jobs:
Log from job "'WDLTaskJob' Giraffe.14.runVGGIRAFFEse.command kind-WDLTaskJob/instance-yeovm589 v6" follows: =========> [2024-09-10T11:46:36-0700] [MainThread] [I] [toil.worker] ---TOIL WORKER OUTPUT LOG--- [2024-09-10T11:46:36-0700] [MainThread] [I] [toil] Running Toil version 6.2.0a1-8281a413cc028d4d448bbba9c344d42dcf55c2b8 on host phoenix-12.prism. [2024-09-10T11:46:36-0700] [MainThread] [I] [toil.worker] Working on job 'WDLTaskJob' Giraffe.14.runVGGIRAFFEse.command kind-WDLTaskJob/instance-yeovm589 v4 [2024-09-10T11:46:36-0700] [MainThread] [I] [toil.worker] Loaded body Job('WDLTaskJob' Giraffe.14.runVGGIRAFFEse.command kind-WDLTaskJob/instance-yeovm589 v4) from description 'WDLTaskJob' Giraffe.14.runVGGIRAFFEse.command kind-WDLTaskJob/instance-yeovm589 v4 [2024-09-10T11:46:36-0700] [MainThread] [I] [toil.wdl.wdltoil] Running task command for runVGGIRAFFE (['map', 'runVGGIRAFFE']) called as Giraffe.runVGGIRAFFEse [2024-09-10T11:46:36-0700] [MainThread] [I] [MiniWDLContainers] no configuration file found [2024-09-10T11:46:36-0700] [MainThread] [N] [MiniWDLContainers] Singularity runtime initialized (BETA) :: singularity_version: "singularity-ce version 3.10.3" [2024-09-10T11:46:36-0700] [MainThread] [I] [MiniWDLContainers] detected host resources :: cpu: 256, mem_bytes: 2151637909504 [2024-09-10T11:46:36-0700] [MainThread] [W] [toil.fileStores.abstractFileStore] Failed job accessed files: [2024-09-10T11:46:36-0700] [MainThread] [W] [toil.fileStores.abstractFileStore] Downloaded file 'files/no-job/file-f72e592d1205456e987d787ad09cfc6a/hprc-v1.1-mc-grch38.d9.k31.w50.W.zipcodes' to path '/data/tmp/toilwf-f683e8bc4898542ab64ebee26f3926d8/c935/job/Giraffe/hprc-v1.1-mc-grch38.d9.k31.w50.W.zipcodes' [2024-09-10T11:46:36-0700] [MainThread] [W] [toil.fileStores.abstractFileStore] Downloaded file 'files/no-job/file-71817319594f4aa99cbcd408d65a2690/hprc-v1.1-mc-grch38.d9.k31.w50.W.withzip.min' to path '/data/tmp/toilwf-f683e8bc4898542ab64ebee26f3926d8/c935/job/Giraffe/hprc-v1.1-mc-grch38.d9.k31.w50.W.withzip.min' [2024-09-10T11:46:36-0700] [MainThread] [W] [toil.fileStores.abstractFileStore] Downloaded file 'files/no-job/file-fe9289b731b4460f8fcaa8423e41fdf4/hprc-v1.1-mc-grch38.d9.dist' to path '/data/tmp/toilwf-f683e8bc4898542ab64ebee26f3926d8/c935/job/Giraffe/hprc-v1.1-mc-grch38.d9.dist' [2024-09-10T11:46:36-0700] [MainThread] [W] [toil.fileStores.abstractFileStore] Downloaded file 'files/no-job/file-099bfda4486c46da94e4ec9202d82fb3/hprc-v1.1-mc-grch38.d9.gbz' to path '/data/tmp/toilwf-f683e8bc4898542ab64ebee26f3926d8/c935/job/Giraffe/hprc-v1.1-mc-grch38.d9.gbz' [2024-09-10T11:46:36-0700] [MainThread] [C] [toil.worker] Worker crashed with traceback: Traceback (most recent call last): File "/private/home/anovak/workspace/toil/src/toil/worker.py", line 439, in workerScript job._runner(jobGraph=None, jobStore=job_store, fileStore=fileStore, defer=defer) File "/private/home/anovak/workspace/toil/src/toil/job.py", line 3008, in _runner returnValues = self._run(jobGraph=None, fileStore=fileStore) File "/private/home/anovak/workspace/toil/src/toil/job.py", line 2919, in _run return self.run(fileStore) File "/private/home/anovak/workspace/toil/src/toil/wdl/wdltoil.py", line 150, in decorated return decoratee(*args, **kwargs) File "/private/home/anovak/workspace/toil/src/toil/wdl/wdltoil.py", line 2314, in run bindings = devirtualize_files(bindings, standard_library) File "/private/home/anovak/workspace/toil/src/toil/wdl/wdltoil.py", line 1419, in devirtualize_files return 
map_over_files_in_bindings(environment, stdlib._devirtualize_filename) File "/private/home/anovak/workspace/toil/src/toil/wdl/wdltoil.py", line 1645, in map_over_files_in_bindings return map_over_typed_files_in_bindings(bindings, lambda _, x: transform(x)) File "/private/home/anovak/workspace/toil/src/toil/wdl/wdltoil.py", line 1635, in map_over_typed_files_in_bindings return environment.map(lambda b: map_over_typed_files_in_binding(b, transform)) File "/private/home/anovak/workspace/toil/venv/lib/python3.10/site-packages/WDL/Env.py", line 151, in map fb = f(b) File "/private/home/anovak/workspace/toil/src/toil/wdl/wdltoil.py", line 1635, in <lambda> return environment.map(lambda b: map_over_typed_files_in_binding(b, transform)) File "/private/home/anovak/workspace/toil/src/toil/wdl/wdltoil.py", line 1654, in map_over_typed_files_in_binding return WDL.Env.Binding(binding.name, map_over_typed_files_in_value(binding.value, transform), binding.info) File "/private/home/anovak/workspace/toil/src/toil/wdl/wdltoil.py", line 1679, in map_over_typed_files_in_value new_path = transform(value.type, value.value) File "/private/home/anovak/workspace/toil/src/toil/wdl/wdltoil.py", line 1645, in <lambda> return map_over_typed_files_in_bindings(bindings, lambda _, x: transform(x)) File "/private/home/anovak/workspace/toil/src/toil/wdl/wdltoil.py", line 880, in _devirtualize_filename result = self.devirtualize_to( File "/private/home/anovak/workspace/toil/src/toil/wdl/wdltoil.py", line 972, in devirtualize_to result = file_source.readGlobalFile(file_id, dest_path, mutable=False, symlink=True) File "/private/home/anovak/workspace/toil/src/toil/fileStores/cachingFileStore.py", line 1163, in readGlobalFile finalPath = self._readGlobalFileWithCache(fileStoreID, localFilePath, symlink, readerID) File "/private/home/anovak/workspace/toil/src/toil/fileStores/cachingFileStore.py", line 1611, in _readGlobalFileWithCache if self._createLinkFromCache(cachedPath, localFilePath, symlink): File "/private/home/anovak/workspace/toil/src/toil/fileStores/cachingFileStore.py", line 1507, in _createLinkFromCache assert os.path.exists(cachedPath), "Cannot create link to missing cache file %s" % cachedPath AssertionError: Cannot create link to missing cache file /data/tmp/toilwf-f683e8bc4898542ab64ebee26f3926d8/cache-2a428c6e-0fce-48b3-ac46-5ce532ae055a/tmp8i75v67v3c77a1488820623ed088bca6ae103da3bde63510 [2024-09-10T11:46:36-0700] [MainThread] [E] [toil.worker] Exiting the worker because of a failed job on host phoenix-12.prism <=========
Something is wrong with the caching logic: files are apparently going missing from the cache while other jobs are still trying to link to them.
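The shape of the failure suggests a delete-versus-link race: one worker removes (for example, evicts) a cached copy while another worker has already committed to serving `readGlobalFile()` from that same copy. The toy sketch below is not Toil code; every name in it is invented for illustration, and the real bookkeeping in `cachingFileStore.py` is much more involved, but it shows how that interleaving ends in exactly this kind of `AssertionError`:

```python
import os
import tempfile
import threading

# Invented stand-ins for the cache directory, a cached file, and a job's
# local temp directory (not Toil's real layout).
cache_dir = tempfile.mkdtemp(prefix="fake-cache-")
job_dir = tempfile.mkdtemp(prefix="fake-job-")
cached_path = os.path.join(cache_dir, "hprc-v1.1-mc-grch38.d9.gbz")
with open(cached_path, "w") as f:
    f.write("pretend this is the graph")

def evict():
    # Worker A: believes the cached copy is unreferenced and deletes it to
    # free space.
    if os.path.exists(cached_path):
        os.unlink(cached_path)

def link_into_job():
    # Worker B: has already decided to satisfy its read from the cache,
    # checks that the cached copy is there, and links to it. If the eviction
    # wins the race, this raises the same "Cannot create link to missing
    # cache file" assertion seen in the job log above.
    assert os.path.exists(cached_path), \
        "Cannot create link to missing cache file %s" % cached_path
    os.symlink(cached_path, os.path.join(job_dir, os.path.basename(cached_path)))

threads = [threading.Thread(target=evict), threading.Thread(target=link_into_job)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Depending on scheduling, the second thread either links successfully or hits the assertion, which is why the real failure only shows up in some jobs.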
┆Issue is synchronized with this Jira Story
┆Issue Number: TOIL-1640