Description of the bug
A pgsc_calc job runs successfully when I use the bfile format (plink bed), but when I use the pfile format (plink pgen) it fails for reasons I can't determine (I can't interpret the error).
Input files are as follows
-rw-r----- 1 batzler bsi 23377581643 Dec 4 10:53 pg22.bed
-rw-r----- 1 batzler bsi 30106514 Dec 4 10:53 pg22.bim
-rw-r----- 1 batzler bsi 8074325 Dec 4 10:53 pg22.fam
-rw-r----- 1 batzler bsi 1396 Dec 4 10:53 pg22.log
-rw-r----- 1 batzler bsi 1294 Dec 4 13:34 pg_imputed22.log
-rw-r----- 1 batzler bsi 23982440039 Dec 4 13:34 pg_imputed22.pgen
-rw-r----- 1 batzler bsi 7259937 Dec 4 13:34 pg_imputed22.psam
-rw-r----- 1 batzler bsi 78753551 Dec 4 13:34 pg_imputed22.pvar
The plink bed (binary) files were created from the pgen files with:
$PLINK2 --pfile pg_imputed22 --make-bed --out pg$CHR
When I run the pipeline with format bfile, everything executes properly.
When I run it with format pfile and the pgen files, it fails with the following traceback:
Traceback (most recent call last):
File "/app/pgscatalog.utils/.venv/bin/pgscatalog-match", line 8, in <module>
sys.exit(run_match())
^^^^^^^^^^^
File "/app/pgscatalog.utils/.venv/lib/python3.11/site-packages/pgscatalog/match/cli/match_cli.py", line 87, in run_match
ipc_path = get_match_candidates(
^^^^^^^^^^^^^^^^^^^^^
File "/app/pgscatalog.utils/.venv/lib/python3.11/site-packages/pgscatalog/match/cli/match_cli.py", line 124, in get_match_candidates
with variants as target_df:
File "/app/pgscatalog.utils/.venv/lib/python3.11/site-packages/pgscatalog/match/lib/variantframe.py", line 54, in __enter__
self.arrowpaths = loose(self.variants, tmpdir=self._tmpdir)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/functools.py", line 909, in wrapper
return dispatch(args[0].__class__)(*args, **kw)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/pgscatalog.utils/.venv/lib/python3.11/site-packages/pgscatalog/match/lib/_arrow.py", line 94, in _
return batch_read(reader, tmpdir=tmpdir, cols_keep=cols_keep)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/pgscatalog.utils/.venv/lib/python3.11/site-packages/pgscatalog/match/lib/_arrow.py", line 102, in batch_read
batches = reader.next_batches(batch_size)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/pgscatalog.utils/.venv/lib/python3.11/site-packages/polars/io/csv/batched_reader.py", line 134, in next_batches
batches = self._reader.next_batches(n)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
polars.exceptions.ComputeError: found more fields than defined in 'Schema'
Consider setting 'truncate_ragged_lines=True'.
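The polars message suggests that at least one row in the file being parsed has more delimited fields than its header declares. The traceback does not say which file that is. Assuming it is the tab-delimited pg_imputed22.pvar, a minimal sketch for locating such rows could look like the following. The function name find_ragged_lines and its max_report parameter are made up for illustration and are not part of pgsc_calc or pgscatalog.utils.

```python
def find_ragged_lines(pvar_path, max_report=10):
    """Return (expected_field_count, [(line_number, field_count), ...]).

    Assumes an uncompressed, tab-delimited text file in which lines
    starting with '##' are meta-information and the first remaining
    line (normally the '#CHROM' header) defines the column count.
    """
    expected = None
    ragged = []
    with open(pvar_path, "rt") as handle:
        for line_number, line in enumerate(handle, start=1):
            if line.startswith("##"):
                continue  # meta-information line, not part of the table
            n_fields = len(line.rstrip("\n").split("\t"))
            if expected is None:
                expected = n_fields  # first non-meta line sets the width
                continue
            if n_fields != expected:
                ragged.append((line_number, n_fields))
                if len(ragged) >= max_report:
                    break  # stop early; the file can be very large
    return expected, ragged


# Example call (path taken from the listing above):
# expected, ragged = find_ragged_lines("pg_imputed22.pvar")
```

If the returned list is non-empty, the reported line numbers point at rows whose field count differs from the header, which would be the first place to look when comparing against the .bim file that works.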
Command used and terminal output
nextflow run pgscatalog/pgsc_calc -profile singularity --min_overlap 0.0001 --input ${samplesheet} --scorefile ${scorefile} --output ${outdir} -r ${pgsc_calc_version} -c ${project}/nxf_config.config --target_build ${target_build} --genotypes_cache $cachedir
The Nextflow command is the same whether I run with format bfile or format pfile. The only thing I change is the samplesheet, to reflect the different path_prefix and format.
Relevant files
No response
System information
Nextflow version
nextflow/23.04.2
slurm executor
apptainer/singularity
linux