bfile works but pfile errors out #394

Open
@batzler

Description of the bug

I can run a pgsc_calc job successfully using the bfile format (plink bed), but when I use the pfile format (plink pgen) the job fails, and I can't interpret the error.

Input files are as follows

-rw-r----- 1 batzler bsi 23377581643 Dec 4 10:53 pg22.bed
-rw-r----- 1 batzler bsi 30106514 Dec 4 10:53 pg22.bim
-rw-r----- 1 batzler bsi 8074325 Dec 4 10:53 pg22.fam
-rw-r----- 1 batzler bsi 1396 Dec 4 10:53 pg22.log
-rw-r----- 1 batzler bsi 1294 Dec 4 13:34 pg_imputed22.log
-rw-r----- 1 batzler bsi 23982440039 Dec 4 13:34 pg_imputed22.pgen
-rw-r----- 1 batzler bsi 7259937 Dec 4 13:34 pg_imputed22.psam
-rw-r----- 1 batzler bsi 78753551 Dec 4 13:34 pg_imputed22.pvar

The plink bed (binary) files were created from the pgen files:

$PLINK2 --pfile pg_imputed22 --make-bed --out pg$CHR

When I run the pipeline with format bfile, everything executes properly.

When I run it with format pfile and the pgen files, it fails with the following traceback:

Traceback (most recent call last):
  File "/app/pgscatalog.utils/.venv/bin/pgscatalog-match", line 8, in <module>
    sys.exit(run_match())
             ^^^^^^^^^^^
  File "/app/pgscatalog.utils/.venv/lib/python3.11/site-packages/pgscatalog/match/cli/match_cli.py", line 87, in run_match
    ipc_path = get_match_candidates(
               ^^^^^^^^^^^^^^^^^^^^^
  File "/app/pgscatalog.utils/.venv/lib/python3.11/site-packages/pgscatalog/match/cli/match_cli.py", line 124, in get_match_candidates
    with variants as target_df:
  File "/app/pgscatalog.utils/.venv/lib/python3.11/site-packages/pgscatalog/match/lib/variantframe.py", line 54, in __enter__
    self.arrowpaths = loose(self.variants, tmpdir=self._tmpdir)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/functools.py", line 909, in wrapper
    return dispatch(args[0].__class__)(*args, **kw)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/pgscatalog.utils/.venv/lib/python3.11/site-packages/pgscatalog/match/lib/_arrow.py", line 94, in _
    return batch_read(reader, tmpdir=tmpdir, cols_keep=cols_keep)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/pgscatalog.utils/.venv/lib/python3.11/site-packages/pgscatalog/match/lib/_arrow.py", line 102, in batch_read
    batches = reader.next_batches(batch_size)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/pgscatalog.utils/.venv/lib/python3.11/site-packages/polars/io/csv/batched_reader.py", line 134, in next_batches
    batches = self._reader.next_batches(n)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
polars.exceptions.ComputeError: found more fields than defined in 'Schema'

Consider setting 'truncate_ragged_lines=True'.
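
The polars message suggests that some row in a delimited text file has more fields than the header declares. The traceback does not name the file, so the following is only a diagnostic sketch under the assumption that the file being parsed is the tab-separated .pvar. The helper name `find_ragged_lines` is hypothetical and not part of pgscatalog or polars; it skips `##` meta lines, treats the first remaining line as the header, and reports rows that are wider than it.

```python
import io


def find_ragged_lines(handle, sep="\t", max_report=5):
    """Return (expected_field_count, [(line_no, n_fields), ...]).

    Hypothetical helper: the first line that does not start with '##'
    is taken as the header, and later rows with more fields are reported.
    """
    expected = None
    ragged = []
    for line_no, line in enumerate(handle, start=1):
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and VCF-style meta lines
        n_fields = len(line.split(sep))
        if expected is None:
            expected = n_fields  # header row defines the schema width
            continue
        if n_fields > expected:
            ragged.append((line_no, n_fields))
            if len(ragged) >= max_report:
                break  # a handful of examples is enough to diagnose
    return expected, ragged


# Small in-memory demonstration; the last row carries one extra field.
demo = (
    "##source=example\n"
    "#CHROM\tPOS\tID\tREF\tALT\n"
    "22\t100\trs1\tA\tG\n"
    "22\t200\trs2\tC\tT\tEXTRA\n"
)
print(find_ragged_lines(io.StringIO(demo)))
```

To check the real file, the same function could be called as `find_ragged_lines(open("pg_imputed22.pvar"))`. If it reports any rows, that would be consistent with the error above, although it would not by itself confirm the cause.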

Command used and terminal output

nextflow run pgscatalog/pgsc_calc -profile singularity --min_overlap 0.0001 --input ${samplesheet} --scorefile ${scorefile} --output ${outdir} -r ${pgsc_calc_version} -c ${project}/nxf_config.config --target_build ${target_build} --genotypes_cache $cachedir

The Nextflow command is the same whether I run with format bfile or format pfile. The only thing I change is the samplesheet, to reflect the different path_prefix and format.

Relevant files

No response

System information

Nextflow version
nextflow/23.04.2

slurm executor
apptainer/singularity
linux

Labels: bug

    Type

    No type

    Projects

    No projects

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions