
ligate won't treat offspring as scaffolded unless parents are still in data #108

Open
@kkellysci

I am phasing a large (n=172k) sample of parents and offspring (some duos, some trios), but I only need the phased genotypes for the offspring.

I start by running the phasing jobs (with the chromosomes in chunks) using shapeit5 with --pedigree on an HPC cluster. The initial whole-sample output files are written to a local scratch filesystem, where they are deleted immediately after the job finishes. I then use bcftools view -S to subset these results to just the offspring, and save that smaller results chunk on the cluster's network filesystem, where files persist past the end of the job.
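For concreteness, the per-chunk job described above can be sketched roughly as follows. This is an illustration, not the exact commands used: the binary name SHAPEIT5_phase_common, the thread count, and the file names (all_samples.bcf, family.fam, offspring.txt), as well as the SCRATCH_DIR and RESULTS_DIR variables and the run dry-run wrapper, are assumptions made for the example.

```shell
#!/usr/bin/env bash
# Sketch of one per-chunk job. All paths and file names are placeholders.
set -euo pipefail

# With DRY_RUN=1 (the default here) commands are printed instead of executed.
DRY_RUN="${DRY_RUN:-1}"
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Phase one chunk with the full pedigree, then keep only the offspring.
# $1 = region (e.g. chr20:1-5000000), $2 = chunk label used in file names
phase_and_subset_chunk() {
  local region="$1" label="$2"
  local scratch="${SCRATCH_DIR:-/tmp}"      # node-local, wiped after the job
  local keep="${RESULTS_DIR:-./results}"    # network filesystem, persistent
  local full="$scratch/all_samples.$label.bcf"
  local kids="$keep/offspring.$label.bcf"

  # 1. Phase parents and offspring together so the pedigree can be used.
  run SHAPEIT5_phase_common --input all_samples.bcf --pedigree family.fam \
      --region "$region" --output "$full" --thread 8

  # 2. Keep only the offspring IDs listed one per line in offspring.txt.
  run bcftools view -S offspring.txt -Ob -o "$kids" "$full"
  run bcftools index -f "$kids"
}
```

Calling phase_and_subset_chunk "chr20:1-5000000" chunk1 with DRY_RUN=1 only prints the three commands, so the sketch can be inspected without shapeit5 or bcftools installed.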

After all jobs have finished running, I try to use ligate with the --pedigree flag on the offspring-only results chunks. Despite using --pedigree, it detects the offspring samples as non-scaffolded, haplotype order gets swapped, and sometimes chunks from the maternal and paternal haplotypes are incorrectly combined as if they were in phase.
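The ligation step, as described, looks roughly like the sketch below. Again this is illustrative only: the binary name SHAPEIT5_ligate, the thread count, the file names, and the run dry-run wrapper are assumptions, and the chunk list is assumed to be a text file naming the offspring-only chunk files in genomic order, one per line.

```shell
#!/usr/bin/env bash
# Sketch of the ligation step on offspring-only chunks. Names are placeholders.
set -euo pipefail

# With DRY_RUN=1 (the default here) commands are printed instead of executed.
DRY_RUN="${DRY_RUN:-1}"
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# $1 = text file listing the offspring-only chunks, $2 = ligated output file
ligate_offspring_chunks() {
  local list="$1" out="$2"
  # The pedigree file still names the parents, but the chunk files
  # no longer contain them after the bcftools view -S subsetting.
  run SHAPEIT5_ligate --input "$list" --pedigree family.fam \
      --output "$out" --thread 2 --index
}
```

With DRY_RUN=1, ligate_offspring_chunks chunks.chr20.txt offspring.chr20.bcf prints the single ligate command rather than running it.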

Is the behavior of ligate for a file where 100% of the samples are scaffolded just the same as bcftools concat -a -d all, or would there still be a reason to prefer ligate? If there is still a reason to prefer ligate, is there a way to get it to treat offspring as scaffolded (e.g., refrain from swapping haplotypes around) even when the parents are no longer in the data?
